Earlier this year, I experimented with various LLMs using Ollama on our gaming PC with an NVIDIA RTX 3070 Ti GPU. At the time, I had also tried our other gaming PC, which runs an AMD Radeon 6700 XT GPU. Unfortunately, I wasn’t successful: the models on that PC fell back to the CPU, with running times at least 10x slower than on the NVIDIA system.
Since then, enthusiasts online have filled in the support that AMD themselves can’t seem to deliver. Thanks to this GitHub project, it is now possible to run recent models on the 6700 XT GPU (aka gfx1031).
On this PC, I install apps and data on the larger D: drive, so first download OllamaSetup.exe and install it via PowerShell:
.\OllamaSetup.exe /DIR="D:\Program Files\Ollama"
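After the installer finishes, a quick optional check in a fresh PowerShell window confirms the CLI is on the PATH and resolved to the D: drive install:

# Confirm the Ollama CLI is available and where it resolved to
ollama --version
Get-Command ollama | Select-Object Source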
Then quit Ollama from the system tray and download the correct ROCm libraries for the 6700 XT, which is the gfx1031 generation. Extract the archive and replace Ollama’s bundled rocblas.dll and rocblas/library folder.
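As a sketch, the file swap can be done in PowerShell; the paths below are assumptions, so adjust them to your download location and the install directory chosen above (the layout can differ between Ollama versions):

# Assumed paths: adjust to where the gfx1031 archive was extracted
# and to the Ollama install directory chosen above
$src = "D:\Downloads\rocm.gfx1031"
$dst = "D:\Program Files\Ollama\lib\ollama\rocm"

# Back up, then replace the stock rocblas.dll
Copy-Item "$dst\rocblas.dll" "$dst\rocblas.dll.bak"
Copy-Item "$src\rocblas.dll" "$dst\rocblas.dll" -Force

# Swap in the gfx1031 kernel library folder
Remove-Item "$dst\rocblas\library" -Recurse -Force
Copy-Item "$src\library" "$dst\rocblas\library" -Recurse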
Then launch Open WebUI in Docker:

docker run -d `
-p 3000:8080 `
-v open-webui:/app/backend/data `
--name open-webui `
--restart always `
-e OLLAMA_BASE_URL=http://host.docker.internal:11434 `
ghcr.io/open-webui/open-webui:main
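Before opening the UI, a couple of optional checks confirm the container is up and that Ollama’s API is reachable at the address the container will call back to:

# Container status and recent logs
docker ps --filter "name=open-webui"
docker logs --tail 20 open-webui

# Ollama should answer on the port OLLAMA_BASE_URL points at
Invoke-RestMethod http://localhost:11434/api/version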
Then browse to http://localhost:3000 to access Open WebUI, where I tested the recent Qwen3 model:

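Models can also be pulled ahead of time from PowerShell instead of through the WebUI; the tag below is an assumption, chosen as a size that should fit the 6700 XT’s 12 GB of VRAM:

# Model tag is an assumption; check the Ollama library for available sizes
ollama pull qwen3:8b
ollama run qwen3:8b "Say hello in one sentence."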
Monitoring the speed of the response, I was pleasantly surprised: it was much faster than the CPU-fallback mode I had experienced earlier. Ollama reported the GPU being used:

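The same check is available from the command line: ollama ps lists the loaded models and how each is split between GPU and CPU:

# A PROCESSOR column reading "100% GPU" confirms there was no CPU fallback
ollama ps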
Monitoring the GPU usage via Task Manager, I was able to run queries against both Qwen and Gemma loaded at the same time, though that is a bit too tight for the card’s 12 GB of VRAM:

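With two models resident, the 12 GB of VRAM leaves little headroom, so it can help to unload one explicitly when finished with it (the tags below are assumed from the tests above):

# Unload a model from VRAM without quitting Ollama itself
ollama stop qwen3:8b
ollama stop gemma3:4b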