Earlier this year, I experimented with various LLMs using Ollama on our gaming PC with an NVIDIA RTX 3070 Ti GPU. At the time, I had also tried our other gaming PC, which runs an AMD Radeon 6700 XT GPU. Unfortunately, I wasn’t successful: the models on that PC fell back to the CPU, with running times at least 10x slower than on the NVIDIA system.
Since then, enthusiasts online have filled in the support that AMD themselves can’t seem to deliver. Thanks to this GitHub project, it is now possible to run recent models on the 6700 XT GPU (aka gfx1031).
On this PC, I install apps and data on the larger D: drive, so first download OllamaSetup.exe and install it via PowerShell:
.\OllamaSetup.exe /DIR="D:\Program Files\Ollama"
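After the installer finishes, a quick optional check in a fresh PowerShell window confirms the CLI is on the PATH and resolved to the D: drive install:

# Confirm the Ollama CLI is available and where it resolved to
ollama --version
Get-Command ollama | Select-Object Source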
Then quit Ollama from the system tray and download the correct ROCm libraries for the 6700 XT, which is the gfx1031 generation. Extract the archive and replace Ollama’s bundled rocblas.dll and rocblas/library folder.
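As a sketch, the file swap can be done in PowerShell; the paths below are assumptions, so adjust them to your download location and the install directory chosen above (the layout can differ between Ollama versions):

# Assumed paths: adjust to where the gfx1031 archive was extracted
# and to the Ollama install directory chosen above
$src = "D:\Downloads\rocm.gfx1031"
$dst = "D:\Program Files\Ollama\lib\ollama\rocm"

# Back up, then replace the stock rocblas.dll
Copy-Item "$dst\rocblas.dll" "$dst\rocblas.dll.bak"
Copy-Item "$src\rocblas.dll" "$dst\rocblas.dll" -Force

# Swap in the gfx1031 kernel library folder
Remove-Item "$dst\rocblas\library" -Recurse -Force
Copy-Item "$src\library" "$dst\rocblas\library" -Recurse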
Then launch Open WebUI in Docker:

docker run -d `
-p 3000:8080 `
-v open-webui:/app/backend/data `
--name open-webui `
--restart always `
-e OLLAMA_BASE_URL=http://host.docker.internal:11434 `
ghcr.io/open-webui/open-webui:main
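Before opening the UI, a couple of optional checks confirm the container is up and that Ollama’s API is reachable at the address the container will call back to:

# Container status and recent logs
docker ps --filter "name=open-webui"
docker logs --tail 20 open-webui

# Ollama should answer on the port OLLAMA_BASE_URL points at
Invoke-RestMethod http://localhost:11434/api/version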
Then browse to http://localhost:3000 to access Open WebUI, where I tested the recent Qwen3 model:

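Models can also be pulled ahead of time from PowerShell instead of through the WebUI; the tag below is an assumption, chosen as a size that should fit the 6700 XT’s 12 GB of VRAM:

# Model tag is an assumption; check the Ollama library for available sizes
ollama pull qwen3:8b
ollama run qwen3:8b "Say hello in one sentence."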
Monitoring the speed of the response, I was pleasantly surprised: it was much faster than the CPU-fallback mode I had experienced earlier. Ollama reported the GPU being used:

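The same check is available from the command line: ollama ps lists the loaded models and how each is split between GPU and CPU:

# A PROCESSOR column reading "100% GPU" confirms there was no CPU fallback
ollama ps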
Monitoring the GPU usage via Task Manager, I was able to run queries against both Qwen and Gemma loaded at the same time, though that is a bit too tight for the card’s 12 GB of VRAM:

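With two models resident, the 12 GB of VRAM leaves little headroom, so it can help to unload one explicitly when finished with it (the tags below are assumed from the tests above):

# Unload a model from VRAM without quitting Ollama itself
ollama stop qwen3:8b
ollama stop gemma3:4b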