I’m running it with the UD_Q4_K_XL quant on a 24 GB 7900 XTX at ~120-130* tokens/s. Since it’s an MoE model, CPU inference with 32 GB of RAM should be doable, but I won’t make any promises on speed.
*Edit: I had a configuration issue in my llama.cpp setup that was limiting it to 85 tok/s, but that was user error on my part.
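For anyone trying to reproduce this, a minimal sketch of the kind of llama.cpp invocation involved (the model filename is a placeholder, and I'm not claiming these were the exact flags behind the config issue — `-ngl` for full GPU offload is just the one that's easiest to get wrong):

```shell
# Hypothetical benchmark run with a llama.cpp Vulkan/ROCm build.
# -m:   path to the GGUF quant (placeholder name here)
# -ngl: number of layers to offload to the GPU; 99 means "all of them",
#       and leaving it at the default keeps layers on the CPU, which
#       silently costs a lot of tokens/s
./llama-bench -m model-UD_Q4_K_XL.gguf -ngl 99
```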
Can I ask you what GPU driver version you’re running? I’m running a 7900 XTX as well and recently encountered some stability issues after a driver update (trying to support gaming and AI stuff at the same time). The latest version I could find as a recommendation for similar issues was 24.12.1.
Ah, I don’t know anything about Windows. I’m using Linux, and both the latest ROCm (7.2.2) and latest Vulkan (26.0.5) packages work without issues for combined gaming and AI. For reference, my reported numbers were with Vulkan at zero context.
Thanks! I’m migrating all my PCs to Linux anyway and just haven’t gotten to the AI stuff yet, so it sounds like that might fix itself.
Thanks! That sounds expensive. Hopefully 24GB VRAM gets cheaper or models get more efficient soon.
You might want to wait until smaller models for 3.6 are released; I’d assume that’ll be soon.
Thanks! I’m hoping to run at least a 20B model. Idk if I can do that fast enough without 24GB; that seems to be the sweet spot.
Wonder what the wombo-combo of Ryzen AI APU can do with this.
Time to fire up the trusty 370.