The Qwen3.5 models are still the best local models I’ve used, so I’m excited to see how this updated version performs.

  • TheCornCollector@piefed.zipOP · 4 days ago (edited)
    I’m running it with the UD-Q4_K_XL quant on a 7900 XTX (24 GB VRAM) at ~120–130* tokens/s. Since it’s an MoE model, CPU inference with 32 GB of RAM should be doable, but I won’t make any promises on speed.

    *Edit: I had a configuration issue in my llama.cpp setup that reduced performance. It was limited to 85 tokens/s, but that was user error on my part.
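    For anyone wanting to try the same setup, a minimal llama.cpp launch might look like the sketch below. The model filename is a hypothetical placeholder (substitute your actual UD-Q4_K_XL GGUF), and flag values are assumptions to tune for your hardware, not the commenter's exact configuration.

    ```shell
    # Hypothetical model path; replace with the real UD-Q4_K_XL GGUF file.
    # Assumes a Vulkan or ROCm build of llama.cpp's llama-server.
    llama-server \
      -m ./model-UD-Q4_K_XL.gguf \
      -ngl 99 \
      -c 8192

    # CPU-only fallback for the MoE case mentioned above (slower):
    # llama-server -m ./model-UD-Q4_K_XL.gguf -ngl 0 -c 4096
    ```

    `-ngl 99` offloads all layers to the GPU; `-c` sets the context window, which you can raise if VRAM allows.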

    • ericwdhs@discuss.online · 7 days ago

      Can I ask what GPU driver version you’re running? I’m running a 7900 XTX as well and recently hit some stability issues after a driver update (trying to support gaming and AI workloads at the same time). The latest version I could find recommended for similar issues was 24.12.1.

      • TheCornCollector@piefed.zipOP · 6 days ago

        Ah, I don’t know anything about Windows. I’m on Linux, and both the latest ROCm (7.2.2) and latest Vulkan (26.0.5) packages work without issues for combined gaming and AI. For reference, my reported numbers were with Vulkan at zero context.
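        If you’re setting this up on Linux yourself, a couple of quick sanity checks can confirm each backend actually sees the card. This assumes the `vulkan-tools` and ROCm packages are installed; the `gfx1100` target is what RDNA3 cards like the 7900 XTX report.

        ```shell
        # Vulkan: the 7900 XTX should appear in the device list.
        vulkaninfo --summary | grep -i deviceName

        # ROCm: RDNA3 should show up as gfx1100.
        rocminfo | grep -i gfx
        ```

        If either command comes back empty, the corresponding llama.cpp build will fall back to CPU (or fail to start), which is worth ruling out before chasing driver versions.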

        • ericwdhs@discuss.online · 6 days ago

          Thanks! I’m migrating all my PCs to Linux anyway and just haven’t gotten to the AI stuff yet, so it sounds like the problem might fix itself.

    • venusaur@lemmy.world · 11 days ago

      Thanks! That sounds expensive. Hopefully 24 GB of VRAM gets cheaper, or models get more efficient, soon.

        • venusaur@lemmy.world · 11 days ago

          Thanks! I’m hoping to run at least a 20B model. I don’t know if I can do that fast enough without 24 GB; that seems to be the sweet spot.

    • fonix232@fedia.io · 11 days ago

      I wonder what the wombo-combo of a Ryzen AI APU can do with this.

      Time to fire up the trusty 370.