If your GPU can run inference, it should be able to fine-tune too. [P]
I spent the last few months building a new sparse fine-tuning method for MoE models called USAF. The goal was simple: if your GPU can run inference on an MoE model, it should also be able to fine-tune it.
On my AMD RX 6750 XT (12 GB), I can fine-tune Qwen3-30B-A3B by training sparse expert weights and the router instead of adapters.
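For readers new to the idea, here is a toy sketch of what "training sparse weights instead of adapters" can look like in general. It is not USAF's actual code (see the repo for that). The helper names `pick_sparse_indices` and `sparse_sgd_step`, the random selection rule, and the 20% fraction are all assumptions made up for illustration. It only shows that when a small fixed subset of weights is updated, gradients and optimizer state need to be kept for that subset alone.

```python
import random

def pick_sparse_indices(num_weights, fraction, seed=0):
    """Pick a fixed subset of weight positions to train.

    Random selection is a hypothetical stand-in; a real method
    may choose positions very differently.
    """
    rng = random.Random(seed)
    k = max(1, int(num_weights * fraction))
    return sorted(rng.sample(range(num_weights), k))

def sparse_sgd_step(weights, grads, trainable_idx, lr):
    """Apply a plain SGD update only at the trainable positions.

    Every other weight stays frozen at its original value.
    """
    updated = list(weights)
    for i in trainable_idx:
        updated[i] -= lr * grads[i]
    return updated

# Toy "expert" with 10 weights; train 20% of them (illustrative numbers).
expert = [1.0] * 10
grads = [0.5] * 10
idx = pick_sparse_indices(len(expert), fraction=0.2)
new_expert = sparse_sgd_step(expert, grads, idx, lr=0.1)

# Positions whose value actually changed after the step.
changed = [i for i in range(len(expert)) if new_expert[i] != expert[i]]
print("trainable positions:", idx)
print("changed positions:  ", changed)
```

In this toy version only the two selected positions move and the other eight stay at their original values, so any per-weight training state would scale with the selected subset rather than the full expert.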
The project is completely open source under the Apache 2.0 license. I'm not trying to build a business, sell anything, or monetize it in any way. I just wanted to share something I built that I think is genuinely interesting.
I'd love to hear your feedback, especially from people working with MoE models.
GitHub: https://github.com/tsuyu122/usaf