Qwopus has drawn attention as an open-source project that distills Claude Opus 4.6's reasoning style into Alibaba's Qwen, letting users run a free, Opus-like model on their own hardware. The catch with this approach: Qwen is a Chinese base model, which deterred some users.
Responding to that feedback, Jackrong, the pseudonymous developer behind Qwopus, has introduced Gemopus: a new set of models that fine-tune the Claude Opus style onto Google's open-source Gemma 4, yielding an all-American stack with advanced reasoning that runs locally on existing hardware.
Gemopus comes in two versions: Gemopus-4-26B-A4B and Gemopus-4-E4B. The former is a Mixture-of-Experts (MoE) model: it holds 26 billion parameters but activates only about 4 billion of them during inference, which keeps it practical on limited hardware. Parameters matter because they define a model's capacity to learn, reason, and retain information. By routing each prompt through only the roughly 4 billion parameters it needs, the model delivers output quality close to a much larger AI while staying efficient.
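Gemopus's exact routing configuration isn't published here, but a toy sketch makes the mechanics concrete. In the PyTorch snippet below, a small router scores every token and dispatches it to only its top two of sixteen experts, so most of the layer's weights sit idle on any given forward pass. The dimensions, expert count, and top-k value are illustrative assumptions, not Gemopus's actual settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Toy Mixture-of-Experts layer: all experts exist in memory,
    but each token activates only a small subset of them."""

    def __init__(self, d_model=512, d_ff=2048, n_experts=16, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.router = nn.Linear(d_model, n_experts)  # scores experts per token
        self.top_k = top_k

    def forward(self, x):  # x: (n_tokens, d_model)
        scores = self.router(x)                        # (n_tokens, n_experts)
        weights, chosen = scores.topk(self.top_k, -1)  # keep only top-k experts
        weights = F.softmax(weights, dim=-1)           # blend weights for them
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                hit = chosen[:, slot] == e             # tokens routed to expert e
                if hit.any():
                    out[hit] += weights[hit, slot, None] * expert(x[hit])
        return out
```

With 16 experts and top-2 routing, roughly an eighth of the expert weights run per token; the same principle is what lets the 26B Gemopus answer with only about 4 billion active parameters.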
The lighter Gemopus-4-E4B variant targets edge devices such as modern iPhones and lightweight MacBooks, and runs without requiring a GPU.
Gemma 4, released on April 2, shares research and technology with Gemini 3, so Gemopus inherits the DNA of Google's proprietary state-of-the-art model alongside Anthropic's reasoning style, a pedigree no Qwen-based model can claim.
What distinguishes Gemopus from other Gemma fine-tunes is Jackrong's philosophy. Rather than embedding Claude's chain-of-thought reasoning directly into Gemma's weights, as others have done, he focused on improving answer quality, structure, and conversational clarity to soften Gemma's rigid tone.
Kyle Hessling, an AI infrastructure engineer, benchmarked the 26B variant favorably. On X, he praised it for handling one-shot requests over long contexts quickly, crediting the speed to its MoE architecture.
The larger model performs well across the board, passing all 14 core-competency and all 12 long-context evaluations, and it handles extended retrieval tasks at context lengths up to a million tokens. On edge devices, the E4B variant is quick: 45-60 tokens per second on an iPhone 17 Pro Max and 90-120 tokens per second on a MacBook Air M3/M4 via MLX.
Both models ship in GGUF format, so they load directly into LM Studio or llama.cpp with no additional setup. Jackrong's GitHub hosts the full training code and a detailed fine-tuning guide that mirrors his Qwopus process, built on Unsloth and LoRA configurations.
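As a sketch of what "no additional setup" looks like in practice, here is how a GGUF build could be loaded through llama-cpp-python, the Python bindings for llama.cpp. The model filename below is a placeholder, not the name of the actual published file.

```python
# pip install llama-cpp-python
from llama_cpp import Llama

# Placeholder filename -- substitute the actual GGUF from the release page.
llm = Llama(
    model_path="Gemopus-4-26B-A4B.Q4_K_M.gguf",
    n_ctx=8192,        # context window to allocate
    n_gpu_layers=-1,   # offload every layer to GPU/Metal when available
)

result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain LoRA in two sentences."}],
    max_tokens=256,
)
print(result["choices"][0]["message"]["content"])
```

LM Studio wraps the same llama.cpp runtime in a GUI, so pointing it at the identical file gets the same result.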
Despite its strengths, Gemopus has limitations: tool-calling issues persist across the llama.cpp and LM Studio versions of Gemma 4 models. Jackrong frames the project as an engineering exploration rather than a production-ready solution and points users with stricter stability needs to his Qwopus 3.5 series.
Jackrong deliberately avoided training deep Claude-style reasoning traces into the model, since forcing them in can destabilize Gemma, a concern Hessling echoed. Meanwhile, another community project, Ornstein, aims to improve Gemma's reasoning without borrowing third-party logic or style.
Fine-tuning Gemma is harder than fine-tuning Qwen, with larger loss fluctuations and greater sensitivity during training. Jackrong's advice: for a robust local model, Qwopus 3.5 remains the go-to option; for an American model with Opus-like refinement, Gemopus stands out as the top choice. A denser 31B version is also anticipated.
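Jackrong's exact recipe lives in his guide, but a hedged sketch shows the kind of conservative settings that help with a sensitive base model: a low learning rate, gradient clipping, and a large effective batch. The model name, dataset path, rank, and hyperparameters below are illustrative assumptions following the common Unsloth LoRA pattern, not his published configuration, and exact SFTTrainer arguments vary across trl versions.

```python
# pip install unsloth trl datasets
from unsloth import FastLanguageModel
from transformers import TrainingArguments
from trl import SFTTrainer
from datasets import load_dataset

# Placeholder base model -- see the repo for the real one.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="google/gemma-base",   # hypothetical name
    max_seq_length=4096,
    load_in_4bit=True,                # fit the base model in modest VRAM
)

# LoRA adapters: train a small set of low-rank weights, not the full model.
model = FastLanguageModel.get_peft_model(
    model,
    r=16, lora_alpha=16, lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Placeholder dataset file with one "text" field per example.
dataset = load_dataset("json", data_files="sft_data.jsonl", split="train")

# Conservative schedule aimed at damping the loss swings described above.
args = TrainingArguments(
    output_dir="gemopus-style-lora",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,    # large effective batch smooths updates
    learning_rate=1e-5,               # lower than a typical Qwen fine-tune
    warmup_ratio=0.05,                # ease into training
    max_grad_norm=0.5,                # clip spiky gradients
    num_train_epochs=1,
    logging_steps=10,
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=4096,
    args=args,
)
trainer.train()
```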
For enthusiasts who want to run AI models on their own hardware, a guide to getting started with local AI is available.