AI Engineer Crafts Superior Model by Merging Claude Opus, GLM, and Qwen

AI engineer Kyle Hessling has taken the concept of Qwopus (a fusion of Qwen and Opus) to a new level. By adding GLM, known for its strong reasoning, he built an 18-billion-parameter model that beats Alibaba's latest 35B release while fitting comfortably on a budget GPU.

Parameters are a neural network's adjustable settings; they determine how much knowledge and complexity a model can hold, and more parameters mean more memory. Hessling, who specializes in AI infrastructure, merged two finetunes of Jackrong's Qwen 3.5: layers 0-31 from Qwopus 3.5-9B-v3.5, which infuses Claude 4.6 Opus's reasoning into a Qwen base model, and layers 32-63 from Qwen 3.5-9B-GLM5.1-Distill-v1, trained on GLM-5.1 data on top of the same Qwen base.

The idea was to use Opus-style structured planning in the first half of the reasoning process and GLM's problem-decomposition approach in the second, for 64 layers total in a single model. The technique is known as a passthrough frankenmerge: layers are simply stacked, with no blending or averaging of weights. Hessling wrote a custom merge script because existing tools don't support Qwen 3.5's hybrid linear/full attention architecture.
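Hessling's script isn't reproduced in the article, but the core of a passthrough merge is simple to sketch: take layers below a split index from one model's state dict and layers at or above it from the other, with shared weights (embeddings, norms, output head) drawn from the first. The helper below is a hypothetical illustration operating on Hugging Face-style key names; a real Qwen 3.5 merge would also need to handle the hybrid-attention layer metadata the article mentions.

```python
import re

def passthrough_merge(state_a, state_b, split=32):
    """Stack layers [0, split) from model A and [split, n) from model B.

    Assumes the common 'model.layers.<idx>.<rest>' key layout; keys that
    are not per-layer (embeddings, final norm, lm_head) come from model A.
    Purely illustrative, not Hessling's actual script.
    """
    layer_re = re.compile(r"model\.layers\.(\d+)\.")
    merged = {}
    for key, tensor in state_a.items():
        m = layer_re.match(key)
        if m is None or int(m.group(1)) < split:
            merged[key] = tensor  # first half and shared weights from A
    for key, tensor in state_b.items():
        m = layer_re.match(key)
        if m is not None and int(m.group(1)) >= split:
            merged[key] = tensor  # second half from B
    return merged
```

Because nothing is averaged, the result keeps each donor's weights verbatim, which is why the seam between the halves typically needs a heal fine-tune afterwards.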

The new model passed 40 of 44 capability tests, outperforming Alibaba's Qwen 3.6-35B-A3B MoE, which demands 22 GB of VRAM, while running in just 9.2 GB with Q4_K_M quantization; an NVIDIA RTX 3060 could theoretically handle it.
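The 9.2 GB figure is consistent with back-of-envelope quantization math: it implies an effective rate of roughly 4.1 bits per weight for an 18B-parameter model, which is plausible for a 4-bit k-quant format. The numbers below are a hedged estimate worked backward from the article's figures, not an official specification.

```python
params = 18e9            # parameter count of the merged model (per the article)
bits_per_weight = 4.1    # assumed effective bits/weight implied by the 9.2 GB figure

# bits -> bytes -> gigabytes; ignores KV cache and runtime overhead
vram_gb = params * bits_per_weight / 8 / 1e9
print(f"{vram_gb:.1f} GB")  # prints 9.2 GB
```

Actual memory use at inference time would be somewhat higher once the KV cache and runtime buffers are included, which is why a 12 GB card like the RTX 3060 is cited as the practical floor rather than an 10 GB one.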

Crafting the model was not without setbacks; initial attempts produced garbled code. His fix was a "heal" fine-tune, a lightweight QLoRA pass targeting all attention and projection layers. Despite the rough start, Hessling's models have gained popularity among AI enthusiasts.
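The article gives no hyperparameters for the heal pass, but a QLoRA fine-tune over "all attention and projections" is conventionally expressed as a PEFT LoRA config like the one below. The rank, alpha, and dropout values are illustrative assumptions, not Hessling's settings; the module names follow the usual Qwen/Llama-style projection naming.

```python
from peft import LoraConfig

# Hypothetical heal-finetune adapter config (illustrative values only)
heal_config = LoraConfig(
    r=16,                      # assumed adapter rank
    lora_alpha=32,             # assumed scaling factor
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "gate_proj", "up_proj", "down_proj",      # MLP projections
    ],
    task_type="CAUSAL_LM",
)
```

The point of touching every projection is that a frankenmerge's damage sits at the seam between donor halves, and a low-rank adapter over all layers lets a short fine-tune smooth that boundary without retraining the full 18B weights.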

Testing revealed that while hosting Qwen, Claude Opus, and GLM 5.1 flavors locally on consumer hardware is appealing, the model's heavy reasoning can tip into overthinking. On an M1 MacBook running an MLX-quantized build (optimized for Macs), a standard test prompt produced extensive reasoning without ever reaching an answer before hitting the token limit, a significant issue for practical use.

Even with simplified prompts, such as "write a Snake game," the model spent over 40 minutes reasoning. The results are documented on our GitHub repository.

This behavior matches known issues with Qwopus: Jackrong's v2 finetunes were specifically designed to curb Qwen 3.5's repetitive loops and encourage efficiency. But stacking 64 layers of reasoning distillations seems to amplify that tendency on certain prompts, a problem the open-source community is likely to keep iterating on.

The broader significance lies in the pattern: one developer publishes specialized finetunes with detailed guides, another combines them with custom scripts and a heal fine-tune, and the result outperforms a 35-billion-parameter release from a major AI lab, all in a file small enough for a mid-range GPU. It underscores the value of open-source development, not just for large labs but for community-driven innovation.

Jackrong has mirrored Hessling’s repository, which amassed over three thousand downloads in its first two weeks. The shrinking gap between weekend projects and frontier deployments highlights the growing impact of collaborative development.
