Tag
mixture-of-experts
3 articles

Research/May 8
UniPool shares MoE experts across layers
UniPool replaces per-layer MoE experts with a single shared pool, cutting cross-layer redundancy and lowering validation loss across five LLaMA-scale models.
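The shared-pool idea is easy to sketch. Below is a minimal illustration, not UniPool's actual code: each layer keeps its own router but draws experts from one global pool, so only routers are layer-specific. All class names, shapes, and the top-2 routing choice are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedExpertPool(nn.Module):
    """One global pool of FFN experts, reused by every transformer layer."""
    def __init__(self, num_experts: int, d_model: int, d_ff: int):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

class PooledMoELayer(nn.Module):
    """Only the router is layer-specific; experts come from the shared pool."""
    def __init__(self, pool: SharedExpertPool, d_model: int, top_k: int = 2):
        super().__init__()
        self.pool = pool  # shared reference, not a copy
        self.router = nn.Linear(d_model, len(pool.experts))
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model); route each token to its top-k pool experts
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):  # plain loops for clarity, not efficiency
            for e, expert in enumerate(self.pool.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

# One pool serves all twelve layers; per-layer experts would multiply
# the expert parameter count by the number of layers.
pool = SharedExpertPool(num_experts=8, d_model=512, d_ff=2048)
layers = [PooledMoELayer(pool, d_model=512) for _ in range(12)]
```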

Research/Apr 2
Sebastian Raschka’s LLM Architecture Gallery
Raschka’s gallery lays out the GPT-2, Llama 3, OLMo 2, DeepSeek, and Qwen stacks side by side, with exact layer counts, KV-cache details, and attention variants for each.

Model Releases/Mar 28
Cursor Composer 2 Bets on Agentic Coding
Cursor’s Composer 2 posts 61.3 on CursorBench and 61.7 on Terminal-Bench 2.0, with pricing aimed at high-volume coding teams.