Tag
mixture-of-experts
3 articles

Research/May 8
UniPool shares MoE experts across layers
UniPool replaces per-layer MoE experts with a single shared pool, cutting cross-layer redundancy and lowering validation loss across five LLaMA-scale models.
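The shared-pool idea is easy to sketch. Below is a minimal illustration, not UniPool's actual code: each layer keeps its own router but draws experts from one global pool, so only routers are layer-specific. All class names, shapes, and the top-2 routing choice are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedExpertPool(nn.Module):
    """One global pool of FFN experts, reused by every transformer layer."""
    def __init__(self, num_experts: int, d_model: int, d_ff: int):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

class PooledMoELayer(nn.Module):
    """Only the router is layer-specific; experts come from the shared pool."""
    def __init__(self, pool: SharedExpertPool, d_model: int, top_k: int = 2):
        super().__init__()
        self.pool = pool  # shared reference, not a copy
        self.router = nn.Linear(d_model, len(pool.experts))
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model); route each token to its top-k pool experts
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):  # plain loops for clarity, not efficiency
            for e, expert in enumerate(self.pool.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

# One pool serves all twelve layers; per-layer experts would multiply
# the expert parameter count by the number of layers.
pool = SharedExpertPool(num_experts=8, d_model=512, d_ff=2048)
layers = [PooledMoELayer(pool, d_model=512) for _ in range(12)]
```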

Research/Apr 2
Sebastian Raschka’s LLM Architecture Gallery
Raschka’s gallery lays out the GPT-2, Llama 3, OLMo 2, DeepSeek, and Qwen stacks side by side, with exact layer counts, KV-cache details, and attention variants for each.

Model Releases/Mar 28
Cursor Composer 2 Bets on Agentic Coding
Cursor’s Composer 2 posts 61.3 on CursorBench and 61.7 on Terminal-Bench 2.0, with pricing aimed at high-volume coding teams.