Tag
chain-of-thought
Chain-of-thought covers how models connect intermediate reasoning steps, not just how they reach final answers. Topics include long-horizon benchmarks, agent loops, structured outputs, and stability under long context, all of which matter when evaluating and deploying LLMs.
2 articles

Research/Apr 16
LongCoT Benchmark: 2,500-Problem Long-Horizon Reasoning
LongCoT is a 2,500-problem benchmark for measuring whether frontier models can sustain long, interdependent reasoning chains.

AI Agent/Apr 3
Prompt Engineering for Agents and Structured Outputs
Prompt engineering gets harder in production: reasoning, long contexts, JSON contracts, and agent loops all need different prompt tactics.