AI agent papers worth tracking in one repo
A curated repo groups agent papers by theme, so you can find work on planning, skills, harnesses, and surveys fast.

This GitHub collection sorts AI agent research into themed buckets and is updated biweekly; its 1,494 stars suggest steady community use. If you want to follow planning, skills, harnesses, and surveys without reading every arXiv feed, this list shows where to start.

| Theme | What it covers | Example signals |
|---|---|---|
| Harness | Runtime structure for agent execution | Safety, search, production workflows |
| Skills | Reusable agent abilities | Skill creation, governance, evaluation |
| Survey | Field overviews | Taxonomy, trends, benchmarks |
| Architecture | How agents are organized | Single-agent, multi-agent, ops |
| Applications | Where agents are used | Web, software, data, research |
1. Harness papers for runtime design
The harness section is the best entry point if you care about how agents actually run in production. It gathers papers on execution substrates, safety checks, search behavior, and architecture patterns, which makes it useful for builders who need more than model prompts.

Representative papers in this bucket include AI Harness Engineering: A Runtime Substrate for Foundation-Model Software Agents, Is Grep All You Need? How Agent Harnesses Reshape Agentic Search, and Harness-Bench: Measuring Harness Effects across Models in Realistic Agent Workflows.
- Focuses on execution, not just prompting
- Useful for agent ops, evaluation, and safety work
- Includes survey and benchmark entries
2. Skills papers for reusable agent abilities
If your interest is what agents can learn to do repeatedly, the skills section is the most practical cluster. It covers skill creation, selection, governance, and self-evolution, so you can compare papers that treat skills as modular parts of an agent system.
That makes it a strong fit for teams building long-lived agents. Papers such as SkillOS: Learning Skill Curation for Self-Evolving Agents, SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution, and SkillGrad: Optimizing Agent Skills Like Gradient Descent show how broad the topic has become.
Skill themes you will see here:
- skill generation
- skill memory and management
- least-privilege enforcement
- skill evaluation
- self-evolving skill systems
3. Survey papers for fast field orientation
The survey bucket is the quickest way to understand where the research is going. Instead of one method, these papers map taxonomies, techniques, and open questions, which is helpful when you need a clean overview before choosing a subtopic.

For a broad starting point, A Comprehensive Survey on Agent Skills: Taxonomy, Techniques, and Applications and Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning show the repository’s survey style. The collection also points to related work on collaboration, failure attribution, and self-evaluation.
- Good for literature reviews and slide prep
- Helps identify subfields worth deeper reading
- Pairs well with benchmark papers
4. Architecture papers for agent system design
The architecture section organizes papers around single-agent, multi-agent, and agent-ops patterns. That is useful if you are deciding how to structure a product, because the papers here are about system shape as much as model behavior.
Use this section when you need to compare coordination styles or operational patterns. The repo’s links make it easy to jump from broad architecture choices to more specific application areas like digital agents or enterprise agents.
- Single-agent setups for focused tasks
- Multi-agent setups for coordination and division of labor
- Agent-ops and UX for production deployment
5. Application papers for domain-specific use cases
The application sections are where the repository becomes especially useful for practitioners. Instead of staying abstract, it sorts papers into embodied, web, mobile, software, data, research, API, deep research, enterprise, and finance agents.
That lets you jump straight to the environment you care about. If you are building a browser worker, a coding assistant, or a research copilot, the application pages narrow the reading list quickly and reduce time spent on irrelevant papers.
Examples of application clusters:
- Web agents
- GUI agents
- Software agents
- Research agents
- Enterprise agents
How to decide
Pick harness papers if you care about execution and safety, skills papers if you want reusable capabilities, surveys if you need orientation, and architecture or application papers if you are building a system for a specific setting. For most readers, the fastest path is survey first, then harness or skills, then the application area that matches the product.
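The reading order above can be written down as a small lookup, which may help if you are scripting a reading list for a team. This is only a sketch: the bucket labels, the `GOAL_TO_BUCKET` table, and the `reading_path` function are hypothetical names made up for illustration, not part of the repo.

```python
from typing import Dict, List

# Hypothetical mapping from a reader's goal to the bucket this article suggests.
# The labels are illustrative; they are not identifiers taken from the repo.
GOAL_TO_BUCKET: Dict[str, str] = {
    "execution and safety": "harness",
    "reusable capabilities": "skills",
    "orientation": "survey",
    "system structure": "architecture",
    "specific setting": "applications",
}


def reading_path(focus: str, application: str) -> List[str]:
    """Return the suggested order: survey, then harness or skills, then an application area."""
    if focus not in ("harness", "skills"):
        raise ValueError("focus must be 'harness' or 'skills'")
    return ["survey", focus, f"applications/{application}"]


print(reading_path("harness", "web"))
# prints ['survey', 'harness', 'applications/web']
```

A reader who cares about reusable capabilities for a coding assistant would call `reading_path("skills", "software")` instead.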
Because the repo is updated biweekly, it works well as a living reading list rather than a one-time roundup.