[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-mempalace-100-percent-claim-scrutiny-en":3,"tags-mempalace-100-percent-claim-scrutiny-en":30,"related-lang-mempalace-100-percent-claim-scrutiny-en":41,"related-posts-mempalace-100-percent-claim-scrutiny-en":45,"series-ai-agent-406aeb11-f9c0-428a-9160-e983f7977d2e":82},{"id":4,"title":5,"content":6,"summary":7,"source":8,"source_url":9,"author":10,"image_url":11,"keywords":12,"language":18,"translated_content":10,"views":19,"is_premium":20,"created_at":21,"updated_at":21,"cover_image":11,"published_at":22,"rewrite_status":23,"rewrite_error":10,"rewritten_from_id":24,"slug":25,"category":26,"related_article_id":27,"status":28,"google_indexed_at":29,"x_posted_at":10,"tweet_text":10,"title_rewritten_at":10,"title_original":10,"key_takeaways":10,"topic_cluster_id":10,"embedding":10,"is_canonical_seed":20},"406aeb11-f9c0-428a-9160-e983f7977d2e","MemPalace’s 100% memory claim gets checked","\u003Cp>\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fmilla-jovovich\u002Fmempalace\" target=\"_blank\" rel=\"noopener\">MemPalace\u003C\u002Fa> pulled in more than 11,000 GitHub stars in 48 hours, which is a strong signal that people want better AI memory tools. The headline claim was even louder: a perfect 100% score on \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fucsd-ml\u002FLongMemEval\" target=\"_blank\" rel=\"noopener\">LongMemEval\u003C\u002Fa>, a benchmark built to test long-term memory in AI systems. Independent checks later cut that number down to 84.2% once compression was actually enabled.\u003C\u002Fp>\u003Cp>That gap matters because MemPalace is still interesting even after the hype gets trimmed. It is a local-first memory system with an MCP server, 19 tools, and an offline design built around a spatial “memory palace” model. The project looks useful. 
The scorecard around it needs a lot more honesty.\u003C\u002Fp>\u003Ch2>What MemPalace actually is\u003C\u002Fh2>\u003Cp>MemPalace is built around a simple idea: instead of storing chat history as one long flat log, it organizes memory into wings, halls, and rooms. That structure borrows from the ancient “method of loci,” where people remember facts by placing them in a mental building they can walk through.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775579093830-39tn.png\" alt=\"MemPalace’s 100% memory claim gets checked\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>The project uses \u003Ca href=\"https:\u002F\u002Fwww.trychroma.com\u002F\" target=\"_blank\" rel=\"noopener\">ChromaDB\u003C\u002Fa> for retrieval and \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fyaml\u002Fpyyaml\" target=\"_blank\" rel=\"noopener\">PyYAML\u003C\u002Fa> for configuration and metadata handling. It also ships with an MCP server, which means it can plug into tools that speak the \u003Ca href=\"https:\u002F\u002Fmodelcontextprotocol.io\u002F\" target=\"_blank\" rel=\"noopener\">Model Context Protocol\u003C\u002Fa>, including assistants like Claude and editors like Cursor.\u003C\u002Fp>\u003Cp>That offline-first angle is a big part of the appeal. A lot of AI memory products send data to a cloud service by default, which is fine for convenience and bad for anyone who wants local control. 
MemPalace keeps the memory store on the machine.\u003C\u002Fp>\u003Cul>\u003Cli>GitHub stars in 48 hours: 11,000+\u003C\u002Fli>\u003Cli>Claimed LongMemEval score: 100% (500\u002F500)\u003C\u002Fli>\u003Cli>Verified compressed-mode score: 84.2%\u003C\u002Fli>\u003Cli>Raw retrieval score cited by reviewers: 96.6% R@5\u003C\u002Fli>\u003Cli>MCP tools included: 19\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>Why the benchmark claim broke down\u003C\u002Fh2>\u003Cp>LongMemEval is a real benchmark from UC San Diego that tests five long-term memory abilities across 500 questions. That gives the MemPalace story some real weight, because the benchmark is not made up and the task is hard enough to matter.\u003C\u002Fp>\u003Cp>The problem is how the perfect score was presented. Independent reviewers found that the 100% result came after hand-patching the last three wrong answers and then rerunning the same dataset. That is classic overfitting to the test set. It may improve the demo, but it does not prove the system generalizes.\u003C\u002Fp>\u003Cblockquote>“The first principle is that you must not fool yourself and you are the easiest person to fool.” — Richard Feynman\u003C\u002Fblockquote>\u003Cp>That quote fits this story well. A benchmark can be real, a repo can be real, and the result can still be misleading if the evaluation is massaged after the fact. In AI tooling, the difference between “works in a demo” and “works in the wild” is often the entire story.\u003C\u002Fp>\u003Cp>There is also a technical mismatch in the numbers. The 96.6% R@5 figure appears to come from ChromaDB’s default embedding retrieval, which means it measures whether the correct memory shows up in the top five nearest-neighbor results, not the palace structure or the custom AAAK compression system. 
Once AAAK compression is actually used, the score drops to 84.2%, which directly contradicts the “lossless” framing.\u003C\u002Fp>\u003Ch2>How MemPalace compares with other systems\u003C\u002Fh2>\u003Cp>MemPalace is not alone in chasing long-term memory benchmarks. The interesting part is that once you line up the numbers, the project is competitive without needing a perfect score.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775579100406-k36z.png\" alt=\"MemPalace’s 100% memory claim gets checked\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>Here is the cleaner comparison based on published figures and reviewer notes:\u003C\u002Fp>\u003Cul>\u003Cli>\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fmastra-ai\u002Fmastra\" target=\"_blank\" rel=\"noopener\">Mastra\u003C\u002Fa>: 94.87% on LongMemEval\u003C\u002Fli>\u003Cli>\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fomega-memory\u002Fomega\" target=\"_blank\" rel=\"noopener\">OMEGA\u003C\u002Fa>: 95.4%\u003C\u002Fli>\u003Cli>\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fagentmemory\u002Fagentmemory\" target=\"_blank\" rel=\"noopener\">agentmemory\u003C\u002Fa>: 96.2%\u003C\u002Fli>\u003Cli>MemPalace raw retrieval: 96.6% R@5\u003C\u002Fli>\u003Cli>MemPalace with AAAK compression: 84.2%\u003C\u002Fli>\u003C\u002Ful>\u003Cp>Those numbers tell a useful story. MemPalace is in the conversation on retrieval quality, but the headline claim was doing more work than the actual system. If you strip away the perfect-score language, you are left with a solid local memory prototype that may be useful for people building agents, not a miracle benchmark winner.\u003C\u002Fp>\u003Cp>The other comparison that matters is architectural. Most memory systems are just databases with some heuristics layered on top. 
MemPalace tries to mirror human recall with spatial organization, which is a more interesting design choice than a plain vector store. That does not make it better by default, but it does make it worth testing beyond one benchmark.\u003C\u002Fp>\u003Ch2>Milla Jovovich’s role and what the project proves\u003C\u002Fh2>\u003Cp>Milla Jovovich’s involvement is real. Her verified Instagram account and the GitHub history point to genuine participation, and her GitHub bio describes her as the “architect of the MemPalace.” That wording matters. It suggests direction and product vision, not a claim that she wrote every line herself.\u003C\u002Fp>\u003Cp>Ben Sigman’s posts also hint at how the project came together. He said they created it with Claude, and his joke about “Multipass” makes it pretty clear that \u003Ca href=\"https:\u002F\u002Fwww.anthropic.com\u002Fclaude-code\" target=\"_blank\" rel=\"noopener\">Claude Code\u003C\u002Fa> did a lot of the implementation work. That is not a knock on the project. It is a sign of where AI-assisted development is now.\u003C\u002Fp>\u003Cp>What MemPalace proves is narrower than the headlines suggest. A public figure with no known programming background can still help produce a functional AI tool in a few months if the workflow is good and the model does much of the coding. That is a real shift in who can ship software, and it is more interesting than the benchmark drama.\u003C\u002Fp>\u003Cp>It also exposes a second lesson: if a project is genuinely useful, it does not need inflated numbers. The local-first setup, MCP support, and memory-palace interface are all strong ideas on their own. The 100% claim only made the story louder, and then it made the scrutiny harsher.\u003C\u002Fp>\u003Ch2>What developers should take from this\u003C\u002Fh2>\u003Cp>If you build AI agents, MemPalace is worth studying for the design, not the headline. 
The spatial memory model could be a better mental fit for some workflows than a flat recall stack, especially when users need to inspect, edit, or prune memories by topic or time.\u003C\u002Fp>\u003Cp>The benchmark lesson is even more practical. If your evaluation can be nudged upward by patching a few answers and rerunning the same set, then the score is marketing, not evidence. That rule applies whether you are shipping an open-source repo, a startup demo, or an internal prototype.\u003C\u002Fp>\u003Cp>My read is simple: MemPalace should be remembered as a strong prototype wrapped in weak claims. The project may keep climbing because the idea is useful and the celebrity angle gives it reach, but the real test now is whether other builders can reproduce the architecture, stress it with fresh data, and keep the score honest.\u003C\u002Fp>\u003Cp>That is the question worth watching next: can MemPalace hold up when people stop talking about 100% and start asking how it performs on new memory tasks, new models, and real user data?\u003C\u002Fp>","MemPalace hit 11K GitHub stars fast, but its 100% LongMemEval claim fell to 84.2% under compression. 
The project is real; the marketing isn’t.","oracore-original","https:\u002F\u002Fgithub.com\u002Fmilla-jovovich\u002Fmempalace",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775579093830-39tn.png",[13,14,15,16,17],"MemPalace","LongMemEval","AI memory","MCP","Claude Code","en",0,false,"2026-04-07T16:24:37.123273+00:00","2026-04-07T16:24:36.963+00:00","done","58d0006b-cff1-4fa5-86f0-47b1d08a7741","mempalace-100-percent-claim-scrutiny-en","ai-agent","4f005451-c02c-4f08-90c5-c1f94b5c374a","published","2026-04-08T09:00:47.978+00:00",[31,33,35,37,39],{"name":16,"slug":32},"mcp",{"name":14,"slug":34},"longmemeval",{"name":17,"slug":36},"claude-code",{"name":13,"slug":38},"mempalace",{"name":15,"slug":40},"ai-memory",{"id":27,"slug":42,"title":43,"language":44},"mempalace-100-percent-claim-scrutiny-zh","MemPalace 的 100% 記憶宣稱被拆穿","zh",[46,52,58,64,70,76],{"id":47,"slug":48,"title":49,"cover_image":50,"image_url":50,"created_at":51,"category":26},"c5d4bc11-1f4d-438c-b644-a8498826e1ab","claude-agent-dreaming-outcomes-multiagent-en","Claude给Agent加了“做梦”功能","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778868649463-f5qv.png","2026-05-15T18:10:25.29539+00:00",{"id":53,"slug":54,"title":55,"cover_image":56,"image_url":56,"created_at":57,"category":26},"fda44d24-7baf-4d91-a7f9-bbfecae20a27","switch-ai-outputs-markdown-to-html-en","How to Switch AI Outputs from Markdown to HTML","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778743249827-wmsr.png","2026-05-14T07:20:22.631724+00:00",{"id":59,"slug":60,"title":61,"cover_image":62,"image_url":62,"created_at":63,"category":26},"064275f5-4282-47c3-8e4a-60fe8ac99246","anthropic-cat-wu-proactive-ai-assistants-en","Anthropic’s Cat Wu on proactive AI 
assistants","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778735465548-a92i.png","2026-05-14T05:10:31.723441+00:00",{"id":65,"slug":66,"title":67,"cover_image":68,"image_url":68,"created_at":69,"category":26},"423ac8ad-2886-42a9-8dd8-78e5d43a1574","how-to-run-hermes-agent-on-discord-en","How to Run Hermes Agent on Discord","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778724656141-i30t.png","2026-05-14T02:10:35.727086+00:00",{"id":71,"slug":72,"title":73,"cover_image":74,"image_url":74,"created_at":75,"category":26},"776a562c-99a6-4a6b-93a0-9af40300f3f2","why-ragflow-is-the-right-open-source-rag-engine-to-self-host-en","Why RAGFlow is the right open-source RAG engine to self-host","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778674254587-0pxn.png","2026-05-13T12:10:25.721583+00:00",{"id":77,"slug":78,"title":79,"cover_image":80,"image_url":80,"created_at":81,"category":26},"322ec8bc-61d3-4c80-bb9e-a19941e137c6","how-to-add-temporal-rag-in-production-en","How to Add Temporal RAG in Production","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778667085221-0mox.png","2026-05-13T10:10:31.619892+00:00",[83,88,93,98,103,108,113,118,123,128],{"id":84,"slug":85,"title":86,"created_at":87},"03db8de8-8dc2-4ac1-9cf7-898782efbb1f","anthropic-claude-ai-agent-task-automation-en","Anthropic's Claude AI Agent: A New Era of Task Automation","2026-03-25T16:25:06.513026+00:00",{"id":89,"slug":90,"title":91,"created_at":92},"045d1abc-190d-4594-8c95-91e2a26f0c5a","googles-2026-ai-agent-report-decoded-en","Google’s 2026 AI Agent Report, 
Decoded","2026-03-26T11:15:23.046616+00:00",{"id":94,"slug":95,"title":96,"created_at":97},"e64aba21-254b-4f93-aa21-837484bb52ec","kimi-k25-review-stronger-still-not-legend-en","Kimi K2.5 review: stronger, still not a legend","2026-03-27T07:15:55.385951+00:00",{"id":99,"slug":100,"title":101,"created_at":102},"30dfb781-a1b2-4add-aebe-b3df40247c37","claude-code-controls-mac-desktop-en","Claude Code now controls your Mac desktop","2026-03-28T03:01:59.384091+00:00",{"id":104,"slug":105,"title":106,"created_at":107},"254405b6-7833-4800-8e13-f5196deefbe6","cloudflare-100x-faster-ai-agent-sandbox-en","Cloudflare’s 100x Faster AI Agent Sandbox","2026-03-28T03:09:44.356437+00:00",{"id":109,"slug":110,"title":111,"created_at":112},"04f29b7f-9b91-4306-89a7-97d725e6e1ba","openai-backs-isara-agent-swarm-bet-en","OpenAI backs Isara’s agent-swarm bet","2026-03-28T03:15:27.849766+00:00",{"id":114,"slug":115,"title":116,"created_at":117},"3b0bf479-e4ae-4703-9666-721a7e0cdb91","openai-plan-automated-ai-researcher-en","OpenAI’s plan for an automated AI researcher","2026-03-28T03:17:42.312819+00:00",{"id":119,"slug":120,"title":121,"created_at":122},"fe91bce0-b85d-4efa-a207-24ae9939c29f","harness-engineering-ai-agent-reliability-2026","Harness Engineering: From Bridle to Operating System, The Missing Link in AI Agent Reliability","2026-03-31T06:36:55.648751+00:00",{"id":124,"slug":125,"title":126,"created_at":127},"67dc66da-ca46-4aa5-970b-e997a39fe109","openai-codex-plugin-claude-code-en","OpenAI puts Codex inside Claude Code","2026-04-01T09:21:55.381386+00:00",{"id":129,"slug":130,"title":131,"created_at":132},"7a09007d-820f-43b3-8607-8ad1bfcb94c8","mcp-explained-from-prompts-to-production-en","MCP Explained: From Prompts to Production","2026-04-01T09:24:40.089177+00:00"]