[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"tag-llamacpp":3},{"tag":4,"articles":11},{"id":5,"name":6,"slug":7,"article_count":8,"description_zh":9,"description_en":10},"d7a2807c-2270-4884-8b44-f0ffccfd73a8","llama.cpp","llamacpp",3,"llama.cpp 是把大型語言模型帶到本機與邊緣裝置的推論框架，重點在低記憶體占用、量化、KV cache 管理與啟動速度。相關議題常延伸到 GPU\u002FCPU 混合推論、Rust\u002FCUDA 整合，以及多模態與微調工具鏈的相容性。","llama.cpp is a local inference stack for running LLMs on CPUs, GPUs, and edge devices with tight memory budgets. The topic often covers quantization, KV cache optimization, cold-start latency, and how it fits into fine-tuning and multimodal workflows.",[12,21,29,36,43],{"id":13,"slug":14,"title":15,"summary":16,"category":17,"image_url":18,"cover_image":18,"language":19,"created_at":20},"0e767e9d-5d17-4cd0-b6ee-0328f89eb49b","gemma-4-12b-specs-benchmarks-run-locally-en","Gemma 4 12B: Specs, Benchmarks & How to Run It Locally","Gemma 4 12B is a local-first multimodal model you can run on a 16 GB machine.","model-release","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1780777984661-5ymr.png","en","2026-06-06T20:32:25.294996+00:00",{"id":22,"slug":23,"title":24,"summary":25,"category":26,"image_url":27,"cover_image":27,"language":19,"created_at":28},"a7daef63-2e7d-4942-8bc1-7ebbe31ebb52","why-llama-cpp-release-notes-matter-more-than-bragging-en","Why llama.cpp’s release notes matter more than its model bragging","llama.cpp’s latest releases show that backend correctness drives real speed gains.","tools","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779769553066-1mx4.png","2026-05-26T04:25:24.65574+00:00",{"id":30,"slug":31,"title":32,"summary":33,"category":26,"image_url":34,"cover_image":34,"language":19,"created_at":35},"8a164bd6-6f92-47a6-87fb-72a6371aae17","why-llama-cpp-should-treat-turboquant-as-default-en","Why llama.cpp should treat TurboQuant as the new default path","TurboQuant is the right direction for llama.cpp because asymmetric KV compression cuts memory without breaking compatibility.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779481556833-a9v3.png","2026-05-22T20:25:23.12744+00:00",{"id":37,"slug":38,"title":39,"summary":40,"category":26,"image_url":41,"cover_image":41,"language":19,"created_at":42},"5ed4267c-b54b-4c73-8192-79bfacaf438d","llama-cpp-local-llm-inference-cpp-en","llama.cpp adds local LLM inference in C\u002FC++","ggml-org’s llama.cpp keeps expanding local LLM support with OpenAI-compatible serving, browser WebGPU, and broad hardware backends.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779480952129-tpau.png","2026-05-22T20:15:28.848286+00:00",{"id":44,"slug":45,"title":46,"summary":47,"category":48,"image_url":49,"cover_image":49,"language":19,"created_at":50},"bfbd028b-4704-4de5-8f54-55625836952f","5-kv-cache-takeaways-for-llamacpp-users-en","5 KV cache takeaways for llama.cpp users","5 takeaways from TurboQuant: under-3-bit KV cache compression, memory savings, and the tradeoffs llama.cpp users should watch.","industry","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779285258553-domr.png","2026-05-20T13:53:43.522918+00:00"]