[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"tag-llm-inference":3},{"tag":4,"articles":11},{"id":5,"name":6,"slug":7,"article_count":8,"description_zh":9,"description_en":10},"a487ff8b-bc7c-473d-b9f2-867dd22c9327","LLM inference","llm-inference",4,"LLM 推論聚焦模型在部署時的延遲、吞吐量與記憶體成本，尤其是 KV cache、量化與加速器友善的實作。這類技術直接影響大模型能否在雲端與邊緣裝置上穩定運行。","LLM inference covers the runtime side of large models: latency, throughput, memory footprint, and how KV cache, quantization, and accelerator-friendly kernels shape deployment. It matters because these choices determine whether a model is practical on GPUs, servers, or edge devices.",[12,21,28],{"id":13,"slug":14,"title":15,"summary":16,"category":17,"image_url":18,"cover_image":18,"language":19,"created_at":20},"941f698a-1dcf-4807-bd56-5295c07d2dee","taming-black-box-llm-inference-scheduling-zh","黑箱 LLM 排程更聰明了","這篇論文用「預測輸出長度」來改善黑箱 LLM 推論排程，想在看不到模型內部的情況下，減少排隊摩擦、提升大規模服務效率。","research","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778740253221-wgy6.png","zh","2026-05-14T06:30:31.546746+00:00",{"id":22,"slug":23,"title":24,"summary":25,"category":17,"image_url":26,"cover_image":26,"language":19,"created_at":27},"db0d0cbe-b1ba-4f1e-9569-f902e41bb3b0","saga-workflow-atomic-scheduling-gpu-clusters-zh","SAGA 讓 AI Agent 排程看懂工作流","SAGA 主張 GPU 排程不該把 AI agent 的每次 LLM 呼叫拆開看，而是要把一連串請求當成同一個工作流來排。","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778567467043-imbu.png","2026-05-12T06:30:31.788116+00:00",{"id":29,"slug":30,"title":31,"summary":32,"category":17,"image_url":33,"cover_image":33,"language":19,"created_at":34},"13197f11-d68b-468c-aa9f-9e84b85673d2","speckv-adaptive-speculative-decoding-gamma-zh","SpecKV 讓推測解碼自動調 gamma","SpecKV 把推測解碼的 token 預算改成逐步自動調整，利用 draft 模型訊號在不同壓縮設定下挑出更合適的 gamma。","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1777961462925-xmg2.png","2026-05-05T06:10:32.259958+00:00"]