[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"tag-inference":3},{"tag":4,"articles":10},{"id":5,"name":6,"slug":6,"article_count":7,"description_zh":8,"description_en":9},"0750e826-30ea-499e-858d-2c46a7bfe1fb","inference",6,"Inference 指的是模型在部署後進行推理與生成的階段，牽涉延遲、吞吐量、GPU 排程、記憶體壓縮與成本控制。從 Kubernetes AI 控制平面到量化與 TensorRT-LLM，這是 AI 走向生產環境的核心層。","Inference is the production stage where models serve predictions or generate outputs, so latency, throughput, GPU scheduling, memory footprint, and cost all matter. Recent work spans Kubernetes as an AI control plane, quantization, and TensorRT-LLM optimizations.",[11,20,27,35],{"id":12,"slug":13,"title":14,"summary":15,"category":16,"image_url":17,"cover_image":17,"language":18,"created_at":19},"551703cb-117b-45e6-98d0-3f0dfe16e086","ae-llm-adaptive-efficiency-optimization-en","AE-LLM aims to make LLMs more efficient","AE-LLM proposes adaptive efficiency optimization for large language models, but the provided source does not include benchmark details.","research","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778051450895-f6re.png","en","2026-05-06T07:10:33.795652+00:00",{"id":21,"slug":22,"title":23,"summary":24,"category":16,"image_url":25,"cover_image":25,"language":18,"created_at":26},"a15782d7-4678-4415-9a0b-4c642e46b022","nvidia-mlperf-software-inference-benchmarks-en","Nvidia’s MLPerf Gains Show Software Still Matters","Nvidia posted up to 2.77x MLPerf gains on GB300 NVL72, with software tricks like Dynamo and TensorRT-LLM doing heavy lifting.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775185791842-obyu.png","2026-04-03T03:09:35.154603+00:00",{"id":28,"slug":29,"title":30,"summary":31,"category":32,"image_url":33,"cover_image":33,"language":18,"created_at":34},"ebda74d3-8122-455a-addd-1ade341b2542","kubernetes-becoming-ais-control-plane-en","Kubernetes Is Becoming AI’s Control Plane","KubeCon Europe 2026 showed Kubernetes moving from app orchestration to AI ops, with inference, GPUs, and open standards leading the shift.","industry","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775178585987-q0fi.png","2026-04-03T01:09:30.778998+00:00",{"id":36,"slug":37,"title":38,"summary":39,"category":32,"image_url":40,"cover_image":40,"language":18,"created_at":41},"15c2f00f-4c48-4580-a13e-74626eb520f7","five-ai-infra-frontiers-bessemer-2026-en","Five AI Infra Frontiers Bessemer Expects for 2026","Bessemer’s 2026 AI infra roadmap points to memory, continual learning, RL, inference, and world models as the next big build areas.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775164380914-xfye.png","2026-04-02T21:12:40.223864+00:00"]