[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"tag-ai-inference":3},{"tag":4,"articles":11},{"id":5,"name":6,"slug":7,"article_count":8,"description_zh":9,"description_en":10},"fc38f553-fad1-4b1d-ae44-c11e20579e1d","AI inference","ai-inference",4,"AI inference 指模型在部署後進行即時推論的過程，重點在延遲、記憶體與算力成本。從住宅型節點、KV cache 壓縮到長上下文下的 DRAM 壓力，都直接影響雲端與邊緣部署的經濟性。","AI inference is the runtime phase where trained models generate outputs in production, so latency, memory footprint, and compute cost matter most. Topics here include home-based inference nodes, KV-cache compression, and how long contexts keep DRAM demand high.",[12,21,28,36],{"id":13,"slug":14,"title":15,"summary":16,"category":17,"image_url":18,"cover_image":18,"language":19,"created_at":20},"d86c3629-13a2-414b-8219-ec4f2d17e1c4","why-zyphra-cloud-on-amd-matters-en","Why Zyphra Cloud on AMD Matters More Than Another Model Launch","Zyphra Cloud matters because inference, not training, is now the real AI platform battle.","industry","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778692868497-kxiy.png","en","2026-05-13T17:20:30.992784+00:00",{"id":22,"slug":23,"title":24,"summary":25,"category":17,"image_url":26,"cover_image":26,"language":19,"created_at":27},"66640415-f9bb-4444-b39f-de18b15b0431","spans-mini-ai-data-centers-move-into-homes-en","Span, Nvidia, Pulte: Mini AI Data Centers in Homes","Span is testing home-based AI inference nodes with 1.25 MW across 100 homes, cutting build time from years to months.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1776341402261-q217.png","2026-04-16T12:09:39.105935+00:00",{"id":29,"slug":30,"title":31,"summary":32,"category":33,"image_url":34,"cover_image":34,"language":19,"created_at":35},"6c80feee-7f7d-4518-bd06-3c04b8c46054","turboquant-cuts-memory-use-without-accuracy-loss-en","TurboQuant cuts memory use 6x without accuracy loss","Google Research’s TurboQuant claims 6x less memory and 8x faster inference with no accuracy loss, jolting AI inference economics.","research","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775161136573-e0cb.png","2026-04-02T20:18:39.999171+00:00",{"id":37,"slug":38,"title":39,"summary":40,"category":33,"image_url":41,"cover_image":41,"language":19,"created_at":42},"d4867ede-353b-4812-aac7-aebe28ef3613","turboquant-wont-fix-memory-crunch-en","TurboQuant Won’t Fix the Memory Crunch","Google’s TurboQuant can cut KV-cache memory use 6x, but longer contexts may keep DRAM and NAND demand climbing.","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775132152400-1kew.png","2026-04-02T12:15:32.095995+00:00"]