[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-5-llm-benchmarks-for-business-buyers-2026-zh":3,"article-related-5-llm-benchmarks-for-business-buyers-2026-zh":38,"series-industry-a7bca854-a4d9-4616-b651-e5d732a63255":91},{"id":4,"title":5,"content":6,"summary":7,"source":8,"source_url":9,"author":10,"image_url":11,"keywords":12,"language":21,"translated_content":10,"views":22,"is_premium":23,"created_at":24,"updated_at":24,"cover_image":11,"published_at":25,"rewrite_status":26,"rewrite_error":10,"rewritten_from_id":27,"slug":28,"category":29,"related_article_id":30,"status":31,"google_indexed_at":10,"x_posted_at":10,"tweet_text":10,"title_rewritten_at":10,"title_original":10,"key_takeaways":32,"topic_cluster_id":36,"embedding":37,"is_canonical_seed":23},"a7bca854-a4d9-4616-b651-e5d732a63255","5 個 LLM 基準測試","\u003Cp data-speakable=\"summary\">這篇整理 5 個 LLM 基準測試，幫你判斷模型強弱、看懂分數失真，並選出最適合商務採購的測試。\u003C\u002Fp>\u003Cp>LLM 的分數看起來很明確，但到了 2026 年，只有部分測試還能反映真實表現。前沿模型在 GPQA Diamond 已到 94.3%，在 GSM8K 也逼近 99%，所以更重要的是：哪一個測試真的對應你的業務場景。\u003C\u002Fp>\u003Ctable>\u003Cthead>\u003Ctr>\u003Cth>項目\u003C\u002Fth>\u003Cth>測什麼\u003C\u002Fth>\u003Cth>目前訊號\u003C\u002Fth>\u003Cth>最適合\u003C\u002Fth>\u003C\u002Ftr>\u003C\u002Fthead>\u003Ctbody>\u003Ctr>\u003Ctd>MMLU\u003C\u002Ftd>\u003Ctd>57 個學科的廣泛知識\u003C\u002Ftd>\u003Ctd>頂尖分數 93%\u003C\u002Ftd>\u003Ctd>通用篩選、中階模型比較\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>GPQA Diamond\u003C\u002Ftd>\u003Ctd>博士級科學推理\u003C\u002Ftd>\u003Ctd>頂尖分數 94.3%\u003C\u002Ftd>\u003Ctd>高難推理、前沿模型比較\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>HumanEval\u003C\u002Ftd>\u003Ctd>Python 程式生成\u003C\u002Ftd>\u003Ctd>頂尖分數 93%\u003C\u002Ftd>\u003Ctd>快速 coding 檢查\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>SWE-bench Verified\u003C\u002Ftd>\u003Ctd>真實 GitHub 問題修復\u003C\u002Ftd>\u003Ctd>頂尖分數 80.8%\u003C\u002Ftd>\u003Ctd>軟體工程評估\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>LiveCodeBench\u003C\u002Ftd>\u003Ctd>抗污染 coding 測試\u003C\u002Ftd>\u003Ctd>頂尖分數 83.6%\u003C\u002Ftd>\u003Ctd>持續追蹤 coding 能力\u003C\u002Ftd>\u003C\u002Ftr>\u003C\u002Ftbody>\u003C\u002Ftable>\u003Ch2>1. MMLU\u003C\u002Fh2>\u003Cp>\u003Ca href=\"https:\u002F\u002Fwww.lxt.ai\u002Fblog\u002Fllm-benchmarks\u002F\">MMLU\u003C\u002Fa> 是這 5 個裡最廣的通用知識測試，涵蓋 57 個學科、超過 16,000 題\u003Ca href=\"\u002Fnews\u002Fwhy-halo-on-ps5-is-the-right-move-for-microsoft-zh\">選擇\u003C\u002Fa>題。當你想快速看一個模型能不能處理跨領域提示，這個分數仍然很有用。\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779161051251-hgbf.png\" alt=\"5 個 LLM 基準測試\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>它的問題是開始飽和。前沿模型已推到 93%，所以它更適合區分弱模型與中階模型，不太適合拿來分辨最頂尖的幾個系統。\u003C\u002Fp>\u003Cul>\u003Cli>測量面向：知識與推理\u003C\u002Fli>\u003Cli>題型：選擇題\u003C\u002Fli>\u003Cli>適合用途：初步篩選\u003C\u002Fli>\u003Cli>不適合：最後的前沿排名\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>2. GPQA Diamond\u003C\u002Fh2>\u003Cp>\u003Ca href=\"https:\u002F\u002Fwww.lxt.ai\u002Fblog\u002Fllm-benchmarks\u002F\">GPQA Diamond\u003C\u002Fa> 適合你想測更硬的推理能力。它用生物、化學、物理等專家級題目，仍保留足夠難度來區分頂尖模型。\u003C\u002Fp>\u003Cp>截至 2026 年 2 月，\u003Ca href=\"\u002Ftag\u002Fgemini\">Gemini\u003C\u002Fa> 3.1 Pro 以 94.3% 領先，\u003Ca href=\"\u002Ftag\u002Fclaude\">Claude\u003C\u002Fa> Opus 4.6 為 91.3%，GPT-5.3 \u003Ca href=\"\u002Ftag\u002Fcodex\">Codex\u003C\u002Fa> 為 81%，Qwen3.5-plus 也接近 88.4%。這種差距表示它在頂端仍有辨識力。\u003C\u002Fp>\u003Cul>\u003Cli>測量面向：高階科學推理\u003C\u002Fli>\u003Cli>題型：博士級選擇題\u003C\u002Fli>\u003Cli>適合用途：前沿模型比較\u003C\u002Fli>\u003Cli>要注意：頂端仍可能逐步飽和\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>3. HumanEval\u003C\u002Fh2>\u003Cp>\u003Ca href=\"https:\u002F\u002Fwww.lxt.ai\u002Fblog\u002Fllm-benchmarks\u002F\">HumanEval\u003C\u002Fa> 仍是最容易理解的 coding 測試，因為它很直觀：164 個 Python 任務，全部靠單元測試驗證。如果你要做 demo、內部初選或快速檢查，這仍是好起點。\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779161052017-q0f0.png\" alt=\"5 個 LLM 基準測試\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>但它已不算強力的前沿區分器。GPT-5.3 Codex 已到 93%，再加上污染問題存在，商務決策上應把它當第一關，而不是最後答案。\u003C\u002Fp>\u003Cul>\u003Cli>測量面向：程式生成\u003C\u002Fli>\u003Cli>語言：Python\u003C\u002Fli>\u003Cli>驗證方式：功能單元測試\u003C\u002Fli>\u003Cli>適合用途：快速基線檢查\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>4. SWE-bench Verified\u003C\u002Fh2>\u003Cp>\u003Ca href=\"https:\u002F\u002Fwww.lxt.ai\u002Fblog\u002Fllm-benchmarks\u002F\">SWE-bench Verified\u003C\u002Fa> 更接近真實軟體工作。它不是孤立題目，而是要求模型修補真實 \u003Ca href=\"\u002Ftag\u002Fgithub\">GitHub\u003C\u002Fa> issue，模型必須理解上下文、找出 bug，還要產出能通過測試的 patch。\u003C\u002Fp>\u003Cp>如果你關心開發者效率或 coding agent，這是最值得追的指標之一。Claude Opus 4.6 以 80.8% 領先，MiniMax-M2.5 為 80.2%，Gemini 3.1 Pro 為 80.6%，顯示頂尖系統之間競爭很接近。\u003C\u002Fp>\u003Cul>\u003Cli>測量面向：端到端軟體工程\u003C\u002Fli>\u003Cli>任務類型：真實 repository issue\u003C\u002Fli>\u003Cli>適合用途：agentic coding 評估\u003C\u002Fli>\u003Cli>優勢：比合成題更難作弊\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>5. LiveCodeBench\u003C\u002Fh2>\u003Cp>\u003Ca href=\"https:\u002F\u002Fwww.lxt.ai\u002Fblog\u002Fllm-benchmarks\u002F\">LiveCodeBench\u003C\u002Fa> 適合想要「分數還跟得上現況」的團隊。\u003Ca href=\"\u002Fnews\u002Fwhy-halo-on-ps5-is-the-right-move-zh\">它會\u003C\u002Fa>定期更新題庫，降低訓練資料污染，也讓測試能隨著模型進步持續保持價值。\u003C\u002Fp>\u003Cp>這對追蹤版本更新很重要。Qwen3.5-plus 在第 6 版以 83.6% 領先，而這個數字之所以更有意義，就是因為題庫會變動，較不容易被背題\u003Ca href=\"\u002Fnews\u002F5-claudes-credit-caps-impact-zh\">影響\u003C\u002Fa>。\u003C\u002Fp>\u003Ccode>LiveCodeBench 適合：1) 需要抗記憶化的 coding 測試，2) 想按月追蹤分數，3) 想看更貼近當前模型行為的比較。\u003C\u002Fcode>\u003Ch2>怎麼挑\u003C\u002Fh2>\u003Cp>如果你要先做廣泛篩選，從 MMLU 開始。若工作重點是高難推理，GPQA Diamond 更有訊號。對軟體團隊來說，HumanEval 可當快速檢查，但真要看實際 coding 能力，\u003Ca href=\"\u002Ftag\u002Fswe-bench-verified\">SWE-bench Verified\u003C\u002Fa> 和 LiveCodeBench 更可靠。\u003C\u002Fp>\u003Cp>最重要的原則很簡單：讓基準測試對應你的工作。只有當題目接近生產任務、資料夠乾淨，而且測試本身還有足夠區分度時，高分才真的有意義。\u003C\u002Fp>","5 個基準測試幫你判斷模型強弱、看懂分數失真，並選出最適合商務採購的測試。","www.lxt.ai","https:\u002F\u002Fwww.lxt.ai\u002Fblog\u002Fllm-benchmarks\u002F",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779161051251-hgbf.png",[13,14,15,16,17,18,19,20],"LLM benchmarks","MMLU","GPQA Diamond","HumanEval","SWE-bench Verified","LiveCodeBench","business buyers","2026","zh",0,false,"2026-05-19T03:23:38.737225+00:00","2026-05-19T03:23:38.64+00:00","done","56fc4207-d189-406c-9eee-2c3aba77e4f2","5-llm-benchmarks-for-business-buyers-2026-zh","industry","9b2db204-7090-4a48-85e0-65693e66152e","published",[33,34,35],"MMLU 適合通用篩選，不適合判定最頂尖模型。","GPQA Diamond 仍能有效區分高階推理能力。","SWE-bench Verified 和 LiveCodeBench 更接近真實 coding 工作。","7aa69b8b-ff49-4d68-9e8b-f08e577b1239","[-0.02747452,0.0104793925,0.02882822,-0.06920866,-0.021058427,-0.013923802,0.009717667,0.0044805473,0.003981841,0.00031343216,0.0055106455,-0.012150001,0.030259972,-0.026652738,0.117082596,0.03398447,-0.011157367,0.023710152,0.017211303,-0.019441905,0.012910479,0.013109831,-0.014134417,0.003556293,0.007952139,0.010939568,0.007710883,0.020182015,0.04490117,0.00081627147,-0.013176437,0.006289657,0.0070519536,0.031152392,-0.015613621,0.021628883,0.016534112,-0.005094198,0.0154435225,0.005812527,-0.0054833367,-0.023495339,0.010754932,-0.043776006,-0.020252949,0.014690765,-0.004449144,-0.02434692,-0.029251684,-0.00064700656,-0.01206405,0.029395912,0.0050824243,-0.15307482,-0.025648667,-0.0020283519,-0.005067115,-0.011095167,0.01235517,0.008425781,-0.033406626,0.019389566,-0.04104151,-0.010231088,-0.0024968477,-0.034220867,0.022253973,-0.01650408,-0.011157916,-0.017037129,-0.028946986,0.006431256,0.016583882,-0.03183074,-0.0051129824,-0.0054296423,-0.010171225,0.01035493,0.0012946747,0.00016345596,0.020196708,-0.025780957,0.009398001,0.002185904,0.0020355142,-5.62515e-05,0.013455992,0.0031801115,0.013325358,0.007248366,0.0120153455,0.0056293537,-0.0006449431,-0.006366955,-0.0019645926,-0.014693538,-0.039451133,-0.012487885,-0.021239484,-0.017106717,0.0065653445,0.0062175808,0.010993659,-0.00029194832,0.016627872,-0.0020030432,0.0046435692,0.0017694788,-0.011537939,-0.014755597,0.023989376,-0.016111096,-0.007445954,0.020514624,0.0019660257,-0.1390297,-0.0011931862,0.012991171,0.0050585656,0.0013894375,-0.0026949241,-0.0009380653,0.012399502,0.024824955,-0.02082167,-0.0051073846,0.010112155,-0.02105571,-0.013393985,0.0013666092,-0.04648398,0.0031770025,0.013337267,-0.02929204,-0.016275505,0.03195446,0.0037926936,-0.0039875447,-0.016806656,-0.035633616,0.016883554,0.01677038,0.013330022,-0.0128622735,-0.020762894,-0.019104887,-0.028346382,-0.01304125,-0.0076376763,-0.0037547988,0.027420318,-0.015112678,-0.018753668,0.0011816993,0.0414005,-0.0008994583,0.033642054,0.029025422,0.008367901,0.033764217,-0.011714023,0.001033081,-0.009898016,-0.002923637,-0.0022341954,0.021975666,-0.027253885,0.0075539066,0.0154542085,0.0024625948,0.0136902,-0.015375072,0.004572124,-0.011943945,-0.0098324185,-0.013929531,0.0067657866,0.008972932,0.0065059634,0.010528448,0.020899983,0.023576273,-0.018392704,0.0046407566,-0.01413174,0.0031796216,2.7986092e-05,0.01040862,0.012326161,0.034125835,-0.022851983,0.0129090175,0.03347924,-0.041111946,-0.028779946,0.0039717103,-0.010571636,-0.0018231026,-0.022254394,0.02258447,0.015338419,-0.0075860703,0.01102724,-0.013903643,-0.0061972085,-0.022031302,-0.0034404928,-0.007887162,0.0017197871,-0.0075791623,-0.0112788305,0.021871353,-0.005489132,-0.008670624,-0.021872573,-0.004411936,-0.013219474,-0.011763371,0.014400934,-0.02728122,-0.007652733,-0.017236274,-0.00031752363,0.025983913,-0.025617993,-0.02499627,0.014600666,-0.020559818,0.007838587,0.012860009,0.0031253393,0.03611978,0.018648105,0.0049245255,0.03566205,-0.000107665306,-0.016687088,0.025582155,0.017958779,0.017306024,-0.03102303,0.010700906,-0.0024870173,0.018891687,0.029958645,-0.0090827895,0.008080459,0.010055714,-0.0019834195,0.0045314906,0.006705059,0.01290524,-0.004282163,-0.016683724,0.022846835,-0.013339998,-0.018291831,0.023815865,-0.003032838,0.007357412,0.02285534,0.019846454,0.0012944855,-0.014571849,0.012839823,0.017653909,0.004391407,-0.03329871,-0.00043659616,0.025455056,-0.02551417,0.0008049771,0.016526837,-0.017955562,0.0061299475,-0.004146854,-0.044786915,0.0022248342,0.014871002,0.025708852,0.018026546,-0.023061642,-0.0026416923,0.0112718735,-0.0028198403,0.004386933,-0.01711293,-0.007918107,-0.0068530436,-0.00139303,0.012652744,0.009538605,-0.011055698,-0.00013947066,-0.015848568,-0.005061178,0.0051023723,0.015871514,-0.031991076,8.993597e-06,-0.013718843,0.0034930448,-0.003840561,0.06859521,-0.0015203569,-0.0015242766,-0.010686198,-0.0077882507,-0.009045794,-0.012016413,-0.0010888398,0.0025367565,0.032649785,-0.014276745,-0.03833027,-0.01656947,0.0124859,7.8575074e-05,-0.016955025,-0.012760118,-0.0032306777,-0.01171469,-0.01969975,0.005872296,-0.03886508,0.0037640773,0.0019616624,0.004820657,0.026054984,-0.016294818,-0.00047944352,0.01737575,0.015152084,-0.0030815082,0.0045293514,-0.012810897,-0.0045174337,-0.013063786,-0.021617299,-0.0018580373,-0.021618573,-0.014475218,-0.049851198,0.03162293,-0.023020502,0.019166838,-0.0068866247,0.025107188,-0.026271682,-0.028468693,0.031741004,-0.031091694,-0.019991957,-0.021580849,-0.01923443,0.0151924305,-0.018323768,-0.016867435,0.030849785,0.025018873,-0.005878621,-0.017526725,-0.018184748,0.0012379213,0.01487978,-0.029274907,0.014342529,-0.011329445,-0.022118386,0.012203673,0.0046578203,0.004243956,-0.0056530256,-0.0011635957,-0.018379778,0.020378299,-0.020936087,0.008609237,-0.0001504926,-0.01672318,0.0017660523,0.021050455,-0.0077817407,-0.0033018342,-0.02246064,-0.024442144,0.038841676,0.013621998,0.012626638,-0.008959499,0.011173772,0.013138998,0.013675035,0.026022065,0.0067808223,0.009494806,-0.024172498,0.015494181,0.0076090666,-0.00069139135,0.013601457,0.01947349,0.009914805,-0.0019716762,-0.01735096,0.018466964,-0.018339273,0.026128085,0.020399716,0.006922944,-0.0036632223,0.014245875,-0.0095558455,0.02619281,0.020462276,0.00800092,0.002328849,-0.0011405412,0.0010179651,-0.010727573,-0.01900117,0.030444566,0.013331018,-0.009476971,-0.013352796,-0.01191383,-0.027770959,-0.005943006,0.028528405,-0.03449771,0.0017693598,-0.008542746,-0.016457295,-0.032145184,-0.043213356,-0.018183976,-0.0334025,-0.014499733,-0.03470071,-0.013104234,0.00093535427,0.019292194,0.0035131532,-0.0049260305,0.03837471,-0.0044404687,0.0018227699,-0.0062301178,-0.028611345,0.0037183166,0.029380303,0.017099021,0.03588899,0.008528136,-0.012126668,-0.009227167,-0.014383034,-0.009938452,-0.01244718,-0.024595793,0.036252864,-0.014994712,0.012486839,0.031982724,-0.0088198725,-0.01823858,-0.01421256,-0.0062705483,-0.0130159315,0.018638963,-0.013354814,0.007328367,0.015661994,0.029176297,-0.009872408,-0.0038632152,-0.0014397172,0.014490985,-0.009530937,-0.001956984,-0.011264803,-0.038399126,0.0043380135,-0.016454143,-0.005230641,-0.013377017,-0.004206491,0.0014742584,-0.022773953,0.0016771659,0.01854264,0.028195767,0.02594492,0.019342383,-0.014131359,-0.011415351,-0.015619449,-0.009127823,-0.016750926,0.0033977062,0.0009129104,-0.0016719174,0.0040395856,-0.0071096467,-0.026541686,-0.011109214,0.0009108587,0.009993867,-0.0034007656,-0.001353493,-0.021448847,-0.0021103634,0.046605147,0.011551297,-0.0073788166,0.004410129,0.013936082,0.018920198,-0.01759257,-0.010926903,-0.023741538,-0.014689439,-0.037161674,0.025318906,0.0011314058,-0.010038363,0.014496576,-0.015534341,0.0030520298,0.043517716,0.0055873794,0.018104356,-0.003019918,-0.011253887,0.0010922423,-0.0073681674,0.038763087,0.03876577,0.007861603,0.019762976,-0.025233611,-0.014566106,-0.014661127,-0.022660594,0.033928666,-0.0890385,0.019235682,0.0022716047,0.0040811338,-0.008014714,-0.021165155,-0.017601995,-0.009739619,0.005377059,0.0014545623,0.009752823,0.024904065,0.015975164,0.016003883,-0.0011292809,-0.016545972,-0.02811881,-0.0034750605,0.017461589,-0.0016439641,0.015214712,-0.00011146097,0.009230982,0.019439086,-0.006350649,-0.007393298,0.006213372,0.0049796365,-0.017429614,0.013721327,-0.030945882,0.008651584,-0.0016415317,0.013693147,-0.0076470785,0.00087693916,0.03183713,0.010794997,0.015627677,-0.00030027895,-0.009923781,0.013697205,-0.03143368,-0.044602804,-0.009382522,-0.016275216,-0.016605068,0.015331028,0.015627557,0.02159127,-0.03291589,-0.026939817,0.009889381,-0.023331536,0.0044379034,-0.01612909,-0.029199088,0.012545107,-0.023239028,-0.0068512186,-0.020245783,0.0013827002,-0.0033671756,0.02528674,-0.007932884,0.0007195715,-0.0071527227,0.024783771,-0.005797618,-0.0012507541,-0.017229576,-0.032370087,0.017397907,0.011535059,-0.029390715,-0.012211635,0.016797824,0.016301075,-0.009487367,0.0066227536,-0.02241896,-0.024308125,-0.08423999,-0.0016180531,-0.018560456,0.01089772,0.014019793,-0.019226842,0.0153994365,-0.030572232,0.0048518395,-0.023535322,0.034756072,-0.008025908,0.00875189,-0.029050035,0.009211398,-0.010988936,-0.0006247948,0.024464618,0.010861015,-0.024037922,-0.0071568512,-0.02830066,0.022034429,-0.029833755,-0.044470605,-0.0015890982,0.014471078,0.010654405,-0.0007629196,-0.003032865,0.0011069712,-0.14251636,-0.013390741,-0.01587162,-0.001083426,0.0104023535,-0.00035480733,0.0060961563,0.0051830704,0.005846913,0.003373013,-0.0052039055,-0.039123397,-0.011012765,0.014141119,-0.0054848813,0.114462666,-0.004956389,0.014395605,-0.031108659,-0.0156595,0.00867981,-0.035243385,-0.038479112,-0.0011562258,0.028435372,0.0107986145,0.024446597,-0.0033380326,-0.0031096279,0.0077557,0.034617335,0.001593373,-0.008258296,-0.012368709,0.03034705,-0.013671033,-0.019200457,-0.01603238,-0.02521716,-0.011637588,0.019345619,0.013613386,0.011535823,-0.012770978,-0.021916227,-0.00086726097,-0.025541078,-0.03291727,0.006836141,0.018299215,-0.014915088,-0.050463952,-0.008020771,-0.004805754,0.0019480677,0.012813994,-0.02476572,-0.013442392,0.016240835,-0.011985483,0.020635441,-0.02005297,0.008679752,0.031840667,-0.03414355,0.00019013612,0.016309349,0.012803343,0.008024107,0.013680336,-0.00024492253,0.012146565,-0.0050319433,-0.017063085,0.0063776076,-0.0070898314,0.012229575,0.023995148,0.021877095,-0.009637163,0.017260803,0.019947907,0.0072614183,-0.02515994,0.008810616,0.0044537615,0.010178742,-0.0012198724,0.0077464664,-0.022653509,0.01667337,0.017365417,-0.01284537,-0.015589165,0.00068522297,0.018126162,0.010530452,0.037878297,0.0051813857,-0.01713567,-0.013830271,-1.8814859e-05,0.023165433,-0.000331819,0.024131078,0.006543708,-0.0071260394,0.024892164,0.020916685,-0.016360404]",{"tags":39,"relatedLang":50,"relatedPosts":54},[40,42,44,46,48],{"name":14,"slug":41},"mmlu",{"name":15,"slug":43},"gpqa-diamond",{"name":17,"slug":45},"swe-bench-verified",{"name":16,"slug":47},"humaneval",{"name":13,"slug":49},"llm-benchmarks",{"id":30,"slug":51,"title":52,"language":53},"5-llm-benchmarks-for-business-buyers-2026-en","5 LLM benchmarks for business buyers in 2026","en",[55,61,67,73,79,85],{"id":56,"slug":57,"title":58,"cover_image":59,"image_url":59,"created_at":60,"category":29},"be0785a5-7976-4735-8f46-6abd84dac9af","5-shifts-in-llms-from-the-last-six-months-zh","5 個 LLM 的半年轉變","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779167646556-uxgd.png","2026-05-19T05:13:34.442673+00:00",{"id":62,"slug":63,"title":64,"cover_image":65,"image_url":65,"created_at":66,"category":29},"9490f35a-38a9-4006-a1cc-00f8da10f80a","fever-monique-billings-early-2026-impact-zh","Billings 2026 首秀，Fever 補到深度","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779164034029-nxli.png","2026-05-19T04:13:26.978775+00:00",{"id":68,"slug":69,"title":70,"cover_image":71,"image_url":71,"created_at":72,"category":29},"766421c5-7b6e-46c4-b56f-120ba1819afa","5-indiana-fever-updates-zh","5 個 Indiana Fever 最新重點","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779163534728-b4rh.png","2026-05-19T04:05:01.275433+00:00",{"id":74,"slug":75,"title":76,"cover_image":77,"image_url":77,"created_at":78,"category":29},"fe912f7a-3393-4ca1-af0e-71c6605e8565","why-claudes-announcement-cadence-is-the-real-product-zh","為什麼 Claude 的公告節奏才是真正的產品","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779160426720-zxld.png","2026-05-19T03:13:19.642996+00:00",{"id":80,"slug":81,"title":82,"cover_image":83,"image_url":83,"created_at":84,"category":29},"e4c406a3-9ee7-45be-81b9-6b88c393b6e2","5-claudes-credit-caps-impact-zh","5 個 Claude 信用額上限影響","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779159829967-hzbk.png","2026-05-19T03:03:24.504579+00:00",{"id":86,"slug":87,"title":88,"cover_image":89,"image_url":89,"created_at":90,"category":29},"af797b91-bcd6-4320-85cf-632538a6c538","why-go-release-policy-beats-lts-zh","為什麼 Go 的發布政策比 LTS 更好","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779156230756-ap80.png","2026-05-19T02:03:22.118845+00:00",[92,97,102,107,112,117,122,127,132,137],{"id":93,"slug":94,"title":95,"created_at":96},"ee073da7-28b3-4752-a319-5a501459fb87","ai-in-2026-what-actually-matters-now-zh","2026 AI 真正重要的事","2026-03-26T07:09:12.008134+00:00",{"id":98,"slug":99,"title":100,"created_at":101},"83bd1795-8548-44c9-9a7e-de50a0923f71","trump-ai-framework-power-speech-state-preemption-zh","川普 AI 框架瞄準電力、言論與州權","2026-03-26T07:12:18.695466+00:00",{"id":103,"slug":104,"title":105,"created_at":106},"ea6be18b-c903-4e54-97b7-5f7447a612e0","nvidia-gtc-2026-big-ai-announcements-zh","NVIDIA GTC 2026 重點拆解","2026-03-26T07:14:26.62638+00:00",{"id":108,"slug":109,"title":110,"created_at":111},"4bcec76f-4c36-4daa-909f-54cd702f7c93","claude-users-spreading-out-and-getting-better-zh","Claude 用戶更分散，也更會用","2026-03-26T07:22:52.325888+00:00",{"id":113,"slug":114,"title":115,"created_at":116},"bd903b15-2473-4178-9789-b7557816e535","openclaw-raises-hard-question-for-ai-models-zh","OpenClaw 逼問 AI 模型價值","2026-03-26T07:24:54.707486+00:00",{"id":118,"slug":119,"title":120,"created_at":121},"eeac6b9e-ad9d-4831-8eec-8bba3f9bca6a","gap-google-gemini-checkout-fashion-search-zh","Gap 把結帳搬進 Gemini","2026-03-26T07:28:23.937768+00:00",{"id":123,"slug":124,"title":125,"created_at":126},"0740e53f-605d-4d57-8601-c10beb126f3c","google-pushes-gemini-transition-to-march-2026-zh","Google 把 Gemini 轉換延到 2026 年 3…","2026-03-26T07:30:12.825269+00:00",{"id":128,"slug":129,"title":130,"created_at":131},"e660d801-2421-4529-8fa9-86b82b066990","metas-llama-4-benchmark-scandal-gets-worse-zh","Meta Llama 4 分數風波又擴大","2026-03-26T07:34:21.156421+00:00",{"id":133,"slug":134,"title":135,"created_at":136},"183f9e7c-e143-40bb-a6d5-67ba84a3a8bc","accenture-mistral-ai-sovereign-enterprise-deal-zh","Accenture 攜手 Mistral AI 賣主權 AI","2026-03-26T07:38:14.818906+00:00",{"id":138,"slug":139,"title":140,"created_at":141},"191d9b1b-768a-478c-978c-dd7431a38149","mistral-ai-faces-its-hardest-year-yet-zh","Mistral AI 迎來最硬的一年","2026-03-26T07:40:23.716374+00:00"]