[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-how-to-reduce-ai-model-serving-friction-zh":3,"article-related-how-to-reduce-ai-model-serving-friction-zh":35,"series-industry-a4380666-3f3c-4465-be35-903068c7045e":88},{"id":4,"title":5,"content":6,"summary":7,"source":8,"source_url":9,"author":10,"image_url":11,"keywords":12,"language":18,"translated_content":10,"views":19,"is_premium":20,"created_at":21,"updated_at":21,"cover_image":11,"published_at":22,"rewrite_status":23,"rewrite_error":10,"rewritten_from_id":24,"slug":25,"category":26,"related_article_id":27,"status":28,"google_indexed_at":10,"x_posted_at":10,"tweet_text":10,"title_rewritten_at":10,"title_original":10,"key_takeaways":29,"topic_cluster_id":33,"embedding":34,"is_canonical_seed":20},"a4380666-3f3c-4465-be35-903068c7045e","How to Reduce AI Model Serving Friction","\u003Cp data-speakable=\"summary\">This guide shows you how to move a trained \u003Ca href=\"\u002Fnews\u002Fwhy-global-ai-regulation-2026-rewards-modular-compliance-zh\">AI\u003C\u002Fa> model into production reliably, building a repeatable deployment pipeline out of ONNX, TensorRT, dynamic inputs, version pinning, and Triton checks.\u003C\u002Fp>\u003Cp>It is aimed at ML engineers, platform teams, and backend developers, and focuses on minimizing the export failures, runtime mismatches, and latency gaps that commonly appear when a trained model moves from notebook to production.\u003C\u002Fp>\u003Cp>By the end you will have a repeatable serving workflow: validate the export first, then handle dynamic input shapes, pin compatible versions, and finally deploy a measurable inference service.\u003C\u002Fp>\u003Ch2>Before You Start\u003C\u002Fh2>\u003Cul>\u003Cli>An NVIDIA GPU with a CUDA-compatible driver\u003C\u002Fli>\u003Cli>Python 3.10+\u003C\u002Fli>\u003Cli>Docker 24+\u003C\u002Fli>\u003Cli>PyTorch 2.2+\u003C\u002Fli>\u003Cli>ONNX 1.15+\u003C\u002Fli>\u003Cli>TensorRT 10+\u003C\u002Fli>\u003Cli>Access to the \u003Ca href=\"https:\u002F\u002Fdocs.nvidia.com\u002Fdeeplearning\u002Ftensorrt\u002F\">TensorRT documentation\u003C\u002Fa> and the \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FNVIDIA\u002FTensorRT\">TensorRT GitHub repo\u003C\u002Fa>\u003C\u002Fli>\u003Cli>Access to the \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Ftriton-inference-server\u002Fserver\">Dynamo-Triton documentation\u003C\u002Fa> and the \u003Ca 
href=\"https:\u002F\u002Fgithub.com\u002Ftriton-inference-server\u002Fserver\">Triton Inference Server GitHub repo\u003C\u002Fa>\u003C\u002Fli>\u003Cli>Optional but recommended: an NGC account for pulling prebuilt containers\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>Step 1: Export a Clean ONNX Model\u003C\u002Fh2>\u003Cp>Goal: produce a deployable graph, strip training-only behavior, and catch export problems before they reach serving.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778922836413-ff99.png\" alt=\"How to reduce AI model serving friction\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cpre>\u003Ccode>python export.py \\\n  --model checkpoints\u002Fmodel.pt \\\n  --output model.onnx \\\n  --opset 17\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>Run the export both locally and in CI. Before converting, apply constant folding and remove dropout, teacher-forcing branches, and any other training-only paths. If the export fails, go back and fix the unsupported ops or tensor-shape assumptions in the source model instead of patching around them at deployment time.\u003C\u002Fp>\u003Cp>You should end up with a valid ONNX file and an export log free of unsupported-operation errors.\u003C\u002Fp>\u003Ch2>Step 2: Build an Inference Engine with TensorRT\u003C\u002Fh2>\u003Cp>Goal: convert the ONNX graph into a \u003Ca href=\"\u002Ftag\u002Fgpu\">GPU\u003C\u002Fa>-optimized engine, letting TensorRT fuse layers and select more efficient kernels.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778922831262-obry.png\" alt=\"How to reduce AI model serving friction\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cpre>\u003Ccode>trtexec \\\n  --onnx=model.onnx \\\n  --saveEngine=model.plan \\\n  --fp16\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>First confirm that TensorRT can convert the graph cleanly, then compare FP16 against FP32 to verify the accuracy trade-off suits your workload. If TensorRT reports an unsupported layer, decide whether to rewrite the model, substitute the operation, or supply a plugin.\u003C\u002Fp>\u003Cp>You should see a saved engine file and a build summary listing the precision choices and layer optimizations.\u003C\u002Fp>\u003Ch2>Step 3: Add Plugins for Unsupported Operations\u003C\u002Fh2>\u003Cp>Goal: keep the pipeline moving when TensorRT cannot natively support a layer or custom operator.\u003C\u002Fp>\u003Cpre>\u003Ccode>\u002F\u002F Custom TensorRT plugin skeleton\nclass MyPlugin : 
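\u002F* sketch only: a complete plugin also overrides clone, getSerializationSize,\n     serialize, supportsFormatCombination, and getOutputDataType; note that\n     TensorRT 10 deprecates IPluginV2DynamicExt in favor of IPluginV3 *\u002F\n  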
public nvinfer1::IPluginV2DynamicExt {\n  \u002F\u002F implement configurePlugin, enqueue, getOutputDimensions\n};\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>Implement the necessary C++ or \u003Ca href=\"\u002Ftag\u002Fcuda\">CUDA\u003C\u002Fa> plugin \u003Ca href=\"\u002Fnews\u002Fwhy-fine-tuning-llms-domain-tasks-right-default-zh\">only\u003C\u002Fa> for operations that standard TensorRT layers cannot express. Before writing new \u003Ca href=\"\u002Fnews\u002Frefdecoder-reference-conditioned-video-decoder-zh\">code\u003C\u002Fa>, search the TensorRT plugin ecosystem and the existing samples so you do not reinvent the wheel. Keep the plugin interface as narrow as possible; a narrow surface is easier to test and version.\u003C\u002Fp>\u003Cp>You should see the model build successfully once the plugin is linked, with \u003Ca href=\"\u002Ftag\u002Finference\">inference\u003C\u002Fa> returning the expected tensor shapes.\u003C\u002Fp>\u003Ch2>Step 4: Configure Dynamic Input Profiles\u003C\u002Fh2>\u003Cp>Goal: support variable batch sizes or sequence lengths without rebuilding the engine for every request shape.\u003C\u002Fp>\u003Cpre>\u003Ccode>trtexec \\\n  --onnx=model.onnx \\\n  --minShapes=input:1x3x224x224 \\\n  --optShapes=input:8x3x224x224 \\\n  --maxShapes=input:32x3x224x224\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>Build optimization profiles that match real traffic instead of sizing everything for the largest tensor. If the workload has clearly distinct patterns, such as small interactive requests versus large batch jobs, create multiple profiles so the server can pick the better-fitting configuration. This usually reduces padding waste and avoids expensive engine rebuilds.\u003C\u002Fp>\u003Cp>You should see a single engine serving multiple input sizes, with \u003Ca href=\"\u002Ftag\u002Fbenchmark\">benchmark\u003C\u002Fa> runs no longer triggering recompilation when request dimensions change.\u003C\u002Fp>\u003Ch2>Step 5: Pin Runtime Versions and Deploy Triton\u003C\u002Fh2>\u003Cp>Goal: place the model in a consistent inference environment and eliminate version drift.\u003C\u002Fp>\u003Cpre>\u003Ccode>docker run --gpus all --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 \\\n  -v $PWD\u002Fmodel_repository:\u002Fmodels \\\n  nvcr.io\u002Fnvidia\u002Ftritonserver:24.05-trtllm-python-py3 \\\n  tritonserver --model-repository=\u002Fmodels\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>Use prebuilt containers with pinned image tags (the 24.05 tag above is an example; pin the exact release you validated rather than latest) so CUDA, TensorRT, and the server runtime stay aligned. In the model repository, define the model version, backend, and config explicitly. If you need dynamic batching, concurrent model versions, or multi-GPU scaling, Triton centralizes those controls in one place.\u003C\u002Fp>\u003Cp>You should see Triton start cleanly, with the health checks and inference endpoints responding and no library-mismatch warnings.\u003C\u002Fp>\u003Ch2>Step 6: 
Measure Throughput and Latency\u003C\u002Fh2>\u003Cp>Goal: confirm the deployment hits its production targets and find the next bottleneck before rollout.\u003C\u002Fp>\u003Cpre>\u003Ccode>trtexec \\\n  --loadEngine=model.plan \\\n  --warmUp=200 \\\n  --duration=60 \\\n  --streams=4\u003C\u002Fcode>\u003C\u002Fpre>\u003Cp>Profile the engine with trtexec, Nsight Systems, or Model Analyzer, checking batch size, concurrency, and instance count. Tune one variable at a time so you can tell whether a change improves throughput, hurts latency, or merely shifts work from the CPU to the GPU. Record the baseline and the tuned results in the deployment log.\u003C\u002Fp>\u003Cp>You should see stable latency numbers, higher GPU utilization, and a before-and-after comparison of the serving configuration.\u003C\u002Fp>\u003Ctable>\u003Cthead>\u003Ctr>\u003Cth>Metric\u003C\u002Fth>\u003Cth>Baseline \u002F before\u003C\u002Fth>\u003Cth>Result \u002F after\u003C\u002Fth>\u003C\u002Ftr>\u003C\u002Fthead>\u003Ctbody>\u003Ctr>\u003Ctd>Model export reliability\u003C\u002Ftd>\u003Ctd>Frequent failures at ONNX conversion\u003C\u002Ftd>\u003Ctd>CI-validated exports, fewer deployment surprises\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Input handling\u003C\u002Ftd>\u003Ctd>Recompilation on every shape change\u003C\u002Ftd>\u003Ctd>Dynamic optimization profiles reuse one engine\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Runtime consistency\u003C\u002Ftd>\u003Ctd>Version mismatches across environments\u003C\u002Ftd>\u003Ctd>Pinned container and dependency versions\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Serving efficiency\u003C\u002Ftd>\u003Ctd>Untuned batching and concurrency\u003C\u002Ftd>\u003Ctd>Measured Triton deployment with throughput results\u003C\u002Ftd>\u003C\u002Ftr>\u003C\u002Ftbody>\u003C\u002Ftable>\u003Ch2>Common Mistakes\u003C\u002Fh2>\u003Cul>\u003Cli>Leaving export validation until release day. Fix: run the ONNX export and TensorRT build checks in CI on every model change.\u003C\u002Fli>\u003Cli>Sharing one oversized dynamic profile across all traffic. Fix: segment profiles by real traffic patterns, for example interactive requests versus batch jobs.\u003C\u002Fli>\u003Cli>Mixing library versions across local, staging, and production. Fix: use pinned container images and lock exact framework and runtime versions.\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>What to Explore Next\u003C\u002Fh2>\u003Cp>Once this pipeline is stable, dig deeper into custom backends, multi-model routing, and automated tuning with Model Analyzer, so different teams and workloads can share a single serving standard.\u003C\u002Fp>","這篇教你把訓練好的 AI 模型穩定送進 production，透過 ONNX、TensorRT、動態輸入、版本鎖定與 Triton 
檢查，建立可重複的部署流程。","www.mexc.com","https:\u002F\u002Fwww.mexc.com\u002Fnews\u002F1085986",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778922836413-ff99.png",[13,14,15,16,17],"ONNX","TensorRT","Triton Inference Server","Docker","PyTorch","zh",3,false,"2026-05-16T09:13:31.665292+00:00","2026-05-16T09:13:31.47+00:00","done","f306d034-7265-461c-8922-62f90b3c0101","how-to-reduce-ai-model-serving-friction-zh","industry","a75384ff-223f-4a34-9f86-ae5c2772a2d6","published",[30,31,32],"先在 CI 驗證模型匯出，再進入 serving 流程。","用 TensorRT 與動態 profile 降低效能與形狀變化帶來的摩擦。","鎖定容器與 runtime 版本，讓部署結果可重現。","caa87b65-9bbc-46fe-bba8-4f4158dd2d8b","[-0.004342745,0.017067665,0.01956624,-0.08538201,-0.0066944086,0.021039573,0.0010881445,0.002053795,0.024377277,0.0038568322,-0.020946054,-0.03425927,0.02178593,0.020970598,0.12616572,0.023166904,0.0015867832,-0.00423697,0.023685578,-0.02718734,0.023739913,0.017270593,-0.013961611,-0.016876461,0.002100261,-0.0011176724,0.016759805,0.034596123,0.05961182,-0.045562755,0.026595589,0.0114190765,0.0062877275,0.028820846,-0.0015743889,-0.008256747,0.004400058,-0.014750537,0.02974954,0.007667483,-0.027772421,0.012103356,0.009560682,-0.023516659,-0.008167047,-0.0030586412,0.0062520807,-0.030228062,-0.00043312865,0.0069836886,0.00132155,0.0020563842,0.007855782,-0.14955555,-0.01895382,0.0002062949,0.015998045,0.013447481,0.01507651,0.0073886355,0.0026718115,0.033930816,-0.017869443,-0.020587452,-0.012932044,-0.005702489,0.028913587,-0.0029837228,-0.0027687668,-0.022646647,0.0017308722,0.0027268429,-0.013273288,-0.02311258,0.019125622,-0.042938415,0.005583583,-0.0058025997,-0.01343989,0.02487336,-0.004334129,-0.033005126,0.03187029,-0.012737278,-0.005663623,0.00952675,0.026491025,0.010554528,0.009361516,0.02438317,0.034193996,0.018888364,-0.016875584,0.0060349163,0.0534839,-0.01368324,-0.0032114496,-0.023967532,0.02129558,-0.011961794,-0.008878984,-0.045997854,0.009008678,0.017851256,-0.0
06865896,-0.0034703503,0.011442301,-0.017415056,0.010710269,-0.01397794,0.017127069,-0.01798393,0.010937302,0.012692671,0.010359106,-0.12165913,0.0010867331,0.009189271,0.013851346,0.0046772393,-0.01088852,-0.0001863565,0.003752285,0.029502964,-0.0030728693,-0.0007953107,-0.009717609,0.016689302,-0.028581629,-0.033797078,-0.011615588,-0.017151149,-0.017474169,0.026458412,0.010721406,0.013396051,-0.02324003,0.006832307,-0.025917573,-0.02088015,-0.0068745143,0.046795715,-0.019553855,-0.020818165,-0.01486773,-0.015287949,-0.03359188,0.013759513,0.004304559,-0.028073134,0.021855291,-0.01546468,-0.0024642542,-0.01700274,0.027447568,-0.0032582853,0.0071726292,-0.009514253,-0.005759692,0.0069990517,-0.00041077455,-0.016282486,-0.0032544758,-0.001698783,-0.0076639713,0.0064580394,0.01109642,0.022714254,0.0056464095,0.016917216,0.016855206,-0.016519433,-0.023172252,0.042079773,-0.024044542,0.0009269758,-0.016781967,-0.010475933,0.015346054,-0.020437054,-0.004914899,0.0066881697,-0.0010781987,0.007367229,0.009458112,-0.000921237,0.0009721195,0.0076055406,0.019069295,0.018283844,-0.010663026,0.005490769,0.015722746,-0.005295018,-0.004384063,-0.018608574,0.0027753843,-0.0032178957,-0.004571737,0.007343679,0.0009049215,-0.0178437,0.011766675,-0.024221443,0.0024458526,-0.01969885,-0.0039758887,-0.018580785,-0.0021485842,-0.03385598,-0.019815523,-0.022290071,0.02769856,0.023805808,-0.0073814383,-0.0076806988,-0.014632276,-0.0020450365,0.0016565641,0.003735402,0.023994036,0.00051393843,-0.004304477,0.027150653,-0.0022227184,-0.02239098,0.011540794,-0.00169361,-0.035309568,-0.0055924333,0.029231451,0.00787671,0.022256088,0.04177974,0.023038857,-0.0005317576,-0.021353798,0.014509565,0.00963914,0.043232385,-0.008544738,0.03846215,-0.00963564,-0.009254898,0.022522453,0.0080421055,0.004905632,-0.023375915,0.0030787366,0.008368789,-0.026441013,0.0065386044,-0.0023788612,-0.007159278,0.018921452,-0.013315091,-0.011431634,0.015785852,-0.011562157,0.048373524,-0.009040175,0.0128255915,0.006
7939335,-0.01579809,0.025195872,0.007568905,-0.0024182184,0.0150868455,-0.03745164,0.011508086,0.02364807,0.02530067,0.01107605,-0.0073437993,0.03147531,-0.0047223256,-0.024181291,0.05153582,0.0052097235,-0.016955256,-0.015782936,0.017502341,0.017418839,0.00546339,0.008385077,0.021160796,-0.015537746,-0.024035588,-0.009988845,-0.0049078995,0.010140209,0.033768784,-0.015006112,-0.027813798,0.026523296,-0.02420105,-0.021238685,0.021050112,-0.002878097,0.02767321,-0.0031715182,-0.0057296874,-0.014039082,0.058265723,-0.010507907,0.0077629704,0.015672805,0.02546315,-0.0004066035,-0.0016371494,0.004495267,-0.009963518,0.01571963,-0.0038842065,0.005877409,-0.015017495,0.014573084,-0.018891243,0.010239697,0.008568127,-0.017053366,-0.024506468,-0.020642549,0.019237649,-0.012956388,-0.013061887,-0.013766992,0.009910861,0.01725015,-0.023877786,0.005137913,0.02796496,-0.0038126293,-0.009506918,-0.011131631,-0.0010702399,0.012314284,-0.00018480104,-0.00845985,0.018924465,0.0005670081,-0.013880456,-0.032532666,0.008314858,-0.050349932,0.0151710585,0.0028790254,0.008436422,-0.0058756787,-0.033814337,0.02032788,0.011092026,0.002832812,0.010319585,-0.020216363,-0.0003900883,-0.01284355,0.0014931391,0.03714465,0.000303184,0.0021113353,-0.010486971,-0.014361564,-0.017712355,0.023778018,-0.024654174,-0.0032051264,0.017343692,0.0049051787,-0.024744153,-0.0061427015,0.003801345,0.029331818,-0.0026000917,0.0046633314,-0.00228956,-0.010089637,-0.006417647,-0.00064617465,-0.019007618,-0.0070686,0.037064597,-0.022927709,-0.0023559337,-0.01116094,0.004191106,0.029862685,0.003197484,0.0047015254,0.0019382294,-0.012098561,0.0014080462,0.005414255,0.017515084,-0.009786757,0.0058031115,-0.010613491,0.0030353519,-0.004050576,-0.010193763,0.006927861,-0.0124335,0.0072679827,-0.0034251693,0.005194404,-0.00500077,-0.014137947,0.024898976,-0.01447491,-0.014055452,-0.0096424995,0.0024835526,-0.003386533,0.010357142,0.04184859,0.004201637,-0.0034598303,-0.0030678948,0.007965296,-0.023459747,-0.034032755
,0.021657115,-0.0068970094,0.0016554053,0.002163579,-0.00032195344,-0.014757404,0.007096295,-0.0062501403,-0.02941167,0.004884244,-0.003325374,0.0045676487,-0.021417461,-0.0103430785,-0.02407433,-0.016344227,0.031269804,-0.0069002304,-0.038815234,0.021170612,0.012320368,0.002546537,-0.017781459,0.015878746,-0.001837952,0.01571298,-0.008371259,-0.030896885,0.023223918,0.054484338,0.01589332,0.045598578,-0.0004921688,-0.010006236,-0.017047156,-0.023093188,-0.024783071,-0.0006145483,-0.005196319,0.0041341954,0.0033574498,0.019707363,-0.0074619954,-0.009501468,0.011307225,-0.0166161,-0.018679475,0.004206214,0.0233834,0.0025827847,0.009418461,0.021071145,-0.0057180906,-0.011167881,0.01470233,-0.011669982,0.025311364,0.01858662,0.010378271,0.017644117,-0.020492056,0.025213318,-0.007905146,0.005910486,0.0044710236,-0.012565063,-0.004218286,-0.019699695,-0.006726012,0.008893993,0.027329419,0.002971423,0.005399895,-0.023303654,0.0260224,-0.0026337262,0.012490778,0.016414553,-0.01434568,-0.014954468,0.0069830487,0.021652292,-0.015180744,-0.0033109302,-0.028052213,0.0007969521,-0.020934505,0.0093241045,-0.0074379435,-0.017011646,-0.0061902334,0.0062830467,0.021999234,-0.017218065,0.00095893815,-0.0024254816,0.0053695026,-0.022933891,0.004428127,-0.0034237108,-0.033488687,-0.0028662796,0.03028636,-0.014426977,-0.020037564,-0.008797364,0.030076986,0.005445803,0.034713045,-0.0079690805,0.014086551,-0.012360857,-0.029729541,0.016464919,0.019869046,0.005247732,0.020801125,-0.008025239,0.00034541776,-0.0049648588,-0.027562818,0.0017494613,-0.009551994,0.023173194,-0.09733474,-0.002157739,-0.005805478,0.0149114,0.02609268,-0.005109123,-0.0039336737,-0.0030946126,0.0061769336,-0.005747581,0.00014945342,0.009613529,0.032975912,-0.0023047833,-0.032274857,0.013403706,-0.0052596126,-0.013413778,0.036829077,-0.015245355,0.020822281,-0.015068981,-0.002231393,0.014746686,-0.0018712819,0.011910663,0.014239926,0.033463944,-0.014208765,-0.004680669,-0.004726087,-0.039075017,0.0051334826,-0.0108
18567,0.010027152,0.002428969,-0.008299381,0.004149933,0.031923316,0.0020241241,0.01678801,0.015128144,-0.036747023,-0.016460458,0.026695728,0.012217306,0.0019584997,-0.0029820986,-0.0065248474,0.011207783,-0.0133574335,-0.011285625,-0.030523725,-0.042155016,-0.0074171545,0.0012394682,-0.012088705,0.0089282,-0.0051704836,0.022946185,-0.018789977,-0.014981369,-0.02478506,0.019746754,-0.01814927,-0.0011652868,0.0059058485,0.019980403,-0.0051611904,0.019674653,0.0037029132,-0.011718412,-0.0021348821,0.0005828619,-0.004922939,0.025359029,-0.007704198,0.009262424,-0.024423134,-0.0003125762,-0.0108862,-0.029744092,-0.073124245,-0.017227221,0.007559398,-0.006564926,0.037958663,0.0014729238,0.0069642,-0.0026502255,0.008336745,0.0082148025,0.013710127,-0.014723343,0.008873632,-0.029881261,-0.020466067,-0.0077170758,0.007918866,-0.02199537,-0.014069954,-0.018012771,0.021860916,-0.012869501,0.021445401,-0.0005497305,-0.015903154,0.0059386124,0.015073844,0.012556159,0.020822069,-0.0031058236,-0.009199432,-0.12801373,0.01253548,-0.0024304367,-0.014270172,0.0044140373,0.008603268,-0.013942572,-0.019442607,-0.014727569,-0.039862677,-9.6690346e-05,-0.025904985,-0.0100035835,-0.009515048,-0.0048215007,0.13639094,-0.022791397,-0.017636294,-0.027363596,-0.027154671,-0.010410306,-0.02167235,-0.02089441,0.032797832,0.018498458,-0.014251152,0.021634389,-0.0029481307,-0.022406299,0.010425683,0.005811683,0.012653203,-0.01301118,-0.0009891008,0.02468294,0.0022419204,-0.0195857,-0.01852985,0.013647167,-0.00043537386,-0.0054779155,0.016940579,0.00034699484,0.00034305197,0.012603149,0.009402158,-0.0049802526,-0.018751811,0.002915301,-0.017517172,-0.015621678,-0.06076263,-0.0071004913,-0.037292354,0.0051831827,-0.008771928,0.0008381303,-0.0054879882,0.023248913,-0.016920459,0.02389092,0.019342402,-0.00542725,0.009413023,-0.0024448764,-0.0124569135,0.030351972,0.014687947,0.024080042,0.0011577477,0.01592679,6.228007e-05,-0.02336604,0.008632135,-0.0045687854,-0.007245269,0.0024760978,0.015639702,
0.030047406,0.0064919814,-0.021961711,-0.009987075,-0.0018266647,-0.03349861,0.009757949,0.01278783,0.019272843,0.03281929,-0.03688933,-0.004794464,0.0017763784,0.032353126,0.027261583,0.010305793,-0.0014994764,0.016572772,-0.008315066,0.03577923,0.016535202,-0.011966531,-0.00056882255,-0.015616014,0.0013915037,0.019858463,0.028808365,0.0014893564,0.004792141,0.03306476,-0.0062068864,0.012525123]",{"tags":36,"relatedLang":47,"relatedPosts":51},[37,39,41,43,45],{"name":15,"slug":38},"triton-inference-server",{"name":14,"slug":40},"tensorrt",{"name":17,"slug":42},"pytorch",{"name":13,"slug":44},"onnx",{"name":16,"slug":46},"docker",{"id":27,"slug":48,"title":49,"language":50},"how-to-reduce-ai-model-serving-friction-en","How to Reduce AI Model Serving Friction","en",[52,58,64,70,76,82],{"id":53,"slug":54,"title":55,"cover_image":56,"image_url":56,"created_at":57,"category":26},"491c49cd-6b0b-4c4a-8120-402254ec0f4a","how-to-follow-gemini-and-apple-watch-12-rumors-zh","怎麼追 Gemini 與 Apple Watch 12 傳聞","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778933028697-qnhw.png","2026-05-16T12:03:23.685907+00:00",{"id":59,"slug":60,"title":61,"cover_image":62,"image_url":62,"created_at":63,"category":26},"92424d3d-23ac-4ae5-bedf-08db6a01eb9a","jensen-huang-trump-china-trip-zh","黃仁勳搭上川普專機赴中","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778930030195-daad.png","2026-05-16T11:13:26.928711+00:00",{"id":65,"slug":66,"title":67,"cover_image":68,"image_url":68,"created_at":69,"category":26},"cde2a775-0898-485e-9b0e-38c4288501b8","chatgpt-vs-gemini-9-tests-1-clear-winner-2026-zh","ChatGPT vs Gemini：9 
項測試，誰更值得選","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778925827606-i3zy.png","2026-05-16T10:03:29.803046+00:00",{"id":71,"slug":72,"title":73,"cover_image":74,"image_url":74,"created_at":75,"category":26},"bfbcb15a-47ab-478e-822a-38d89dc8cb84","lora-vs-qlora-vs-full-fine-tuning-zh","LoRA vs QLoRA vs 全量微調","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778915627798-evv7.png","2026-05-16T07:13:32.474543+00:00",{"id":77,"slug":78,"title":79,"cover_image":80,"image_url":80,"created_at":81,"category":26},"3c8fd898-40aa-4f98-b0d1-178e7b4d1c69","why-global-ai-regulation-2026-rewards-modular-compliance-zh","為什麼 2026 全球 AI 監管獎勵模組化合規","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778913216545-oxy8.png","2026-05-16T06:33:19.724845+00:00",{"id":83,"slug":84,"title":85,"cover_image":86,"image_url":86,"created_at":87,"category":26},"768916ff-4d12-44c8-bdb0-8d7ff8dd786f","lovable-backs-atech-vibe-coding-hardware-zh","Lovable 投資 Atech，硬體也想 vibe coding","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778905244885-h18k.png","2026-05-16T04:20:29.20636+00:00",[89,94,99,104,109,114,119,124,129,134],{"id":90,"slug":91,"title":92,"created_at":93},"ee073da7-28b3-4752-a319-5a501459fb87","ai-in-2026-what-actually-matters-now-zh","2026 AI 真正重要的事","2026-03-26T07:09:12.008134+00:00",{"id":95,"slug":96,"title":97,"created_at":98},"83bd1795-8548-44c9-9a7e-de50a0923f71","trump-ai-framework-power-speech-state-preemption-zh","川普 AI 框架瞄準電力、言論與州權","2026-03-26T07:12:18.695466+00:00",{"id":100,"slug":101,"title":102,"created_at":103},"ea6be18b-c903-4e54-97b7-5f7447a612e0","nvidia-gtc-2026-big-ai-announcements-zh","NVIDIA GTC 2026 
重點拆解","2026-03-26T07:14:26.62638+00:00",{"id":105,"slug":106,"title":107,"created_at":108},"4bcec76f-4c36-4daa-909f-54cd702f7c93","claude-users-spreading-out-and-getting-better-zh","Claude 用戶更分散，也更會用","2026-03-26T07:22:52.325888+00:00",{"id":110,"slug":111,"title":112,"created_at":113},"bd903b15-2473-4178-9789-b7557816e535","openclaw-raises-hard-question-for-ai-models-zh","OpenClaw 逼問 AI 模型價值","2026-03-26T07:24:54.707486+00:00",{"id":115,"slug":116,"title":117,"created_at":118},"eeac6b9e-ad9d-4831-8eec-8bba3f9bca6a","gap-google-gemini-checkout-fashion-search-zh","Gap 把結帳搬進 Gemini","2026-03-26T07:28:23.937768+00:00",{"id":120,"slug":121,"title":122,"created_at":123},"0740e53f-605d-4d57-8601-c10beb126f3c","google-pushes-gemini-transition-to-march-2026-zh","Google 把 Gemini 轉換延到 2026 年 3…","2026-03-26T07:30:12.825269+00:00",{"id":125,"slug":126,"title":127,"created_at":128},"e660d801-2421-4529-8fa9-86b82b066990","metas-llama-4-benchmark-scandal-gets-worse-zh","Meta Llama 4 分數風波又擴大","2026-03-26T07:34:21.156421+00:00",{"id":130,"slug":131,"title":132,"created_at":133},"183f9e7c-e143-40bb-a6d5-67ba84a3a8bc","accenture-mistral-ai-sovereign-enterprise-deal-zh","Accenture 攜手 Mistral AI 賣主權 AI","2026-03-26T07:38:14.818906+00:00",{"id":135,"slug":136,"title":137,"created_at":138},"191d9b1b-768a-478c-978c-dd7431a38149","mistral-ai-faces-its-hardest-year-yet-zh","Mistral AI 迎來最硬的一年","2026-03-26T07:40:23.716374+00:00"]