[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-dashattention-differentiable-adaptive-sparse-attention-zh":3,"article-related-dashattention-differentiable-adaptive-sparse-attention-zh":36,"series-research-475844e6-3e2c-49a6-aea0-86a94945d2c2":87},{"id":4,"title":5,"content":6,"summary":7,"source":8,"source_url":9,"author":10,"image_url":11,"keywords":12,"language":18,"translated_content":10,"views":19,"is_premium":20,"created_at":21,"updated_at":21,"cover_image":11,"published_at":22,"rewrite_status":23,"rewrite_error":10,"rewritten_from_id":24,"slug":25,"category":26,"related_article_id":27,"status":28,"google_indexed_at":29,"x_posted_at":10,"tweet_text":10,"title_rewritten_at":10,"title_original":10,"key_takeaways":30,"topic_cluster_id":34,"embedding":35,"is_canonical_seed":20},"475844e6-3e2c-49a6-aea0-86a94945d2c2","DashAttention 讓稀疏長上下文可微","\u003Cp data-speakable=\"summary\">DashAttention 把\u003Ca href=\"\u002Ftag\u002F長上下文\">長上下文\u003C\u002Fa>分層注意力做成可微、可自適應的稀疏選擇，讓模型在高稀疏下仍能保住效能。\u003C\u002Fp>\u003Cul>\u003Cli>\u003Cstrong>研究機構\u003C\u002Fstrong>：arXiv 摘要未明確標註\u003C\u002Fli>\u003Cli>\u003Cstrong>核心數據\u003C\u002Fstrong>：75% sparsity\u003C\u002Fli>\u003Cli>\u003Cstrong>突破點\u003C\u002Fstrong>：α-entmax 自適應選塊\u003C\u002Fli>\u003C\u002Ful>\u003Cp>長上下文注意力一直有個老問題：你可以看得很廣，但成本高；你也可以先砍掉一大半內容，但很容易把關鍵資訊一起丟掉。這篇論文要處理的，就是這個「省算力」和「保品質」之間的拉扯。\u003C\u002Fp>\u003Cp>作者認為，現有的分層式稀疏注意力，像是先粗選 KV block、再做細粒度 softmax 的流程，最大的問題不在於「不夠快」，而在於「太硬」。因為它通常靠 top-k 做離散選擇，等於預先假設每個 query 都只需要固定數量的相關區塊。實際上，不同 query 對上下文的需求差很多，這種固定門檻會限制模型表現。\u003C\u002Fp>\u003Cp>這篇論文提出的 \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.18753\">DashAttention: Differentiable and Adaptive Sparse Hierarchical Attention\u003C\u002Fa>，就是想把這個流程改成更靈活的版本。它保留分層注意力的效率優勢，但把第一階段改成可自適應、可微分的稀疏選擇，讓 sparse 和 dense 兩段可以一起訓練，而不是像兩個彼此切開的模組。\u003C\u002Fp>\u003Ch2>這篇在解什麼痛點\u003C\u002Fh2>\u003Cp>先講白話版。長上下文模型最怕的不是「沒有注意力」，而是「注意力太分散」。如果前面那層粗選做得不準，後面的精細注意力就只能在錯的候選集合裡找答案。這時候模型看起來還在運作，但其實已經偏掉了。\u003C\u002Fp>\n\u003Cfigure 
class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779171840613-dq1r.png\" alt=\"DashAttention 讓稀疏長上下文可微\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>傳統 top-k 分層注意力有兩個結構性限制。第一，它不會因為 query 不同就改變保留數量。第二，top-k 是離散操作，梯度沒辦法順暢穿過 sparse selection 與 dense attention 的邊界。結果就是，模型雖然有「先篩再算」的設計，卻不一定學得到真正適合自己的篩法。\u003C\u002Fp>\u003Cp>DashAttention 的出發點，就是把這個硬切的流程改成可學習的流程。它不是單純把 \u003Ca href=\"\u002Ftag\u002Ftoken\">token\u003C\u002Fa> 砍少，而是讓模型自己決定該保留多少個 KV block，並且維持整條路徑都能反向傳播。\u003C\u002Fp>\u003Ch2>方法怎麼運作\u003C\u002Fh2>\u003Cp>DashAttention 的核心是兩階段結構。第一階段不是固定 top-k，而是用 α-entmax 來做稀疏選擇。這個轉換的重點在於，它可以產生稀疏輸出，但保留的 block 數量可以隨 query 而變，不需要每次都死守同一個 k。\u003C\u002Fp>\u003Cp>換句話說，有些 query 需要更多上下文，系統就能保留更多 block；有些 query 只需要少量資訊，就能更果斷地稀疏化。這讓第一階段不再只是粗暴過濾，而是變成一個依照內容調整的 prior。\u003C\u002Fp>\u003Cp>第二階段則是在被選出的區塊上做更細的 softmax attention。因為前面的稀疏選擇本身是可微的，所以 sparse 與 dense 不再是互相獨立的兩段式流程，而是可以一起優化的整體。這就是 DashAttention 跟傳統 top-k pipeline 最大的差別。\u003C\u002Fp>\u003Cp>論文用的不是「更少 token」這種單一目標，而是「可變數量的 block 選擇 + 可微分的分層注意力」。這也解釋了為什麼作者會特別強調它是 adaptive、differentiable，而且還是 hierarchical。\u003C\u002Fp>\u003Ch2>論文實際證明了什麼\u003C\u002Fh2>\u003Cp>根據摘要，作者主張 DashAttention 具有 non-dispersive 的特性，也就是注意力不會過度發散。這點被用來解釋它在長上下文建模上的表現會更穩。\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779171846871-4q8c.png\" alt=\"DashAttention 讓稀疏長上下文可微\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>在大語言模型實驗中，摘要寫到 DashAttention 在 75% sparsity 下，能做到和 full 
attention 相近的準確度。它也比 NSA 和 InfLLMv2 有更好的 Pareto frontier，尤其是在高稀疏區間。這代表它不是只在「省算力」這一端有優勢，而是能把效能與效率的平衡往更好的方向推。\u003C\u002Fp>\u003Cp>不過，這裡也要講清楚限制：摘要沒有公開完整 benchmark 細節。它沒有列出完整測試集、任務名稱、模型尺寸，也沒有把精確 accuracy 數字全部攤開。所以從摘要能確定的是趨勢，不是完整的實驗圖表。\u003C\u002Fp>\u003Cp>另外，作者還提供了 \u003Ca href=\"\u002Ftag\u002Fgpu\">GPU\u003C\u002Fa>-aware 的 Triton 實作。摘要指出，這個實作在 inference 時的速度表現，甚至能優於 FlashAttention-3。不過摘要沒有給出確切倍率，所以我們只能說它有速度優勢，不能替它補上沒寫出的數字。\u003C\u002Fp>\u003Ch2>對開發者代表什麼\u003C\u002Fh2>\u003Cp>如果你在做長上下文 \u003Ca href=\"\u002Ftag\u002Fllm\">LLM\u003C\u002Fa> 系統，真正的問題從來不是「能不能稀疏」，而是「稀疏會不會把品質砍壞」。DashAttention 的價值，在於它試圖把這兩件事一起解，而不是先犧牲一邊再補另一邊。\u003C\u002Fp>\u003Cp>這對調整 attention 層的成本曲線很有意義。75% sparsity 還能維持接近 full attention 的結果，至少在論文摘要的描述裡，已經顯示它不是那種單純靠剪枝換速度、最後品質掉一大截的方法。對需要長上下文推理、又受限於記憶體頻寬與延遲的場景，這種設計方向很有吸引力。\u003C\u002Fp>\u003Cp>更實際的一點是，作者把 GPU-aware Triton implementation 一起端出來。對開發者來說，這通常比單純的算法概念更重要。因為 attention 類方法最後能不能落地，常常不是看論文圖畫得漂不漂亮，而是看 kernel、硬體和序列長度能不能配合。\u003C\u002Fp>\u003Ch2>還有哪些限制與問題沒回答\u003C\u002Fh2>\u003Cp>摘要也留下不少工程師會想追問的空白。首先是 benchmark 資訊不足。你看不到完整數據集、模型規模、測試條件，也不知道它在不同任務上的表現是否一致。這讓它很難直接被拿來和其他方法做嚴格對照。\u003C\u002Fp>\u003Cp>其次，分層稀疏注意力的實際收益很吃系統條件。kernel 寫得好不好、GPU 架構、序列長度、部署方式，都會影響最後的速度和成本。摘要雖然說 Triton 實作很有效率，但沒有說明這些優勢在不同環境下能不能穩定重現。\u003C\u002Fp>\u003Cp>還有一個問題是泛化性。摘要只提到 large language models 的實驗結果，但沒有說跨架構、跨任務，這套 adaptive sparse selection 是否都能維持同樣的 Pareto 改善。這些都需要看完整論文或後續實作驗證。\u003C\u002Fp>\u003Cp>即便如此，這篇的方向還是很清楚：它想把 sparse attention 從固定規則，推向可學習、可變動、端到端可訓練的機制。對長上下文模型來說，這不是小修小補，而是把稀疏化從「硬切」改成「會判斷的選擇」。\u003C\u002Fp>\u003Ch2>總結\u003C\u002Fh2>\u003Cp>DashAttention 證明了一件事：長上下文注意力不一定要在「全看」和「硬砍」之間二選一。它可以在保留分層效率的同時，讓稀疏選擇變成可微、可自適應的流程。\u003C\u002Fp>\u003Cp>從摘要能看到的結果是，這種設計在 75% sparsity 下仍能維持接近 full attention 的表現，並且在高稀疏區間比 NSA 和 InfLLMv2 更有優勢。對開發者來說，這代表稀疏注意力還有繼續往「更聰明」方向演進的空間，而不只是單純把 token 砍少。\u003C\u002Fp>\u003Cul>\u003Cli>DashAttention 把固定 top-k 改成 α-entmax 自適應選塊。\u003C\u002Fli>\u003Cli>它讓 sparse 與 dense attention 保持可微分，方便端到端訓練。\u003C\u002Fli>\u003Cli>摘要只公開了趨勢與 75% sparsity，沒有完整 benchmark 
表格。\u003C\u002Fli>\u003C\u002Ful>","DashAttention 把長上下文的分層稀疏注意力改成可微、可自適應的選擇機制，讓模型在 75% 稀疏下仍能維持接近全注意力的表現。","arxiv.org","https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.18753",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779171840613-dq1r.png",[13,14,15,16,17],"DashAttention","sparse attention","α-entmax","long-context","Triton","zh",2,false,"2026-05-19T06:23:32.886786+00:00","2026-05-19T06:23:32.697+00:00","done","678404b6-2b28-4fd9-842d-aeedfe46f3da","dashattention-differentiable-adaptive-sparse-attention-zh","research","f15bbb27-837c-4841-9460-5c68d705e883","published","2026-05-19T09:00:33.166+00:00",[31,32,33],"把固定 top-k 稀疏選擇改成可變數量、可微分的 α-entmax 機制。","摘要宣稱在 75% sparsity 下仍能接近 full attention，且優於 NSA 與 InfLLMv2 的 Pareto frontier。","摘要沒有公開完整 benchmark 細節，工程上仍需看模型、資料集與 kernel 表現。","0c35a120-52fc-41fc-afa3-d404eb934158","[-6.6652115e-05,0.0137933,-0.0066679255,-0.081891045,-0.05107227,-0.0034576356,-0.02212024,0.010894429,0.004160441,0.012508332,0.0052860514,-0.032014925,0.013915917,-0.007926756,0.1072012,0.009893785,-0.00043439047,0.051399622,-0.0059501315,-0.017213004,0.007416725,0.015053953,-0.00323159,-0.000103167215,-0.022917949,0.0039662025,0.005404488,0.012974491,0.03479995,0.011295869,-0.007930071,0.0184232,0.027281797,3.2240965e-05,-0.004612436,0.01461788,0.046283077,-0.014091525,0.015017944,0.008243168,-0.0077850767,-0.008638462,0.006653567,0.00039586338,-0.009165462,0.0010016342,0.020397196,-0.016071664,0.0063091046,0.026883045,0.0099239,0.015883457,-0.014730644,-0.14789197,-0.00056343613,-0.006242751,-0.01693113,0.0073981048,0.0045774225,0.009631829,-0.0044751246,0.0153076155,0.0026006624,-0.0107752,0.00908799,-0.007587826,0.028064441,0.0062762653,-0.017734708,0.015934689,-0.00016827698,0.007652582,0.016246667,-0.015555964,0.03352318,-0.028329168,0.009814951,0.0033986839,0.008953144,0.012261134,0.019783694,-0.031369213,0.0050607263,0.0051741875,-0.008053283,0.015776217,0.014028
811,0.018649276,0.039394345,0.0087748105,-0.016814996,0.010309174,0.020249352,0.0069843046,0.0063812095,-0.009497031,0.008504972,0.0014500621,0.015591948,0.004447159,-0.00737631,-0.03693912,0.0035860743,0.015802361,0.032143675,-0.012678401,0.018552182,0.014562437,0.0027974583,0.00017671348,-0.019232308,-0.011443753,-0.0058350684,0.017638123,0.004292095,-0.13302915,-0.0007797616,-0.020884888,-0.0008061785,0.0064799055,-0.005181127,0.00955591,-0.003291169,0.022560295,0.022661287,-0.005040518,0.019667111,-0.021062743,-0.002749885,-0.033120003,-0.014126842,0.010567248,0.032475125,-0.016873855,-0.010034447,0.021213885,0.02557027,-0.00055286783,-0.029049575,-0.02770277,-0.025256297,0.016627975,-0.029421693,-0.014286164,-0.013584693,-0.026225206,-0.034793917,0.022699509,0.022434162,-0.010159229,0.028312279,-0.022162784,0.015412548,-0.022561293,0.011270778,-0.02351146,-0.021951415,0.033706523,0.008642093,0.028951565,0.0041095535,0.011703695,-0.008848917,-0.0013511862,0.002773168,0.028418723,0.012822565,-0.0030974643,0.013364884,0.025692012,0.014067111,0.0042259335,0.023204423,0.023227273,0.009532759,0.00030148233,-0.019555027,0.0050125015,-0.008633887,0.002654845,-0.0014127465,0.006488481,-0.007593568,0.009668673,0.016887637,0.013396814,-0.0019282054,0.019372681,0.0010463984,0.020320399,-0.030168056,0.010146153,3.5579685e-06,-0.0118011255,0.005506075,-0.039769318,0.0016520886,0.038062565,-0.0004193084,0.0051685763,0.0023916063,-0.0061705313,0.01505611,-0.015144065,0.02225451,-0.015648138,0.01564109,-0.027692595,-0.0035864315,0.004743458,0.029765064,0.011132068,0.015574071,-0.020032741,-0.0030736772,-0.0028368272,-0.024282223,-0.011846558,0.020869348,-0.02598431,0.006900966,-0.03702749,0.026707368,-0.012561401,0.013047032,-0.026516058,-0.027802523,-0.023752363,-0.0013853102,0.025491545,0.00417264,0.023421934,-0.0032354011,-0.0097170845,-0.015644109,-0.017574314,-0.02463849,0.019596072,0.018807285,0.011022061,0.0038241863,0.019204888,-0.0048560514,-0.0037971877,0.028886478,-0
.0024442556,0.009271797,0.0030358194,0.00027090116,0.021641705,-0.00014856082,-0.022001505,0.0184088,-0.014226021,0.019488141,0.008471593,-0.022887914,0.0019778972,-0.0086845625,-0.0013579051,0.0072518294,0.0015410377,0.0017233836,-0.003347774,0.011648309,0.01326632,0.017407026,-0.0059562474,-0.046499,-0.025694048,-0.019632641,0.0071899486,-0.007364989,0.016391164,0.0031466444,-0.04706315,-0.056771122,0.042205706,-0.004490475,-0.042649165,0.0016523923,-0.023046115,-0.01230782,0.015691724,-0.0044280645,-0.0241964,-0.0023793874,-0.010579234,-0.028294604,-6.8048494e-06,0.02863902,0.026257042,0.0016718,0.009574537,0.006135804,-0.018251628,-0.019380683,-0.0032179884,0.026372083,0.0005174382,0.03484341,0.01799932,-0.005180552,0.031275626,0.0021809507,0.013934037,0.010049546,0.04378625,0.015802005,0.009002661,-0.026108924,-0.004829805,-0.01201571,-0.031964313,-0.008663994,-0.02048944,0.0021695776,0.012390621,-0.017146796,-0.005436215,0.015457291,-0.008527547,-0.007617737,0.00983429,-0.013389824,-0.006932724,0.022727178,-0.0059293355,0.0134425,-0.024279086,-0.011270515,-0.0044114254,0.03437839,0.0030732078,-0.005101676,0.027708447,0.00040507084,0.0010133038,-0.011958085,-0.02879991,0.011706184,0.022667807,-0.011918444,0.02034266,-0.031494044,0.015256061,0.03127142,0.035441533,0.0057697548,-0.022043236,0.008172553,-0.009052688,0.012873442,-0.044757873,-0.024701882,0.015403062,-0.0076957657,-0.023672579,0.00073384936,0.007144232,-0.0025686685,-0.011365799,-0.0034666222,0.00093836966,0.026306178,-0.017479831,-0.010638889,0.008318276,-0.022243083,-0.008776513,-0.021362757,0.007696904,-0.0022856274,-0.0037462204,0.012477556,0.013595622,-0.0068666777,0.004072156,-0.011195181,-0.0022996627,-0.017987678,0.03903834,-0.01294566,-0.014101209,0.006457627,0.0068143057,0.028842743,-0.0032127672,-0.0049609314,-0.014365936,0.0067608063,0.026444698,-0.013976495,-0.002230399,-0.0061614388,0.013528624,-0.016887901,0.0011440584,-0.015454824,-4.816936e-05,-0.0018727311,-0.003945865,0.005193439,
-0.020360809,0.009787023,-0.020216575,-0.005157791,0.014155813,-0.013010591,-0.008794556,-0.0013850784,0.0072027785,0.019890457,-0.0050927685,0.026084963,-0.002029211,-0.0156209,0.0073397798,0.0062896325,-0.0012563237,-0.003680149,-0.0049449946,0.00786489,-0.0067682085,0.015107454,0.0051760618,-0.036793023,-0.031746108,0.0058825375,-0.056366563,-0.013517692,-0.006876133,-0.022168355,-0.0069855275,-0.005132898,-0.024693277,-0.007230644,-0.0012121066,0.002284298,-0.018928114,0.008238409,-0.0111488225,-0.0072070477,0.013266696,-0.008229626,-0.021815196,-0.001695839,0.014424622,-0.039634027,0.022293573,0.036504455,0.04584366,-0.0018691547,-0.012518539,0.022779727,0.016405411,0.00047111505,0.014715523,-0.016070152,-0.008903673,-0.04534875,-0.0076928465,0.02624777,0.015169276,0.0036134853,-0.018691543,-0.03344975,0.017196465,-0.012425426,0.037458543,0.02190512,-0.010534183,0.00500492,0.0025784455,-0.005030411,0.028645946,-0.01906577,0.019292913,0.009255527,0.004390964,-0.0016758442,-0.01854358,0.0078122704,-0.000179031,0.008181647,0.034673687,-0.0068088355,0.020041434,0.004402999,0.029436812,0.008830101,0.06329984,0.016240733,-0.0129224695,-0.020302456,-0.015566935,-0.038519505,-0.027409583,0.008408931,0.0011256124,0.012355778,-0.011414446,0.004247954,-0.032192416,-0.01190515,-0.0037613981,-0.00096447364,0.0105599025,0.019759249,-0.01864495,-0.020166412,-0.009077973,0.04302931,0.019653372,-0.01167896,0.005511357,0.002785373,0.0012265409,0.006592289,-0.004128407,-0.002892987,-0.0019866652,-0.014788687,0.021383686,0.011433488,-0.014803527,0.018942181,-0.013774272,0.01435285,0.012989654,0.0007873858,0.020360006,-0.0028712568,-0.023040524,-0.0064560673,-0.008171052,-0.011340806,0.028681776,0.013370791,-0.01079232,0.0025680072,0.0018043169,-0.0036196846,0.0016425782,0.03896851,-0.08941394,0.006026117,-0.0058378484,-0.005579518,0.00206084,0.0020659284,0.017664008,-0.02317427,0.015928462,0.0022994003,-0.00567274,0.013108547,0.034511767,0.0037375204,-0.00815916,0.0049909074,-0.02
5327174,-0.015945492,0.011598201,-0.010265369,0.015720692,-0.0026973428,-0.00093696016,0.0038401913,0.028804472,-0.001936728,0.028314626,-9.814887e-05,0.003103561,-0.011462473,-0.01248582,-0.009651263,0.02137654,0.025362182,-0.00058975245,0.0022122506,0.029166648,-0.0013240192,0.014236139,-0.007118604,0.016055133,0.027503863,-0.020533953,-0.008774289,0.004777613,-0.0077387663,-0.00426957,0.01649704,-0.0011657608,0.0011658637,-0.014521486,-0.03217104,0.009042425,0.006956492,-0.026459856,-0.0027494354,-0.015244369,-0.005412458,0.011816695,0.005046203,-0.014908013,0.0027490598,-0.023011012,0.03149646,-0.0014301267,0.010108905,0.0023043407,0.024655083,0.0117269335,0.02254639,-0.02295596,0.005901238,0.03156329,0.0034528319,-0.023679206,0.012838418,0.002812447,0.00039484384,0.0024724826,-0.009855992,-0.0021969299,-0.018299906,-0.091084674,-0.004526532,0.0016355248,-0.009495637,-0.010678184,0.010964008,-0.027035784,0.012823588,0.009488433,-0.03373698,0.010841982,-0.0050061047,-0.022381442,-0.031934593,0.023603069,0.01338966,-0.0045122565,-0.004235107,0.021158619,-0.04804086,-0.0017917275,-0.016789567,0.016466094,-0.02474845,-0.00090718584,-0.026226597,-0.005701937,-0.02117172,0.01256045,-0.011394838,0.002818555,-0.15240195,-0.008115415,0.002930496,-0.0036802245,0.004453443,0.002064844,-0.019692514,-0.015990468,0.009221168,-0.009575614,0.007695092,-0.018763842,-0.012561283,0.005906295,0.0052943635,0.10610255,-0.0054793716,-0.00027737304,-0.024870753,-0.033991713,0.0042208135,-0.019068439,0.00048510853,0.036543775,0.008557319,0.010845801,-0.01803013,-0.011749766,-0.017801559,0.03300438,-0.01031697,0.033225086,-0.024547638,-0.0026260305,-0.008144037,0.019492785,0.03552453,-0.024077257,0.019507097,-0.0049394374,0.030386051,0.0014901871,0.021371385,-0.014283192,-0.0067064012,-0.031572726,0.00999106,-0.015204225,-0.011355643,0.0060368287,-0.0016874995,-0.0623961,-0.016527511,-0.004520043,0.020854082,-0.0008350346,-0.019859366,0.011414074,0.043990478,-0.0071241097,0.0057938523,0.
00388456,0.00096312695,0.0031365843,-0.0024743506,0.005769751,-0.019200724,0.04049222,-0.010310156,0.019456916,-0.016488949,0.016015595,0.013093083,-0.0036098373,0.015847934,-0.019162018,0.019070184,0.018637221,0.0036695374,-0.017846499,-0.0010215333,-0.00872927,-0.008207142,-0.022000242,0.03142344,-0.0036795796,0.027378399,0.018679716,0.0028027655,-0.02553514,0.009113676,0.004262131,0.0011276288,-0.0027973903,-0.0035358586,0.0075679305,-0.019141212,0.0129059795,0.006478183,-0.009407451,0.00043940422,-0.038335897,-0.0026240854,-0.016592644,0.000855786,-0.0009809029,-0.009209518,0.0064000553,0.023251224,-0.014073219]",{"tags":37,"relatedLang":46,"relatedPosts":50},[38,40,41,42,44],{"name":14,"slug":39},"sparse-attention",{"name":15,"slug":15},{"name":16,"slug":16},{"name":13,"slug":43},"dashattention",{"name":17,"slug":45},"triton",{"id":27,"slug":47,"title":48,"language":49},"dashattention-differentiable-adaptive-sparse-attention-en","DashAttention makes sparse long-context attention differentiable","en",[51,57,63,69,75,81],{"id":52,"slug":53,"title":54,"cover_image":55,"image_url":55,"created_at":56,"category":26},"d1c6850c-f832-471b-8beb-c0ebc809667d","peft-bench-fine-tuning-methods-benchmark-zh","PEFT-Bench 讓微調比較更公平","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779179048497-jm5y.png","2026-05-19T08:23:36.803043+00:00",{"id":58,"slug":59,"title":60,"cover_image":61,"image_url":61,"created_at":62,"category":26},"e24e6e7a-6181-476b-8583-339d854cec68","confident-ai-llm-evaluation-metrics-guide-zh","Confident AI 的 LLM 
評估指標指南","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779178456675-x5m6.png","2026-05-19T08:13:46.193772+00:00",{"id":64,"slug":65,"title":66,"cover_image":67,"image_url":67,"created_at":68,"category":26},"adfa9b15-68b6-44cc-b34d-ebcb02c31210","code-becomes-the-agent-harness-zh","程式碼成了代理引擎","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779173040130-zcyg.png","2026-05-19T06:43:29.625994+00:00",{"id":70,"slug":71,"title":72,"cover_image":73,"image_url":73,"created_at":74,"category":26},"eda7a80a-b234-4ada-90d1-a37b144251dc","rrfp-readiness-driven-pipeline-training-zh","RRFP 讓管線訓練跟著就緒跑","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779172442474-n21q.png","2026-05-19T06:33:31.287772+00:00",{"id":76,"slug":77,"title":78,"cover_image":79,"image_url":79,"created_at":80,"category":26},"23a3d4c7-5cb7-40ae-a05b-1542364e786f","ibm-prompt-guide-turns-ai-guesses-into-outputs-zh","IBM 提示指南把猜答案變輸出","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779132863293-etob.png","2026-05-18T19:33:55.711767+00:00",{"id":82,"slug":83,"title":84,"cover_image":85,"image_url":85,"created_at":86,"category":26},"7c89c3bd-48cb-4b4e-942d-bbf0409fc392","cattle-trade-llm-bluffing-bargaining-benchmark-zh","Cattle Trade 要測 LLM 談判 bluffing","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1779085437419-b0zw.png","2026-05-18T06:23:27.885037+00:00",[88,93,98,103,108,113,118,123,128,133],{"id":89,"slug":90,"title":91,"created_at":92},"f18dbadb-8c59-4723-84a4-6ad22746c77a","deepmind-bets-on-continuous-learning-ai-2026-zh","DeepMind 押注 2026 連續學習 
AI","2026-03-26T08:16:02.367355+00:00",{"id":94,"slug":95,"title":96,"created_at":97},"f4a106cb-02a6-4508-8f39-9720a0a93cee","ml-papers-of-the-week-github-research-desk-zh","每週 ML 論文清單，為何紅到 GitHub","2026-03-27T01:11:39.284175+00:00",{"id":99,"slug":100,"title":101,"created_at":102},"c4f807ca-4e5f-47f1-a48c-961cf3fc44dc","ai-ml-conferences-to-watch-in-2026-zh","2026 AI 研討會投稿時程整理","2026-03-27T01:51:53.874432+00:00",{"id":104,"slug":105,"title":106,"created_at":107},"9f50561b-aebd-46ba-94a8-363198aa7091","openclaw-agents-manipulated-self-sabotage-zh","OpenClaw Agent 會自己搞砸自己","2026-03-28T03:03:18.786425+00:00",{"id":109,"slug":110,"title":111,"created_at":112},"11f22e92-7066-4978-a544-31f5f2156ec6","vega-learning-to-drive-with-natural-language-instructions-zh","Vega：使用自然語言指示進行自駕車控制","2026-03-28T14:54:04.847912+00:00",{"id":114,"slug":115,"title":116,"created_at":117},"a4c7cfec-8d0e-4fec-93cf-1b9699a530b8","drive-my-way-en-zh","Drive My Way：個性化自駕車風格的實現","2026-03-28T14:54:26.207495+00:00",{"id":119,"slug":120,"title":121,"created_at":122},"dec02f89-fd39-41ba-8e4d-11ede93a536d","training-knowledge-bases-with-writeback-rag-zh","用 WriteBack-RAG 強化知識庫提升檢索效能","2026-03-28T14:54:45.775606+00:00",{"id":124,"slug":125,"title":126,"created_at":127},"3886be5c-a137-40cc-b9e2-0bf18430c002","packforcing-efficient-long-video-generation-method-zh","PackForcing：短影片訓練也能生成長影片","2026-03-28T14:55:02.688141+00:00",{"id":129,"slug":130,"title":131,"created_at":132},"72b90667-d930-4cc9-8ced-aaa0f8968d44","pixelsmile-toward-fine-grained-facial-expression-editing-zh","PixelSmile：提升精細臉部表情編輯的新方法","2026-03-28T14:55:20.678181+00:00",{"id":134,"slug":135,"title":136,"created_at":137},"cf046742-efb2-4753-aef9-caed5da5e32e","adaptive-block-scaled-data-types-zh","IF4：神經網路量化的聰明選擇","2026-03-31T06:00:36.990273+00:00"]