[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-cuda-tile-basic-nvidia-april-fools-post-zh":3,"tags-cuda-tile-basic-nvidia-april-fools-post-zh":34,"related-lang-cuda-tile-basic-nvidia-april-fools-post-zh":51,"related-posts-cuda-tile-basic-nvidia-april-fools-post-zh":55,"series-tools-a5f71507-4d4c-434f-834b-5fbe0405a5d9":92},{"id":4,"title":5,"content":6,"summary":7,"source":8,"source_url":9,"author":10,"image_url":11,"keywords":12,"language":22,"translated_content":10,"views":23,"is_premium":24,"created_at":25,"updated_at":25,"cover_image":11,"published_at":26,"rewrite_status":27,"rewrite_error":10,"rewritten_from_id":28,"slug":29,"category":30,"related_article_id":31,"status":32,"google_indexed_at":33,"x_posted_at":10,"tweet_text":10,"title_rewritten_at":10,"title_original":10,"key_takeaways":10,"topic_cluster_id":10,"embedding":10,"is_canonical_seed":24},"a5f71507-4d4c-434f-834b-5fbe0405a5d9","NVIDIA 把 CUDA Tile 搬進 BASIC","\u003Cp>NVIDIA 在 2026 年 4 月 1 日丟出一篇很會玩的文。主角是 \u003Ca href=\"https:\u002F\u002Fdeveloper.nvidia.com\u002Fblog\u002Fcuda-tile-programming-now-available-for-basic\u002F\" target=\"_blank\" rel=\"noopener\">cuTile BASIC\u003C\u002Fa>。它把 \u003Ca href=\"https:\u002F\u002Fdeveloper.nvidia.com\u002Fcuda-toolkit\" target=\"_blank\" rel=\"noopener\">CUDA 13.1\u003C\u002Fa> 的 tile-based GPU 編程，包進 BASIC 外皮。\u003C\u002Fp>\u003Cp>這梗很鬧，但不是純搞笑。文章裡的範例真的在講 tile、MMA、資料分塊。講白了，NVIDIA 想證明一件事：GPU 程式不必永遠綁死在 CUDA C++。\u003C\u002Fp>\u003Cp>如果你寫過 kernel，就知道痛點在哪。你要管 thread、block、launch config，還要顧 memory access。cuTile BASIC 的意思很直接：把心力移到資料切塊，剩下交給編譯器。\u003C\u002Fp>\u003Ch2>這篇文章到底在秀什麼\u003C\u002Fh2>\u003Cp>核心其實是 \u003Ca href=\"https:\u002F\u002Fdeveloper.nvidia.com\u002Fblog\u002Fnvidia-cuda-13-1-powers-next-gen-gpu-programming-with-nvidia-cuda-tile-and-performance-gains\u002F\" target=\"_blank\" rel=\"noopener\">CUDA Tile\u003C\u002Fa>。它是 CUDA 13.1 裡的 tile-based 
programming model。開發者先描述資料怎麼切成 tile，再描述 tile 上要做什麼。\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775142782708-1euw.png\" alt=\"NVIDIA 把 CUDA Tile 搬進 BASIC\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>這種寫法很適合 GPU。因為很多工作本來就是矩陣、向量、分塊運算。你不用每次都手動算 thread index。你只要講清楚資料區塊，後面讓工具處理。\u003C\u002Fp>\u003Cp>文章拿 BASIC 來示範，也不是亂選。BASIC 是很多人第一個學的語言。它有行號，也夠老派。NVIDIA 故意用它，來凸顯 tile IR 的語言無關性。\u003C\u002Fp>\u003Cul>\u003Cli>\u003Ca href=\"https:\u002F\u002Fdocs.nvidia.com\u002Fcuda\u002Fcuda-toolkit-release-notes\u002Findex.html\" target=\"_blank\" rel=\"noopener\">CUDA Toolkit 13.1\u003C\u002Fa> 是基礎版本。\u003C\u002Fli>\u003Cli>文章提到的 GPU 需要 compute capability 8.x 到 12.x。\u003C\u002Fli>\u003Cli>驅動需求是 \u003Ca href=\"https:\u002F\u002Fdocs.nvidia.com\u002Fcuda\u002Fcuda-toolkit-release-notes\u002Findex.html\" target=\"_blank\" rel=\"noopener\">R580\u003C\u002Fa> 以上。\u003C\u002Fli>\u003Cli>Python 3.10 也在安裝清單裡。\u003C\u002Fli>\u003Cli>套件是透過 \u003Ca href=\"https:\u002F\u002Fpip.pypa.io\u002Fen\u002Fstable\u002F\" target=\"_blank\" rel=\"noopener\">pip\u003C\u002Fa> 裝進去的。\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>BASIC 只是笑點，技術也是真的\u003C\u002Fh2>\u003Cp>文章最妙的地方，是它把懷舊梗和實作混在一起。你會看到 line number、老電腦、甚至像在寫學校作業的語氣。然後下一秒，它就拿出一段 vector add 程式。\u003C\u002Fp>\u003Cp>那段程式很短。它用 \u003Ccode>TILE\u003C\u002Fcode> 和 \u003Ccode>BID\u003C\u002Fcode> 表達資料分塊。你不用手算每個 thread 的位置。這對看慣 CUDA C++ 的人來說，衝擊很大。\u003C\u002Fp>\u003Cp>更有意思的是，這不是單純語法糖。它背後是 tile IR。也就是說，BASIC 只是前端之一。真正重要的是中間層可以接很多語言。\u003C\u002Fp>\u003Cblockquote>“CUDA Tile, introduced in CUDA 13.1, enables flexible tile-based GPU programming from any language.” — NVIDIA Technical Blog\u003C\u002Fblockquote>\u003Cp>矩陣乘法的例子更有感。文章用了 \u003Ccode>MMA\u003C\u002Fcode>，還寫出像 \u003Ccode>A(128, 32)\u003C\u002Fcode>、\u003Ccode>B(32, 128)\u003C\u002Fcode>、\u003Ccode>C(128, 128)\u003C\u002Fcode> 這種 tile 
尺寸。這些數字不是裝飾。這就是 GPU 最常見的思考方式。\u003C\u002Fp>\u003Cp>說真的，這種示範很聰明。因為它讓人一眼看懂。你不用先讀 200 行 kernel，才知道資料怎麼跑。對教學、原型、舊系統改造都很有用。\u003C\u002Fp>\u003Ch2>跟傳統 CUDA 比，差在哪\u003C\u002Fh2>\u003Cp>傳統 CUDA C++ 很強。這點沒人會嘴。你可以精準控制 thread mapping、shared memory、warp 行為。代價就是語法很吵，心智負擔也高。\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775142776921-rdml.png\" alt=\"NVIDIA 把 CUDA Tile 搬進 BASIC\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>cuTile BASIC 的路線完全不同。它把重點放在 tile。你先定義資料區塊，再定義區塊上的運算。編譯器去處理很多底層細節。\u003C\u002Fp>\u003Cp>這種抽象有好處，也有代價。好處是可讀性高。代價是你少了一些手動調校空間。要榨乾最後幾個百分點效能，還是得回到更底層的工具。\u003C\u002Fp>\u003Cul>\u003Cli>CUDA C++：你要自己管 \u003Ccode>threadIdx.x\u003C\u002Fcode> 和 \u003Ccode>blockIdx.x\u003C\u002Fcode>。\u003C\u002Fli>\u003Cli>cuTile BASIC：你直接對 tile 做運算。\u003C\u002Fli>\u003Cli>CUDA C++：launch geometry 要自己算。\u003C\u002Fli>\u003Cli>cuTile BASIC：很多配置交給 compiler 和 runtime。\u003C\u002Fli>\u003Cli>CUDA C++：適合極致調校。\u003C\u002Fli>\u003Cli>cuTile BASIC：適合教學、移植、快速驗證。\u003C\u002Fli>\u003C\u002Ful>\u003Cp>文章裡的測試數字也不是空話。vector add 跑 1,024 個元素。GEMM 則是 512x512 矩陣。結果還會檢查誤差，像 max difference 0.000012 這種值，代表它不是只做表面功夫。\u003C\u002Fp>\u003Cp>我覺得這裡最重要的訊號，是 NVIDIA 在推一個共享後端。前端可以很多種。BASIC、Python、Julia、甚至別的 DSL，都有機會接上去。這比單一語言工具鏈更有彈性。\u003C\u002Fp>\u003Ch2>這跟其他方案比，位置在哪\u003C\u002Fh2>\u003Cp>如果拿來跟一般 GPU 生態比，cuTile BASIC 很像一種介於教學與正式工具之間的東西。它不像 \u003Ca href=\"https:\u002F\u002Fdocs.nvidia.com\u002Fcuda\u002Fcuda-c-programming-guide\u002F\" target=\"_blank\" rel=\"noopener\">CUDA C Programming Guide\u003C\u002Fa> 那麼底層，也不像高階框架那麼黑盒。\u003C\u002Fp>\u003Cp>對比 \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fnvidia\u002Fcuda-tile\" target=\"_blank\" rel=\"noopener\">NVIDIA 的 GitHub 
範例\u003C\u002Fa>，你可以看出方向很清楚。NVIDIA 想把 tile 當成共通語言。前端可以換，資料切塊的邏輯不換。\u003C\u002Fp>\u003Cp>這也讓人想到其他語言的 GPU 路線。像 \u003Ca href=\"https:\u002F\u002Fwww.julialang.org\u002F\" target=\"_blank\" rel=\"noopener\">Julia\u003C\u002Fa> 社群就很愛這種高階表達方式。OraCore.dev 之前也寫過 \u003Ca href=\"\u002Fnews\u002Fcutile-jl-brings-nvidia-cuda-tile-based-programming-to-julia\">cuTile.jl\u003C\u002Fa>。那篇和這次 BASIC 的邏輯很像。\u003C\u002Fp>\u003Cul>\u003Cli>傳統 CUDA：控制力最強。\u003C\u002Fli>\u003Cli>cuTile BASIC：語法最短，讀起來最直白。\u003C\u002Fli>\u003Cli>Julia 方案：適合研究和數值運算。\u003C\u002Fli>\u003Cli>Python 方案：適合資料科學團隊。\u003C\u002Fli>\u003Cli>DSL 路線：適合特定領域工作負載。\u003C\u002Fli>\u003C\u002Ful>\u003Cp>如果你問我，這種設計最有價值的地方，不是 BASIC 本身。是它證明 tile backend 可以吃下奇怪前端。這代表未來很多舊語言，也可能找到 GPU 出口。\u003C\u002Fp>\u003Cp>這對企業很實際。很多公司還留著老系統。不是每個團隊都能把 Fortran、BASIC 或自家 DSL 全部重寫。能接上 GPU，才是重點。\u003C\u002Fp>\u003Ch2>這件事放回產業脈絡看\u003C\u002Fh2>\u003Cp>GPU 編程這幾年一直在分層。底層是 CUDA、PTX、driver。上層則是各種框架、DSL、編譯器。大家都想少碰硬體細節。\u003C\u002Fp>\u003Cp>這不是偷懶。是成本問題。開發者時間很貴。能少寫 300 行樣板碼，就少掉很多 bug。尤其是矩陣運算、推論、資料搬移這類工作。\u003C\u002Fp>\u003Cp>所以 tile-based 編程很合理。它把運算單位從 thread 拉回資料。這跟現代 AI 和 HPC 的工作型態很合。很多模型本來就是大塊矩陣在跑。\u003C\u002Fp>\u003Cp>我覺得 NVIDIA 這篇 April Fools 文，其實是在測風向。它一邊玩笑，一邊告訴大家：tile IR 不是玩具。它要變成平台。\u003C\u002Fp>\u003Cp>這種做法也有市場意義。當工具鏈夠彈性，生態就比較容易長。開發者不一定愛 BASIC，但會在意「我能不能用熟悉的語言碰 GPU」。這才是重點。\u003C\u002Fp>\u003Ch2>我怎麼看這個梗\u003C\u002Fh2>\u003Cp>老實說，這篇很會寫。它的笑點夠老派，技術點也夠硬。不是那種只會丟梗圖的行銷文。它真的有 demo，也真的有數字。\u003C\u002Fp>\u003Cp>如果你是 GPU 開發者，這篇值得看。不是因為 BASIC 很酷。是因為它提醒你，編譯器和 IR 可能比語言本身更重要。\u003C\u002Fp>\u003Cp>接下來我會盯兩件事。第一，還會有哪些語言接上 tile backend。第二，這套模型在真實工作負載上，能不能少掉更多 boilerplate，又不犧牲太多效能。\u003C\u002Fp>\u003Cp>講白了，這次的重點不是 BASIC。是 NVIDIA 在告訴大家：GPU 程式可以更像在描述資料，而不是在手刻座標。你如果還在維護老程式，現在就該想想，哪個模組最適合先試 tile 化。\u003C\u002Fp>","NVIDIA 的 4 月 1 日文章把 CUDA Tile 接到 BASIC，拿 70 年代語言示範現代 GPU tile 
編程。笑點很多，但背後的編譯器設計很認真。","developer.nvidia.com","https:\u002F\u002Fdeveloper.nvidia.com\u002Fblog\u002Fcuda-tile-programming-now-available-for-basic\u002F",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775142782708-1euw.png",[13,14,15,16,17,18,19,20,21],"NVIDIA","CUDA Tile","cuTile BASIC","GPU 編程","BASIC","tile-based programming","CUDA 13.1","矩陣乘法","GPU kernel","zh",1,false,"2026-04-02T15:12:38.50232+00:00","2026-04-02T15:12:38.286+00:00","done","c6f41731-609f-44f5-985f-ea7270a9e624","cuda-tile-basic-nvidia-april-fools-post-zh","tools","5eeb9239-a844-49ff-9727-b76676dc8447","published","2026-04-08T09:00:51.494+00:00",[35,37,39,42,44,46,47,49],{"name":18,"slug":36},"tile-based-programming",{"name":21,"slug":38},"gpu-kernel",{"name":40,"slug":41},"Nvidia","nvidia",{"name":16,"slug":43},"gpu-編程",{"name":19,"slug":45},"cuda-131",{"name":20,"slug":20},{"name":14,"slug":48},"cuda-tile",{"name":17,"slug":50},"basic",{"id":31,"slug":52,"title":53,"language":54},"cuda-tile-basic-nvidia-april-fools-post-en","CUDA Tile Comes to BASIC in NVIDIA’s April Fools Post","en",[56,62,68,74,80,86],{"id":57,"slug":58,"title":59,"cover_image":60,"image_url":60,"created_at":61,"category":30},"d058a76f-6548-4135-8970-f3a97f255446","why-gemini-api-pricing-is-cheaper-than-it-looks-zh","為什麼 Gemini API 定價其實比看起來更便宜","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778869845081-j4m7.png","2026-05-15T18:30:25.797639+00:00",{"id":63,"slug":64,"title":65,"cover_image":66,"image_url":66,"created_at":67,"category":30},"68e4be16-dc38-4524-a6ea-5ebe22a6c4fb","why-vidhub-huiyuan-hutong-bushi-quan-shebei-tongyong-zh","為什麼 VidHub 
會員互通不是「買一次全設備通用」","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778789450987-advz.png","2026-05-14T20:10:24.048988+00:00",{"id":69,"slug":70,"title":71,"cover_image":72,"image_url":72,"created_at":73,"category":30},"7a1e174f-746b-4e82-a0e3-b2475ab39747","why-buns-zig-to-rust-experiment-is-right-zh","為什麼 Bun 的 Zig-to-Rust 實驗是對的","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778767879127-5dna.png","2026-05-14T14:10:26.886397+00:00",{"id":75,"slug":76,"title":77,"cover_image":78,"image_url":78,"created_at":79,"category":30},"e742fc73-5a65-4db3-ad17-88c99262ceb7","why-openai-api-pricing-is-product-strategy-zh","為什麼 OpenAI API 定價是產品策略，不是註腳","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778749859485-chvz.png","2026-05-14T09:10:26.003818+00:00",{"id":81,"slug":82,"title":83,"cover_image":84,"image_url":84,"created_at":85,"category":30},"c757c5d8-eda9-45dc-9020-4b002f4d6237","why-claude-code-prompt-design-beats-ide-copilots-zh","為什麼 Claude Code 的提示設計贏過 IDE Copilot","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778742645084-dao9.png","2026-05-14T07:10:29.371901+00:00",{"id":87,"slug":88,"title":89,"cover_image":90,"image_url":90,"created_at":91,"category":30},"4adef3ab-9f07-4970-91cf-77b8b581b348","why-databricks-model-serving-is-right-default-zh","為什麼 Databricks Model Serving 是生產推論的正確預設","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778692245329-a2wt.png","2026-05-13T17:10:30.659153+00:00",[93,98,103,108,113,118,123,128,133,138],{"id":94,"slug":95,"title":96,"created_at":97},"de769291-4574-4c46-a76d-772bd99e6ec9","googles-biggest-gemini-launches-in-2026-zh","Google 2026 最大 Gemini 
盤點","2026-03-26T07:26:39.21072+00:00",{"id":99,"slug":100,"title":101,"created_at":102},"855cd52f-6fab-46cc-a7c1-42195e8a0de4","surepath-real-time-mcp-policy-controls-zh","SurePath 推出即時 MCP 政策控管","2026-03-26T07:57:40.77233+00:00",{"id":104,"slug":105,"title":106,"created_at":107},"9b19ab54-edef-4dbd-9ce4-a51e4bae4ebb","mcp-in-2026-the-ai-tool-layer-teams-use-zh","2026 年 MCP：團隊真的在用的 AI 工具層","2026-03-26T08:01:46.589694+00:00",{"id":109,"slug":110,"title":111,"created_at":112},"af9c46c3-7a28-410b-9f04-32b3de30a68c","prompting-in-2026-what-actually-works-zh","2026 提示工程，真正有用的是什麼","2026-03-26T08:08:12.453028+00:00",{"id":114,"slug":115,"title":116,"created_at":117},"05553086-6ed0-4758-81fd-6cab24b575e0","garry-tan-open-sources-claude-code-toolkit-zh","Garry Tan 開源 Claude Code 工具包","2026-03-26T08:26:20.068737+00:00",{"id":119,"slug":120,"title":121,"created_at":122},"042a73a2-18a2-433d-9e8f-9802b9559aac","github-ai-projects-to-watch-in-2026-zh","2026 必看 20 個 GitHub AI 專案","2026-03-26T08:28:09.619964+00:00",{"id":124,"slug":125,"title":126,"created_at":127},"a5f94120-ac0d-4483-9a8b-63590071ac6a","claude-code-vs-cursor-2026-zh","Claude Code 與 Cursor 深度對比：202…","2026-03-26T13:27:14.279193+00:00",{"id":129,"slug":130,"title":131,"created_at":132},"0975afa1-e0c7-4130-a20d-d890eaed995e","practical-github-guide-learning-ml-2026-zh","2026 機器學習入門 GitHub 實用指南","2026-03-27T01:16:49.712576+00:00",{"id":134,"slug":135,"title":136,"created_at":137},"bfdb467a-290f-4a80-b3a9-6f081afb6dff","aiml-2026-student-ai-ml-lab-repo-review-zh","AIML-2026：像課綱的學生實驗 Repo","2026-03-27T01:21:51.467798+00:00",{"id":139,"slug":140,"title":141,"created_at":142},"80cabc3e-09fc-4ff5-8f07-b8d68f5ae545","ai-trending-github-repos-and-research-feeds-zh","AI Trending：把 AI 資源收成一張表","2026-03-27T01:31:35.262183+00:00"]