[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-pirate-ai-q-learning-treasure-agent-zh":3,"tags-pirate-ai-q-learning-treasure-agent-zh":35,"related-lang-pirate-ai-q-learning-treasure-agent-zh":45,"related-posts-pirate-ai-q-learning-treasure-agent-zh":49,"series-industry-000c31c0-8cff-487d-a7ab-30ed1090178f":86},{"id":4,"title":5,"content":6,"summary":7,"source":8,"source_url":9,"author":10,"image_url":11,"keywords":12,"language":19,"translated_content":10,"views":20,"is_premium":21,"created_at":22,"updated_at":22,"cover_image":11,"published_at":23,"rewrite_status":24,"rewrite_error":10,"rewritten_from_id":25,"slug":26,"category":27,"related_article_id":28,"status":29,"google_indexed_at":30,"x_posted_at":10,"tweet_text":10,"title_rewritten_at":10,"title_original":10,"key_takeaways":31,"topic_cluster_id":10,"embedding":10,"is_canonical_seed":21},"000c31c0-8cff-487d-a7ab-30ed1090178f","Pirate-AI：用 Q-learning 找寶藏","\u003Cp data-speakable=\"summary\">Pirate-AI 是一個 Jupyter Notebook 強化學習專案，用 deep Q-lea\u003Ca href=\"\u002Fnews\u002Fwhy-nvidia-corning-deal-matters-ai-infrastructure-zh\">rnin\u003C\u002Fa>g 訓練海盜代理去找寶藏。\u003C\u002Fp>\u003Cp>說真的，這專案很小。\u003Ca href=\"\u002Ftag\u002Fgithub\">GitHub\u003C\u002Fa> 上只有 1 顆星，0 個 fork。可就是這種小專案，最適合拿來拆強化學習的骨架。\u003C\u002Fp>\u003Cp>它不靠手寫路線。它靠 reward、state、episode。講白了，就是讓代理自己學會哪個動作比較划算。\u003C\u002Fp>\u003Ctable>\u003Cthead>\u003Ctr>\u003Cth>指標\u003C\u002Fth>\u003Cth>數值\u003C\u002Fth>\u003C\u002Ftr>\u003C\u002Fthead>\u003Ctbody>\u003Ctr>\u003Ctd>專案\u003C\u002Ftd>\u003Ctd>\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fquestmcclure\u002FPirate-AI\" target=\"_blank\" rel=\"noopener\">questmcclure\u002FPirate-AI\u003C\u002Fa>\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Stars\u003C\u002Ftd>\u003Ctd>1\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Forks\u003C\u002Ftd>\u003Ctd>0\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>實作形式\u003C\u002Ftd>\u003Ctd>Jupyter Notebook\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>方法\u003C\u002Ftd>\u003Ctd>Deep Q-learning\u003C\u002Ftd>\u003C\u002Ftr>\u003C\u002Ftbody>\u003C\u002Ftable>\u003Ch2>這個專案在做什麼\u003C\u002Fh2>\u003Cp>這個 repo 的核心很直白。它要訓練一個海盜代理，去找寶藏。不是走固定腳本，也不是靠人工規則硬推。\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778418638464-itvz.png\" alt=\"Pirate-AI：用 Q-learning 找寶藏\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>它把問題變成強化學習。代理每做一次動作，就會收到回饋。回饋好，之後就多做。回饋差，就少做。\u003C\u002Fp>\u003Cp>這類設計很適合拿來教人。因為你可以很清楚看到，\u003Ca href=\"\u002Ftag\u002Fllm\">LLM\u003C\u002Fa> 之外的 AI 也是一堆數學和迭代，不是魔法。\u003C\u002Fp>\u003Cul>\u003Cli>目標很單純：找到寶藏。\u003C\u002Fli>\u003Cli>方法很典型：deep Q-learning。\u003C\u002Fli>\u003Cli>形式很輕量：Jupyter Notebook。\u003C\u002Fli>\u003Cli>重點很實用：學 policy，不是背答案。\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>Q-learning 為什麼適合拿來教\u003C\u002Fh2>\u003Cp>Q-learning 的概念不難。你可以把它想成一張動作分數表。每個 state 下，往上、往下、往左、往右，都有一個估值。\u003C\u002Fp>\u003Cp>代理每走一步，表格或神經網路就更新一次。它不是一次就學會。它要跑很多 episode，慢慢把高分動作留下來。\u003C\u002Fp>\u003Cp>Deep Q-learning 再往前一步。它不用純表格，改用神經網路近似 Q 值。這樣 state 變大時，比較撐得住。\u003C\u002Fp>\u003Cblockquote>\u003Cp>“\u003Ca href=\"\u002Ftag\u002Freinforcement-learning\">Reinforcement learning\u003C\u002Fa> is learning what to do—how to map situations to actions—so as to maximize a numerical reward signal.”\u003C\u002Fp>\u003Cfooter>— Richard S. 
<blockquote><p>“<a href="/tag/reinforcement-learning">Reinforcement learning</a> is learning what to do—how to map situations to actions—so as to maximize a numerical reward signal.”</p><footer>— Richard S. Sutton and Andrew G. Barto, <a href="http://incompleteideas.net/book/the-book-2nd.html" target="_blank" rel="noopener">Reinforcement Learning: An Introduction</a></footer></blockquote>
<p>That line is a classic, and it fits this project well. The point of RL is not classification and not generation; it is choosing actions. You hand the agent an environment, and what it learns is long-term return.</p>
<p>If you have ever built game AI, this will feel familiar. Many problems were never "where is the answer" problems in the first place; they are "which next move keeps me alive" problems.</p>
<h2>Compared with other RL teaching projects</h2>
<p>Pirate-AI's advantage is its size. It is small enough that you can follow the entire flow inside the notebook, which is friendly to beginners because every piece of code is in plain view.</p>
<figure class="my-6"><img src="https://xxdpdyhzhpamafnrdkyq.supabase.co/storage/v1/object/public/covers/inline-1778418639120-oklw.png" alt="Pirate-AI: Finding Treasure with Q-learning" class="rounded-xl w-full" loading="lazy" /></figure>
<p>It has limits, though. It is not a standardized environment suite like <a href="https://www.gymnasium.farama.org/" target="_blank" rel="noopener">Gymnasium</a>, and it is not a full training framework like <a href="https://stable-baselines3.readthedocs.io/" target="_blank" rel="noopener">Stable-Baselines3</a>. You have to work out more of the underlying logic yourself.</p>
<p>I'd argue that is a feature. Plenty of people start with the big frameworks and end up able to tune hyperparameters without understanding where the Q values come from.</p>
<ul><li><a href="https://github.com/questmcclure/Pirate-AI" target="_blank" rel="noopener">Pirate-AI</a>: good for getting started.</li><li><a href="https://www.gymnasium.farama.org/" target="_blank" rel="noopener">Gymnasium</a>: good for standardized environments.</li><li><a href="https://stable-baselines3.readthedocs.io/" target="_blank" rel="noopener">Stable-Baselines3</a>: good for fast experiments.</li><li><a href="https://keras.io/" target="_blank" rel="noopener">Keras</a>: good for writing the neural network.</li></ul>
<h2>Where this kind of project fits</h2>
<p>Reinforcement learning never gets the noise in industry that LLMs do, but it has never gone away. Robot control, game AI, and scheduling optimization all run into it.</p>
<p>The catch is that RL lives or dies on environment design. Write the reward slightly wrong and the agent learns the wrong thing. That is why so many demos look impressive and then fall apart in production.</p>
<p>The value of a project like Pirate-AI is not scale. It is that it lays the whole training chain out flat: you can see state, action, reward, and update wired together end to end.</p>
<p>If you are a <a href="/tag/台灣開發者">Taiwanese developer</a>, this kind of project is worth a look, because it fills the gap between being able to call an <a href="/tag/api">API</a> and understanding the algorithm behind it. Knowing the underlying mechanics is how you know when to use RL, and when to leave it alone.</p>
<h2>How I'd read this repo</h2>
<p>Honestly, this repo does not look like a product. It looks like a teaching aid. But a well-made teaching aid beats a pile of flashy demos.</p>
<p>If you are learning RL, I would look first at how it defines the environment, then at how it updates the Q values, and only last at the model architecture. Get the order wrong and you mostly end up memorizing vocabulary.</p>
<p>I would also compare it against DQN, SARSA, and policy gradients. Once you understand Q-learning, it is much easier to see what those later methods are solving.</p>
<ul><li>First, how the environment is designed.</li><li>Then, how the reward is handed out.</li><li>Next, how the Q values are updated.</li><li>Last, the model architecture (a network sketch follows this list).</li></ul>
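<p>On that last point, a deep Q network for a small grid world can be tiny. The sketch below is an assumed architecture for illustration, not the notebook's actual model: the flattened 4x4 input, the layer sizes, the discount factor, and the placeholder batch are all guesses.</p>
<pre><code class="language-python">import numpy as np
from tensorflow import keras

# A minimal deep-Q-style network sketch (assumed architecture, not the
# Pirate-AI notebook's actual model): input is a flattened 4x4 grid,
# output is one Q value per action (up, down, left, right).
num_cells, num_actions = 16, 4
gamma = 0.95  # assumed discount factor

model = keras.Sequential([
    keras.layers.Input(shape=(num_cells,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(num_actions, activation="linear"),  # raw Q values
])
model.compile(optimizer=keras.optimizers.Adam(1e-3), loss="mse")

# One illustrative training step on a placeholder batch: regress the
# predicted Q value of the taken action toward its Bellman target.
states = np.random.rand(32, num_cells).astype("float32")
next_states = np.random.rand(32, num_cells).astype("float32")
actions = np.random.randint(num_actions, size=32)
rewards = np.random.uniform(-1, 10, size=32).astype("float32")

targets = model.predict(states, verbose=0)
targets[np.arange(32), actions] = rewards + gamma * model.predict(
    next_states, verbose=0).max(axis=1)
model.fit(states, targets, epochs=1, verbose=0)
</code></pre>
<p>Production-grade DQN adds experience replay and a target network on top of a loop like this; how much of that the notebook keeps is exactly the kind of thing worth checking when you read it.</p>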
的基本流程。","github.com","https:\u002F\u002Fgithub.com\u002Fquestmcclure\u002FPirate-AI",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778418638464-itvz.png",[13,14,15,16,17,18],"Pirate-AI","deep Q-learning","強化學習","Q-learning","Jupyter Notebook","海盜代理","zh",0,false,"2026-05-10T13:10:18.476276+00:00","2026-05-10T13:10:18.463+00:00","done","df29bef6-1a59-4b07-b57d-ab839d9532aa","pirate-ai-q-learning-treasure-agent-zh","industry","0c87c77c-199e-4990-9308-69e6582e251e","published","2026-05-11T09:00:15.604+00:00",[32,33,34],"Pirate-AI 用 deep Q-learning 示範海盜找寶藏。","這個 repo 小，但很適合學強化學習流程。","Q-learning 的重點是學動作分數，不是背固定路線。",[36,38,40,42,43],{"name":13,"slug":37},"pirate-ai",{"name":17,"slug":39},"jupyter-notebook",{"name":16,"slug":41},"q-learning",{"name":15,"slug":15},{"name":14,"slug":44},"deep-q-learning",{"id":28,"slug":46,"title":47,"language":48},"pirate-ai-q-learning-treasure-agent-en","Pirate-AI trains a treasure-seeking Q-learning agent","en",[50,56,62,68,74,80],{"id":51,"slug":52,"title":53,"cover_image":54,"image_url":54,"created_at":55,"category":27},"e6379f8a-3305-4862-bd15-1192d3247841","why-nebius-ai-pivot-is-more-real-than-hype-zh","為什麼 Nebius 的 AI 轉型比炒作更真實","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778823044520-9mfz.png","2026-05-15T05:30:24.978992+00:00",{"id":57,"slug":58,"title":59,"cover_image":60,"image_url":60,"created_at":61,"category":27},"66c4e357-d84d-43ef-a2e7-120c4609e98e","nvidia-backs-corning-factories-with-billions-zh","Nvidia 出資 Corning 工廠擴產","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778822450270-trdb.png","2026-05-15T05:20:27.701475+00:00",{"id":63,"slug":64,"title":65,"cover_image":66,"image_url":66,"created_at":67,"category":27},"31d8109c-8b0b-46e2-86bc-d274a03269d1","why-anthropic-gates-foundation-ai-public-goods-zh","為什麼 Anthropic 和 Gates Foundation 應該投資 A…","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778796636474-u508.png","2026-05-14T22:10:21.138177+00:00",{"id":69,"slug":70,"title":71,"cover_image":72,"image_url":72,"created_at":73,"category":27},"17cafb6e-9f2c-43c4-9ba3-ef211d2780b1","why-observability-is-critical-cloud-native-systems-zh","為什麼可觀測性是雲原生系統的生存條件","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778794245143-tfqn.png","2026-05-14T21:30:25.97324+00:00",{"id":75,"slug":76,"title":77,"cover_image":78,"image_url":78,"created_at":79,"category":27},"2fb441af-d3c6-4af8-a356-a40b25a67c00","data-centers-pushing-homeowners-to-solar-zh","資料中心推升房主裝太陽能","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778793651300-gi06.png","2026-05-14T21:20:40.899115+00:00",{"id":81,"slug":82,"title":83,"cover_image":84,"image_url":84,"created_at":85,"category":27},"387bddd8-e5fc-4aa9-8d1b-43a34b0ece43","how-to-choose-gpu-for-yihuan-zh","怎麼選《异环》GPU","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778786461303-39mx.png","2026-05-14T19:20:29.220124+00:00",[87,92,97,102,107,112,117,122,127,132],{"id":88,"slug":89,"title":90,"created_at":91},"ee073da7-28b3-4752-a319-5a501459fb87","ai-in-2026-what-actually-matters-now-zh","2026 AI 