[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-four-music-datasets-ai-music-training-en":3,"article-related-four-music-datasets-ai-music-training-en":33,"series-industry-55699da8-8f47-4348-81c3-65cd969debd3":76},{"id":4,"slug":5,"title":6,"content":7,"summary":8,"source":9,"source_url":10,"author":11,"image_url":12,"cover_image":12,"category":13,"language":14,"translated_content":11,"related_article_id":15,"keywords":16,"key_takeaways":25,"views":29,"created_at":30,"published_at":31,"topic_cluster_id":32},"55699da8-8f47-4348-81c3-65cd969debd3","four-music-datasets-ai-music-training-en","Four music datasets are shaping AI music training","\u003Cp data-speakable=\"summary\">Four large music datasets are shaping how AI music models get trained.\u003C\u002Fp>\u003Cp>Four datasets with more than 21 million recordings are circulating among AI developers, and the split between research use and commercial use is now central to the fight over music AI.\u003C\u002Fp>\u003Ctable>\u003Cthead>\u003Ctr>\u003Cth>Dataset\u003C\u002Fth>\u003Cth>Tracks\u003C\u002Fth>\u003Cth>Public origin\u003C\u002Fth>\u003Cth>Notes\u003C\u002Fth>\u003C\u002Ftr>\u003C\u002Fthead>\u003Ctbody>\u003Ctr>\u003Ctd>LAION-DISCO-12M\u003C\u002Ftd>\u003Ctd>12 million+\u003C\u002Ftd>\u003Ctd>Yes\u003C\u002Ftd>\u003Ctd>Links to public YouTube tracks and metadata only\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Large unnamed dataset\u003C\u002Ftd>\u003Ctd>9 million\u003C\u002Ftd>\u003Ctd>No public origin cited\u003C\u002Ftd>\u003Ctd>One of the two biggest collections\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Free Music Archive\u003C\u002Ftd>\u003Ctd>100,000+\u003C\u002Ftd>\u003Ctd>Yes\u003C\u002Ftd>\u003Ctd>Used by Google and Stability AI, per The Atlantic\u003C\u002Ftd>\u003C\u002Ftr>\u003Ctr>\u003Ctd>Unnamed small dataset\u003C\u002Ftd>\u003Ctd>100,000+\u003C\u002Ftd>\u003Ctd>No public origin cited\u003C\u002Ftd>\u003Ctd>One of the two smaller 
collections\u003C\u002Ftd>\u003C\u002Ftr>\u003C\u002Ftbody>\u003C\u002Ftable>\u003Ch2>1. LAION-DISCO-12M\u003C\u002Fh2>\u003Cp>The biggest publicly documented collection in the report is \u003Ca href=\"https:\u002F\u002Flaion.ai\u002F\">LAION\u003C\u002Fa>'s LAION-DISCO-12M, a dataset of more than 12 million tracks released in November 2024. It was built by the German nonprofit for research, not for shipping a commercial music product.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781930863975-psfj.png\" alt=\"Four music datasets are shaping AI music training\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>LAION says the dataset is for academic settings and warns against using it commercially or in its original form for finished products. It does not distribute audio files; it provides links to publicly available YouTube tracks plus metadata.\u003C\u002Fp>\u003Cul>\u003Cli>12 million-plus tracks\u003C\u002Fli>\u003Cli>Released in November 2024\u003C\u002Fli>\u003Cli>Research-only framing\u003C\u002Fli>\u003Cli>Metadata and links, not audio files\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>2. The 9 million-track collection\u003C\u002Fh2>\u003Cp>One of the two biggest datasets in the report holds roughly 9 million tracks, but The Atlantic did not identify a public origin for it in the article summary. That opacity is part of the problem for labels and artists trying to trace where training data comes from.\u003C\u002Fp>\u003Cp>Its size matters because this is the scale where a dataset can influence model behavior across genres, eras, and artist catalogs. 
The report says the four datasets together include music by Bad Bunny, Nirvana, Taylor Swift, Billie Eilish, Pearl Jam, and the Beatles.\u003C\u002Fp>\u003Cul>\u003Cli>About 9 million tracks\u003C\u002Fli>\u003Cli>Reported by The Atlantic, with no public origin identified in the article summary\u003C\u002Fli>\u003Cli>Part of the group downloaded several thousand times\u003C\u002Fli>\u003Cli>Contains copyrighted music, according to the report\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>3. Free Music Archive\u003C\u002Fh2>\u003Cp>The \u003Ca href=\"https:\u002F\u002Ffreemusicarchive.org\u002F\">Free Music Archive\u003C\u002Fa> is the clearest example of a dataset that began as a research resource and later became useful for AI training. It was published by academic researchers in 2017 for music-information-retrieval work, the kind of software research that focuses on searching, sorting, and analyzing music.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781930865861-f4l8.png\" alt=\"Four music datasets are shaping AI music training\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>The archive draws on a catalog directed by \u003Ca href=\"https:\u002F\u002Fwfmu.org\u002F\">WFMU\u003C\u002Fa>, a freeform U.S. radio station whose artists had already released tracks under permissive Creative Commons licenses. That licensing history matters because the material was openly shared long before generative AI systems began training on it.\u003C\u002Fp>\u003Cul>\u003Cli>100,000-plus tracks\u003C\u002Fli>\u003Cli>Academic release in 2017\u003C\u002Fli>\u003Cli>Built from Creative Commons-licensed music\u003C\u002Fli>\u003Cli>Used by Google and Stability AI, according to The Atlantic\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>4. 
The other 100,000-plus dataset\u003C\u002Fh2>\u003Cp>The fourth collection is another dataset with roughly 100,000 tracks, but the report does not name it in the excerpted text. Even so, it helps show the range of sources AI developers have been drawing from: some datasets are openly documented, while others are far harder to audit.\u003C\u002Fp>\u003Cp>That gap between public documentation and actual usage is why the legal dispute keeps widening. \u003Ca href=\"https:\u002F\u002Fwww.theatlantic.com\u002F\">The Atlantic\u003C\u002Fa> report notes that all four datasets have been downloaded several thousand times, yet the industry still keeps much of its training data hidden.\u003C\u002Fp>\u003Cul>\u003Cli>100,000-plus tracks\u003C\u002Fli>\u003Cli>Unnamed in the report excerpt\u003C\u002Fli>\u003Cli>Downloaded several thousand times\u003C\u002Fli>\u003Cli>Illustrates the audit problem in AI music training\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>5. What the lawsuits and licenses mean\u003C\u002Fh2>\u003Cp>The datasets matter because they sit inside a larger legal shift. \u003Ca href=\"https:\u002F\u002Fwww.udio.com\u002F\">Udio\u003C\u002Fa> and \u003Ca href=\"https:\u002F\u002Fsuno.com\u002F\">Suno\u003C\u002Fa> are facing at least 12 lawsuits, while major rightsholders have started moving from pure litigation to licensing. Universal Music Group settled with Udio in October 2025, and Warner Music Group followed with its own Udio deal and then a first-of-its-kind partnership with Suno.\u003C\u002Fp>\u003Cp>Those agreements point to a future where some AI music tools may operate inside licensed systems rather than open training pipelines. 
At the same time, \u003Ca href=\"https:\u002F\u002Fwww.sonymusic.com\u002F\">Sony Music\u003C\u002Fa> remains in court, and independent artists and groups such as the American Federation of Musicians are still pressing claims over uncredited or uncompensated use.\u003C\u002Fp>\u003Cul>\u003Cli>UMG settled with Udio in October 2025\u003C\u002Fli>\u003Cli>Warner settled with Udio in November 2025\u003C\u002Fli>\u003Cli>Warner then settled with Suno\u003C\u002Fli>\u003Cli>Sony Music remains in active litigation\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>How to decide\u003C\u002Fh2>\u003Cp>If you care most about scale, LAION-DISCO-12M is the headline dataset. If you care about provenance, the Free Music Archive is the clearest case of a research dataset with known licensing roots. If you care about where the market is heading, the Udio and Suno settlements matter more than any single dataset because they show the industry moving toward licensed AI music systems.\u003C\u002Fp>\u003Cp>For readers tracking risk, the main signal is not just how many tracks sit in a dataset, but whether artists, labels, and platforms can see how that data was gathered and used. 
That transparency gap is now the core issue.\u003C\u002Fp>","4 music datasets with 21M+ tracks are circulating in AI training, and the legal fight is now moving toward licensing deals.","www.musicbusinessworldwide.com","https:\u002F\u002Fwww.musicbusinessworldwide.com\u002Ffour-music-datasets-holding-millions-of-tracks-are-being-shared-among-ai-developers-the-atlantic-reports\u002F",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781930863975-psfj.png","industry","en","d81055cb-e6f4-4deb-862b-8be06436e913",[17,18,19,20,21,22,23,24],"music datasets","AI music training","LAION-DISCO-12M","Free Music Archive","Suno","Udio","copyright","music licensing",[26,27,28],"Four datasets with 21 million-plus recordings are circulating among AI developers.","LAION-DISCO-12M is the largest publicly documented collection, with 12 million-plus tracks.","The legal fight is shifting from lawsuits toward licensing deals and walled-garden AI products.",0,"2026-06-20T04:47:22.638469+00:00","2026-06-20T04:47:22.632+00:00","6d32add7-839d-45c4-9e9d-d415af3a421d",{"tags":34,"relatedLang":35,"relatedPosts":39},[],{"id":15,"slug":36,"title":37,"language":38},"four-music-datasets-ai-music-training-zh","4 個音樂資料集正在改寫 AI 訓練","zh",[40,46,52,58,64,70],{"id":41,"slug":42,"title":43,"cover_image":44,"image_url":44,"created_at":45,"category":13},"2c317df8-4070-4c74-bab5-48f79fe2860e","claude-vs-gpt-vs-gemini-coding-benchmark-leaderboard-en","Claude vs GPT vs Gemini: Coding Benchmark Leaderboard","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781939876788-ivgw.png","2026-06-20T07:17:35.473285+00:00",{"id":47,"slug":48,"title":49,"cover_image":50,"image_url":50,"created_at":51,"category":13},"e018e62b-a712-4e2c-aee6-21fb492b993a","clip-converter-rivals-faster-safer-2026-en","Clip Converter’s 2026 rivals are faster and 
safer","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781935360953-hmtl.png","2026-06-20T06:02:19.249403+00:00",{"id":53,"slug":54,"title":55,"cover_image":56,"image_url":56,"created_at":57,"category":13},"e5877eb6-413d-46f3-b91e-3c4139b5e1f9","openai-sora-shutdown-unit-economics-en","OpenAI’s Sora shutdown proves hype can’t outrun unit economics","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781933563230-w87h.png","2026-06-20T05:32:17.714689+00:00",{"id":59,"slug":60,"title":61,"cover_image":62,"image_url":62,"created_at":63,"category":13},"d2810cd9-a360-4466-a3a3-5a953daea1b1","anthropics-model-shutdown-safety-can-bite-back-en","Anthropic’s model shutdown shows safety can bite back","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781932667519-4tfs.png","2026-06-20T05:17:22.132183+00:00",{"id":65,"slug":66,"title":67,"cover_image":68,"image_url":68,"created_at":69,"category":13},"58f4c299-67b3-4f19-bc1d-7bb6ef0db0f2","boy-george-ai-vs-taylor-swift-rerecordings-en","Boy George AI vs Taylor Swift rerecordings","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781931778008-edt4.png","2026-06-20T05:02:33.338344+00:00",{"id":71,"slug":72,"title":73,"cover_image":74,"image_url":74,"created_at":75,"category":13},"27db8279-3710-431a-a86b-20ca47af3a15","deezer-ai-music-detector-playlists-transparency-en","Deezer is right to expose AI music in 
playlists","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1781929965653-f5r4.png","2026-06-20T04:32:18.064695+00:00",[77,82,87,92,97,102,107,112,117,122],{"id":78,"slug":79,"title":80,"created_at":81},"d35a1bd9-e709-412e-a2df-392df1dc572a","ai-impact-2026-developments-market-en","AI's Impact in 2026: Key Developments and Market Shifts","2026-03-25T16:20:33.205823+00:00",{"id":83,"slug":84,"title":85,"created_at":86},"5ed27921-5fd6-492e-8c59-78393bf37710","trumps-ai-legislative-framework-en","Trump's AI Legislative Framework: What's Inside?","2026-03-25T16:22:20.005325+00:00",{"id":88,"slug":89,"title":90,"created_at":91},"e454a642-f03c-4794-b185-5f651aebbaca","nvidia-gtc-2026-key-highlights-innovations-en","NVIDIA GTC 2026: Key Highlights and Innovations","2026-03-25T16:22:47.882615+00:00",{"id":93,"slug":94,"title":95,"created_at":96},"0ebb5b16-774a-4922-945d-5f2ce1df5a6d","claude-usage-diversifies-learning-curves-en","Claude Usage Diversifies, Learning Curves Emerge","2026-03-25T16:25:50.770376+00:00",{"id":98,"slug":99,"title":100,"created_at":101},"69934e86-2fc5-4280-8223-7b917a48ace8","openclaw-ai-commoditization-concerns-en","OpenClaw's Rise Raises Concerns of AI Model Commoditization","2026-03-25T16:26:30.582047+00:00",{"id":103,"slug":104,"title":105,"created_at":106},"b4b2575b-2ac8-46b2-b90e-ab1d7c060797","google-gemini-ai-rollout-2026-en","Google's Gemini AI Rollout Extended to 2026","2026-03-25T16:28:14.808842+00:00",{"id":108,"slug":109,"title":110,"created_at":111},"6e18bc65-42ae-4ad0-b564-67d7f66b979e","meta-llama4-fabricated-results-scandal-en","Meta's Llama 4 Scandal: Fabricated AI Test Results Unveiled","2026-03-25T16:29:15.482836+00:00",{"id":113,"slug":114,"title":115,"created_at":116},"bf888e9d-08be-4f47-996c-7b24b5ab3500","accenture-mistral-ai-deployment-en","Accenture and Mistral AI Team Up for AI 
Deployment","2026-03-25T16:31:01.894655+00:00",{"id":118,"slug":119,"title":120,"created_at":121},"5382b536-fad2-49c6-ac85-9eb2bae49f35","mistral-ai-high-stakes-2026-en","Mistral AI: Facing High Stakes in 2026","2026-03-25T16:31:39.941974+00:00",{"id":123,"slug":124,"title":125,"created_at":126},"9da3d2d6-b669-4971-ba1d-17fdb3548ed5","cursors-meteoric-rise-pressures-en","Cursor's Meteoric Rise Faces Industry Pressures","2026-03-25T16:32:21.899217+00:00"]