Spotify says ‘anti-copyright extremists’ scraped its library

Update: Spotify has provided a second statement in relation to this story.
“Spotify has identified and disabled the nefarious user accounts that engaged in unlawful scraping. We’ve implemented new safeguards for these types of anti-copyright attacks and are actively monitoring for suspicious behaviour,” said its spokesperson.
“Since day one, we have stood with the artist community against piracy, and we are actively working with our industry partners to protect creators and defend their rights.”
The original story follows below:
This is not the Christmas present that Spotify would have been hoping for: a huge chunk of its metadata being released on filesharing networks.
Billboard reported that a ‘pirate activist group’ accessed 256m rows of track metadata and 86m audio files, although so far only the metadata has been released via Anna’s Archive – a search engine for ‘shadow libraries’ that has previously focused more on books.
“We backed up Spotify (metadata and music files). It’s distributed in bulk torrents (~300TB), grouped by popularity,” claimed a blog post from the site. “This release includes the largest publicly available music metadata database with 256 million tracks and 186 million unique ISRCs.
“It’s the world’s first ‘preservation archive’ for music which is fully open (meaning it can easily be mirrored by anyone with enough disk space), with 86 million music files, representing around 99.6% of listens.”
In theory, anyone could use this archive to build their own Spotify clone. In practice, the response from rightsholders to any such effort would be swift and significant. Remember, they sued (and settled with) The Internet Archive – and that was over a preservation archive consisting solely of old 78rpm records.
“An investigation into unauthorised access identified that a third party scraped public metadata and used illicit tactics to circumvent DRM to access some of the platform’s audio files. We are actively investigating the incident,” Spotify’s spokesperson told Music Ally.
They also described the activists as “anti-copyright extremists who’ve previously pirated content from YouTube and other platforms”.
Spotify’s investigations continue, including into exactly what the activists accessed, and how. Note that this isn’t a leak or a security breach with implications for users. The activists are currently thought to have used Spotify’s public web API to scrape the metadata.
The YouTube comparison made by Spotify is important, though. Datasets from that service have been circulating for some time too, and are thought to be one of the sources that unlicensed GenAI-music services have used to train their models.
So, that might be the more worrying thing here for the music industry: not so much people building their own free Spotify clones, but the potential for the dataset and audio to undermine licensing efforts with AI firms – even if firms that want this kind of music content could already get it in other ways.




