Metadata for 256 million songs is now being shared via torrents, and listening to streaming music without paying may soon be easier than ever. Libraries remain a valuable source of physical media, whether books, audio CDs, DVDs, or other formats. What libraries have yet to offer the wider community consistently, however, is digital content. Many digital libraries exist online, but questions of piracy and fair compensation for rights holders complicate the experience. This is the challenge that Anna’s Archive, which describes itself as “the largest truly open library in human history,” is trying to address.
In a remarkable development, Anna’s Archive announced that it has backed up nearly all of the music that people actually listen to on Spotify. A December 20 blog post explains that Anna’s Archive “found a method to scrape Spotify at scale,” and that the team “saw an opportunity to build a music archive primarily focused on preservation.” The backup includes 86 million music files, which Anna’s Archive claims represent 99.6% of Spotify listens.
Anna’s Archive launched in 2022 as a search engine and torrent aggregator for mostly text-based files from LibGen, Sci-Hub, Z-Lib, and other shadow libraries. It arose in response to U.S. government efforts to shut down Z-Library. The organization, made up of anonymous contributors, now aims to build something similar for music.
Unsurprisingly, Spotify is displeased by reports that Anna’s Archive scraped its music files, and it has criticized the use of “illicit tactics” to circumvent DRM and copyright protections. Many questions remain, such as whether Spotify or regulators will take legal action against Anna’s Archive, and whether a “free” database of Spotify songs will be made available to ordinary listeners.
This vast music backup, unprecedented in scale, will compel companies, regulators, and users to confront the following question: what distinguishes preservation from piracy?
What Anna’s Archive has successfully backed up
Anna’s Archive says it chose which Spotify tracks to back up based on the company’s own popularity metric. Many songs on Spotify receive hardly any listens: the archive estimates that the top three songs on Spotify have more streams than the bottom 20 to 100 million songs combined. Overall, the backup includes metadata for 256 million tracks and audio files for 86 million songs.
Spotify defines its popularity metric as “a value ranging from 0 to 100, with 100 being the most popular.” This metric is computed by an algorithm that relies primarily on the total number of plays a track has received and how recent those plays are.
Using this ranking, Anna’s Archive backed up the 86 million most popular songs. Those songs make up only 37% of Spotify’s catalog, yet they account for 99.6% of listens. In other words, although the archive holds less than half of Spotify’s songs, it covers almost all of the tracks listeners actually play.
Anna’s Archive collected metadata for 99.9% of tracks, making this the largest music metadata archive in the world, but it backed up audio for only 37% of Spotify’s music files because of storage limits. The 86 million archived songs take up 300TB, while archiving the rest would have required an additional 700TB “for minor advantage,” according to the blog post.
The music files are stored in OGG Vorbis format at 160kbps for songs with a popularity metric above zero, while songs with a popularity of zero were re-encoded as OGG Vorbis at 75kbps. Anna’s Archive also embedded metadata in the audio files, such as “title, url, ISRC, UPC, album art, and replaygain information.” Audio files do not necessarily carry this kind of embedded metadata, which makes the addition significant.
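To put the bitrates and the 300TB figure in perspective, here is a rough back-of-the-envelope sketch. It treats the nominal bitrates as constant (Vorbis is usually variable-bitrate, so real files will differ), assumes decimal units (1 kbps = 1,000 bits per second, 1TB = 10^12 bytes), and ignores container and embedded-metadata overhead. The function names and the three-minute track length are illustrative only, not anything published by Anna’s Archive.

```python
# Back-of-the-envelope storage arithmetic for compressed audio.
# Assumptions (illustrative, not from Anna's Archive): constant nominal bitrate,
# decimal units, and no container or metadata overhead.


def audio_size_bytes(bitrate_kbps: float, duration_s: float) -> float:
    """Approximate file size in bytes for audio at a given bitrate and length."""
    bits = bitrate_kbps * 1000 * duration_s  # 1 kbps = 1,000 bits per second
    return bits / 8  # 8 bits per byte


def implied_duration_s(total_bytes: float, n_files: float, bitrate_kbps: float) -> float:
    """Average track length implied by a total archive size, if every file used one bitrate."""
    avg_bytes = total_bytes / n_files
    return avg_bytes * 8 / (bitrate_kbps * 1000)


if __name__ == "__main__":
    track_s = 180  # hypothetical three-minute track
    print(f"160 kbps: {audio_size_bytes(160, track_s) / 1e6:.2f} MB")  # 3.60 MB
    print(f" 75 kbps: {audio_size_bytes(75, track_s) / 1e6:.2f} MB")   # 1.69 MB

    # Reported figures: about 300 TB for 86 million files.
    avg_mb = 300e12 / 86e6 / 1e6
    minutes = implied_duration_s(300e12, 86e6, 160) / 60
    print(f"Average file: {avg_mb:.2f} MB, about {minutes:.1f} minutes at 160 kbps")
```

Under these assumptions, 300TB spread over 86 million files works out to roughly 3.5MB per file, or just under three minutes of audio at 160kbps. If some of the archived files are the 75kbps re-encodes, the implied average track length would be somewhat longer.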
Spotify calls it scraping with ‘illicit tactics’
It is important to note that the Anna’s Archive backup is illegal on several counts. Scraping Spotify’s databases breaches the company’s terms of service, while removing digital rights management (DRM) protections and sharing copyrighted material violate copyright law. By definition, the Anna’s Archive music backup is piracy.
Spotify appears to agree: it issued statements about the Anna’s Archive release to both Android Authority and Ars Technica.
“An investigation into unauthorized access revealed that a third party scraped public metadata and employed illicit tactics to bypass DRM and access some of the platform’s audio files,” Spotify informed Android Authority. “We are actively investigating the situation.”
Notably, Spotify does not confirm the scale of the Anna’s Archive backup, saying only that “some” of the platform’s audio files were accessed. In a separate statement, Spotify said it is taking steps to prevent similar incidents in the future.
“We’ve put new safeguards in place against these types of anti-copyright attacks and are diligently monitoring for suspicious behavior.”