Skip to content

A website to generate detailed statistics from your entire spotify streaming history.

License

Notifications You must be signed in to change notification settings

piebro/spotify-statistics

Repository files navigation

Personal Spotify Statistics

I like statistics and listening to music, and I also enjoy the yearly Spotify Wrapped. After watching one at the end of each year, I am always interested in more in-depth statistics over multiple years. That's why I created a website to generate personal Spotify usage statistics.

This works well because, a few years ago, the EU passed the GDPR Act, which enables EU citizens to access all personal data a company has stored about you. Spotify stores a log of your listening history, including partial listens. This data log is a treasure if you are interested in your own listening behavior. You can request your Extended streaming history data at https://www.spotify.com/us/account/privacy/.

The data is processed with the data_crunching.py script to generate statistics for questions that I have about my listening behavior. The website makes it easy to use the script to generate your own personal statistics. The data is only processed in your browser and never leaves your computer.

Some thoughts about the data

I downloaded my own data several times over the past few months without any issues. However, the last time I downloaded it, there was some missing data for the year 2017. For this time period, the reason_end data field consistently showed none instead of the actual reason why the song ended. If you notice anything unusual in your statistics, it might be because the Spotify data export was inaccurate, and I would recommend re-downloading your data.

The data is divided into multiple JSON files, each approximately 10.5MB in size, containing the streaming history. This JSON is comprised of a lengthy list of objects, each representing a listening log. Additionally, there's a PDF file that offers explanations for each data field in various languages. As of October 2023, the table is structured as follows:

Technical Field Contains
ts This field is a timestamp indicating when the track stopped playing in UTC (Coordinated Universal Time). The order is year, month and day followed by a timestamp in military time.
username This field is your Spotify username.
platform This field is the platform used when streaming the track (e.g. Android OS, Google Chromecast).
ms_played This field is the number of milliseconds the stream was played.
conn_country This field is the country code of the country where the stream was played (e.g. SE - Sweden).
Ip_addr_decrypted This field contains the IP address logged when streaming the track.
user_agent_decrypted This field contains the user agent used when streaming the track (e.g. a browser, like Mozilla Firefox, or Safari)
master_metadata_track_name This field is the name of the track.
master_metadata_album_artist_name This field is the name of the artist, band or podcast.
master_metadata_album_album_name This field is the name of the album of the track.
spotify_track_uri A Spotify URI, uniquely identifying the track in the form of “spotify:track:”. A Spotify URI is a resource identifier that you can enter, for example, in the Spotify Desktop client’s search box to locate an artist, album, or track.
episode_name This field contains the name of the episode of the podcast.
episode_show_name This field contains the name of the show of the podcast.
spotify_episode_uri A Spotify Episode URI, uniquely identifying the podcast episode in the form of “spotify:episode:”. A Spotify Episode URI is a resource identifier that you can enter, for example, in the Spotify Desktop client’s search box to locate an episode of a podcast.
reason_start This field is a value telling why the track started (e.g. “trackdone”)
reason_end This field is a value telling why the track ended (e.g. “endplay”).
shuffle This field has the value True or False depending on if shuffle mode was used when playing the track.
skipped This field indicates if the user skipped to the next song.
offline This field indicates whether the track was played in offline mode (“True”) or not (“False”).
offline_timestamp This field is a timestamp of when offline mode was used, if used.
incognito_mode This field indicates whether the track was played in incognito mode (“True”) or not (“False”).

I perform some preprocessing to work with the data more easily.

I initially used the ts (timestamp) as my standard time reference, but I noticed that there were instances where multiple songs were logged at the exact same time. I suspect this might have occurred due to a lack of internet connection at those moments. That's why I looked at the offline_timestamp. The time in this field is saved as a Unix timestamp. However, some entries in this field don't make sense (they were less than 100 and would be from the 1970s). To address this, I utilize the offline_timestamp if it seems plausible; otherwise, I revert to using the normal ts for the timestamp.

I'm unsure how Spotify determined if a song was skipped. I delved into the ms_played and reason_end data for skipped songs but couldn't make sense of it. For these statistics, a song is considered skipped if the reason for the song ending is the pressing of the forward button. A song is marked as fully played when the reason for the end of the song is trackdone.

Contributing

Contributions to this project are welcome. Feel free to report bugs, suggest ideas or create merge requests.

Developing

# Create and activate a virtual environment for development (python>=3.8 ist needed)
python -m venv .venv
source .venv/bin/activate

# Install Python dependencies
pip3 install -r requirements.txt

# Use the script to create data
python data_crunching.py path-to-unzipped-spotify-data-folder

# Run a simple Python server to view your stats in the browser
python -m http.server 8000

# Open http://0.0.0.0:8000/ in your browser

The project uses the Python code formatter Black and the linter Ruff. Run black . and ruff . in the project directory before committing code. I also used Prettier for linting the index.js file with a print-width of 120, tab-width of 4, and using single quotes. Additionally, I used Stylelint for linting the index.css file.

Website Statistics

There is lightweight tracking for the website using Plausible. Anyone interested can view these statistics at https://plausible.io/piebro.github.io%2Fspotify-statistics. Note that only users without an AdBlocker are counted, so these statistics underestimate the actual number of visitors. I would assume that a significant number of people visiting the site, including myself, have an AdBlocker enabled.

License

This project is licensed under the MIT License - see the LICENSE file for details.