jpmdb

a personalized movie database for my friend Juan

Cleaning Process

The original source data was a .txt file listing the movies and TV shows watched each year, the order in which they were watched, and a rating out of 10 for each.
The .txt file was parsed in create_silver_jpmdb.py, which extracts the ratings, seasons, watch order, year specifiers, and other metadata.
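The real layout of the .txt file isn't shown here, so the following is only a sketch of how a per-line parser could pull out watch order, title, an optional year specifier, an optional season, and the rating. The line format `"<order>. <title> [(<year>)] [S<season>] - <rating>/10"`, the `WatchEntry` class, and `parse_line` are all hypothetical and are not taken from create_silver_jpmdb.py.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical line format (an assumption, not the real file's layout):
#   "<watch order>. <title> [(<year>)] [S<season>] - <rating>/10"
LINE_RE = re.compile(
    r"^\s*(?P<order>\d+)\.\s+"                   # watch order, e.g. "12."
    r"(?P<title>.+?)"                            # title (non-greedy)
    r"(?:\s+\((?P<year>\d{4})\))?"               # optional year specifier, e.g. "(1995)"
    r"(?:\s+S(?P<season>\d+))?"                  # optional season, e.g. "S2"
    r"\s+-\s+(?P<rating>\d+(?:\.\d+)?)/10\s*$"   # rating out of 10
)


@dataclass
class WatchEntry:
    order: int
    title: str
    year: Optional[int]
    season: Optional[int]
    rating: float


def parse_line(line: str) -> Optional[WatchEntry]:
    """Parse one line of the watch log; return None if it doesn't match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return WatchEntry(
        order=int(m["order"]),
        title=m["title"].strip(),
        year=int(m["year"]) if m["year"] else None,
        season=int(m["season"]) if m["season"] else None,
        rating=float(m["rating"]),
    )
```

Returning `None` for non-matching lines lets headers or stray notes be logged and skipped rather than crashing the parse.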
IMDb data was downloaded from IMDb Datasets, and create_silver_imdb.py converted the .gz files into silver/imdb/title_basics and silver/imdb/title_ratings.
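IMDb's dataset files (e.g. title.basics.tsv.gz, title.ratings.tsv.gz) are gzipped, tab-separated files that use the literal `\N` for missing values. As a minimal sketch, assuming a stdlib-only reader (the helper name `read_imdb_tsv` is hypothetical, and the silver storage format used by create_silver_imdb.py is not shown here), the rows can be streamed like this:

```python
import csv
import gzip
from pathlib import Path
from typing import Iterator, Optional

IMDB_NULL = r"\N"  # IMDb's TSV files use the literal \N for missing values


def read_imdb_tsv(path: Path) -> Iterator[dict[str, Optional[str]]]:
    """Stream rows from a gzipped IMDb TSV file, mapping \\N to None."""
    # Fields are tab-separated and unquoted, so quote handling is disabled;
    # otherwise a stray '"' in a title could swallow tabs and newlines.
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as fh:
        reader = csv.DictReader(fh, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            yield {k: (None if v == IMDB_NULL else v) for k, v in row.items()}
```

Streaming keeps memory flat for title.basics, which is large; casting columns such as `startYear` or `averageRating` to numbers would happen before writing the silver tables.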
The jpmdb and IMDb datasets were initially joined into stg_jpmdb_combined by create_silver_stg_jpmdb_combined.py, using standard string cleaning and fuzzy matching.
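The exact matching logic in create_silver_stg_jpmdb_combined.py isn't described, so this is only an illustration of the general approach: normalize both titles, restrict candidates by year when one is known, and accept the closest title above a threshold. It swaps in Python's standard-library `difflib.SequenceMatcher` as the similarity measure; `clean_title`, `best_match`, and the 0.85 threshold are hypothetical choices.

```python
import difflib
import re
import unicodedata
from typing import Optional, Sequence


def clean_title(title: str) -> str:
    """Normalize a title: strip accents, punctuation, case, and a leading 'the'."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower().replace("&", " and ")
    text = re.sub(r"[^a-z0-9 ]+", " ", text)
    text = re.sub(r"^the\s+", "", text.strip())
    return re.sub(r"\s+", " ", text).strip()


def best_match(
    title: str,
    candidates: Sequence[tuple[str, str, Optional[int]]],  # (tconst, primaryTitle, startYear)
    year: Optional[int] = None,
    threshold: float = 0.85,
) -> Optional[tuple[str, float]]:
    """Return (tconst, score) of the closest candidate, or None below the threshold."""
    target = clean_title(title)
    best: Optional[tuple[str, float]] = None
    for tconst, primary_title, start_year in candidates:
        # When the watch log gives a year, only compare titles from that year.
        if year is not None and start_year is not None and start_year != year:
            continue
        score = difflib.SequenceMatcher(None, target, clean_title(primary_title)).ratio()
        if best is None or score > best[1]:
            best = (tconst, score)
    if best is None or best[1] < threshold:
        return None
    return best
```

Returning the score alongside the `tconst` makes it easy to send low-confidence matches to manual review instead of trusting them blindly.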
Entries were manually reviewed using a small CLI tool, review_combined_jpmdb.py, which provided an opportunity to correct fuzzy matching errors and manually add missing entries.
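The interface of review_combined_jpmdb.py isn't documented here. As a rough sketch of what such a review loop could look like, the hypothetical `review_entries` function below prompts for each fuzzy match and records whether it was accepted, corrected, or skipped. The entry keys (`title`, `matched_title`, `tconst`) are assumptions, and the `ask` parameter exists so the prompt can be replaced in tests.

```python
from typing import Callable


def review_entries(
    entries: list[dict],
    ask: Callable[[str], str] = input,
) -> list[dict]:
    """Let a human accept, correct, or skip each fuzzy-matched row.

    Each entry is assumed to have 'title', 'matched_title', and 'tconst' keys.
    """
    reviewed = []
    for entry in entries:
        prompt = (
            f"{entry['title']!r} -> {entry['matched_title']!r} ({entry['tconst']})\n"
            "[y] accept  [e] enter tconst  [s] skip: "
        )
        choice = ask(prompt).strip().lower()
        if choice == "y":
            reviewed.append({**entry, "validated": True})
        elif choice == "e":
            # A manually entered tconst replaces the fuzzy match; the matched
            # title is cleared so it can be looked up again from IMDb data.
            new_tconst = ask("tconst: ").strip()
            reviewed.append(
                {**entry, "tconst": new_tconst, "matched_title": None, "validated": True}
            )
        else:
            # Anything else leaves the row unvalidated for a later pass.
            reviewed.append({**entry, "validated": False})
    return reviewed
```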
After all entries were validated, create_gold_jpmdb.py moved the data to the gold table gold/jpmdb.
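One way to enforce the "only after all entries were validated" rule is to refuse promotion while any row is still pending. This is a hypothetical sketch (`promote_to_gold` and the `order`/`validated` keys are assumptions, not the contents of create_gold_jpmdb.py); it also sorts by watch order and drops the review flag, which has no meaning in the gold table.

```python
def promote_to_gold(rows: list[dict]) -> list[dict]:
    """Return gold rows sorted by watch order; fail if any row is unvalidated."""
    pending = [r.get("title") for r in rows if not r.get("validated")]
    if pending:
        raise ValueError(f"{len(pending)} entries still need review, e.g. {pending[:3]}")
    # The review flag is only meaningful in staging, so it is dropped here.
    return [
        {k: v for k, v in r.items() if k != "validated"}
        for r in sorted(rows, key=lambda r: r["order"])
    ]
```

Failing loudly keeps a half-reviewed staging table from silently reaching the dashboard.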

The dashboard is built using Dash and Plotly. It currently includes 4 visualizations:
