This repo contains the complete materials for the tutorial session "Improving Open Data Quality using Python", presented at the PyData Global 2023 conference.
First, we should create a Python virtual environment and install the required dependencies. To create the environment, run the following command:
python -m venv data-quality
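If you prefer to script this step, the standard-library `venv` module can do the same thing from Python. This is a minimal sketch: the helper name `create_env` is hypothetical and not part of the tutorial materials. It assumes the default behaviour of `python -m venv`, which bootstraps pip into the new environment.

```python
# Programmatic equivalent of `python -m venv data-quality` (hypothetical helper).
import venv
from pathlib import Path


def create_env(path: str = "data-quality", with_pip: bool = True) -> Path:
    """Create a virtual environment at `path` and return its location.

    with_pip=True installs pip into the environment via ensurepip,
    matching what `python -m venv` does by default.
    """
    builder = venv.EnvBuilder(with_pip=with_pip)
    builder.create(path)
    return Path(path)
```

Calling `create_env()` creates a `data-quality` directory in the current working directory, the same one the shell command produces.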
Next, activate the environment with the command that matches your OS:
- Linux/MacOS
source data-quality/bin/activate
- Windows (PowerShell)
data-quality\Scripts\Activate.ps1
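After activation, `python` should resolve to the environment's own interpreter. A quick way to confirm this is to compare `sys.prefix` with `sys.base_prefix`, which differ only when Python runs inside a virtual environment. The function name `in_virtualenv` below is a hypothetical helper used for illustration.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Virtual environment active:", in_virtualenv())
if in_virtualenv():
    print("Environment location:", sys.prefix)
```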
Finally, we can install the required dependencies:
pip install -r requirements.txt
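To check that the install succeeded, you can compare the packages listed in `requirements.txt` against what is installed in the active environment. This is a sketch under simplifying assumptions: the helpers `requirement_names` and `missing_requirements` are hypothetical, and the parser only understands plain `name[extras]<specifier>` lines. It skips comments and pip options such as `-r` or `-e` and does not handle VCS or URL requirements.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path

# Matches the distribution name at the start of a requirement line,
# e.g. "pandas" in "pandas>=2.0" or "requests" in "requests[socks]".
_NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def requirement_names(path: str = "requirements.txt") -> list[str]:
    """Return the package names listed in a requirements file."""
    names = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blank lines and pip options (-r, -e, ...)
            continue
        match = _NAME_RE.match(line)
        if match:
            names.append(match.group(1))
    return names


def missing_requirements(path: str = "requirements.txt") -> list[str]:
    """Return the listed packages that are not installed in this environment."""
    missing = []
    for name in requirement_names(path):
        try:
            version(name)  # raises PackageNotFoundError if the package is absent
        except PackageNotFoundError:
            missing.append(name)
    return missing
```

An empty list from `missing_requirements()` means every listed package is present, although this sketch does not check that installed versions satisfy the version specifiers.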
You can also launch the notebook in Google Colab by following these steps:
- Open the Colab website: https://colab.research.google.com/
- From the File menu, select Open notebook
- Click on the GitHub tab
- Paste the following URL: https://github.com/elsatch/yData-Global-2023-Improving-Open-Data-Quality-using-Python.git
- Select the single_datasets.ipynb notebook
- Execute the Colab-specific cell at the beginning of the notebook
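The notebook's own Colab cell is not reproduced here. As a hypothetical illustration of how a notebook can tell the two environments apart, the guard below checks whether the `google.colab` module, which Colab runtimes provide, can be imported. The function name `running_in_colab` is an assumption, not something defined in the tutorial materials.

```python
def running_in_colab() -> bool:
    """Return True when the google.colab module is importable (as in a Colab runtime)."""
    try:
        import google.colab  # noqa: F401  (provided by Colab runtimes)
    except ImportError:
        return False
    return True


if running_in_colab():
    print("Colab detected: run the Colab setup cell before continuing.")
else:
    print("Local environment: dependencies come from requirements.txt.")
```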