This repo demonstrates how to fetch data from an S3 bucket for training AI/ML models.
First, we need to create a Python virtual environment, which installs your packages for this one project instead of globally.
- Create a virtual environment by running the following in the terminal at your project directory:
  - Windows: `py -m venv venv`
  - Mac/Linux: `python3 -m venv venv`
- Then, you need to actually use the environment, so type the following:
  - Windows: `venv\Scripts\Activate.ps1`
  - Mac/Linux: `source ./venv/bin/activate`
- Your terminal prompt should now start with `(venv)`
- I would recommend setting your VS Code interpreter (bottom right when a Python file is open) to the venv path
 
- I have made a `requirements.txt` to install all of the packages you will need, so run the following to install them:
  - `pip install -r requirements.txt`
 
In order to not push your keys to GitHub, we need to put them in a `.env` file.
- Open that file and replace `YOUR_ACCESS_KEY` and `YOUR_SECRET_KEY` with the ones you copied earlier
- Go to the `.gitignore` file and uncomment (remove the `#`) from line 4 that says `.env`
- A `.gitignore` file tells Git to not push certain files and to keep them local to your computer
- Remember, your AWS credentials are EXTREMELY CONFIDENTIAL, hence why we do this
 
 - Now you are ready to call AWS services!
 
The INRIX data is in an S3 bucket hosted on AWS via an ACM AWS account, so this can easily be used as a dataset for SageMaker or Bedrock all through AWS!
- If you need the files however, `s3access.py` and `s3download.py` are useful for you
- The bucket name should've been shared with you (WHICH IS SENSITIVE INFO)
- Ask in the #help channel if you don't have it
 
- Simply run the files that you need:
  - `s3access.py` to see the files
  - `s3download.py` to download a specific file
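
For reference, here is a hedged sketch of the kind of boto3 calls those two scripts presumably make. The function names, the `data` folder, and the flat local-filename scheme are my illustrations, not the repo's actual code — pass in the real bucket name you were given, and keep it out of Git:

```python
import os

def list_bucket(bucket):
    """Sketch of what s3access.py likely does: list the object keys.

    `bucket` is the sensitive bucket name shared with you -- load it
    from the .env file rather than hard-coding it.
    """
    import boto3  # imported lazily; needs credentials in the environment
    s3 = boto3.client("s3")
    resp = s3.list_objects_v2(Bucket=bucket)
    return [obj["Key"] for obj in resp.get("Contents", [])]

def local_path(key, dest_dir="data"):
    """Map an S3 key like 'cam01/12-05.jpg' to a flat local filename."""
    return os.path.join(dest_dir, key.replace("/", "_"))

def download_object(bucket, key, dest_dir="data"):
    """Sketch of what s3download.py likely does: fetch one object."""
    import boto3
    os.makedirs(dest_dir, exist_ok=True)
    dest = local_path(key, dest_dir)
    boto3.client("s3").download_file(bucket, key, dest)
    return dest
```

Flattening keys into one folder keeps the download simple; if you want to preserve the bucket's folder structure locally, keep the `/` separators and create the intermediate directories instead.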
 
 
This dataset contains images of traffic captured every 5 minutes over 24 hours from 24 cameras in Seattle. This is great for training an AI model to detect many different aspects of traffic!
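
For a rough sense of scale, the capture schedule described above works out to the following (assuming every camera captured continuously with no gaps):

```python
cameras = 24
hours = 24
frames_per_hour = 60 // 5                      # one image every 5 minutes

frames_per_camera = hours * frames_per_hour    # 288 images per camera
total_images = cameras * frames_per_camera     # 6912 images overall
print(frames_per_camera, total_images)
```

So expect on the order of 7,000 images in the bucket — small enough to download locally, but large enough to be a useful training set.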