AP IB GT enrollment #1774
Merged
HarishC727 merged 12 commits into datacommonsorg:master from HarishC727:apibgt_enrollment on Dec 26, 2025
Changes from 2 commits
Commits
a696bc7 ap ib gt enrollment (HarishC727)
b5d7ae5 Merge branch 'master' into apibgt_enrollment (HarishC727)
89732f9 resolved gemini comments (HarishC727)
8435c3a resolved gemini comments (HarishC727)
9acd46a Merge branch 'master' into apibgt_enrollment (HarishC727)
488bcab fixed ai review comments (HarishC727)
36f07b6 Merge branch 'master' into apibgt_enrollment (HarishC727)
05900d7 resolved core comments (HarishC727)
a1ab534 Merge branch 'master' into apibgt_enrollment (HarishC727)
2179cf8 resolved core comments (HarishC727)
02f0ea5 fixed cron schedule (HarishC727)
fc16e7d Merge branch 'master' into apibgt_enrollment (HarishC727)
124 changes: 124 additions & 0 deletions
statvar_imports/us_urban_school/ap_ib_gt_enrollment/README.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,124 @@ | ||
| # AP, IB, and GT Enrollment Data Downloader | ||
|
|
||
| These scripts download and process Advanced Placement (AP), International Baccalaureate (IB), and Gifted and Talented (GT) enrollment data from the Civil Rights Data Collection (CRDC). | ||
|
|
||
| ## Scripts | ||
|
|
||
| Two Python scripts are available: | ||
|
|
||
| * `download_ap_ib_gt.py`: This is the main script that downloads and processes AP, IB, and GT enrollment data for multiple years. | ||
| * `download_2015_16.py`: This script is specifically for downloading and processing the 2015-16 data, which has a different structure. | ||
|
|
||
| ## Usage | ||
|
|
||
| To run the scripts, execute the following commands from the `data` directory: | ||
|
|
||
| ### Main Script | ||
|
|
||
| ```bash | ||
| python3 statvar_imports/us_urban_school/ap_ib_gt_enrollment/download_ap_ib_gt.py | ||
| ``` | ||
|
|
||
| You can also specify which data to download by using the following flags: | ||
|
|
||
| * `--ap`: Download Advanced Placement data only. | ||
| * `--ib`: Download International Baccalaureate data only. | ||
| * `--gt`: Download Gifted and Talented data only. | ||
|
|
||
| For example, to download only the Advanced Placement data, run the following command: | ||
|
|
||
| ```bash | ||
| python3 statvar_imports/us_urban_school/ap_ib_gt_enrollment/download_ap_ib_gt.py --ap | ||
| ``` | ||
|
|
||
| ### 2015-16 Script | ||
|
|
||
| ```bash | ||
| python3 statvar_imports/us_urban_school/ap_ib_gt_enrollment/download_2015_16.py | ||
| ``` | ||
|
|
||
| You can also specify which data to download by using the following flags: | ||
|
|
||
| * `--ap`: Download Advanced Placement data for 2015-16 only. | ||
| * `--ib`: Download International Baccalaureate data for 2015-16 only. | ||
| * `--gt`: Download Gifted and Talented data for 2015-16 only. | ||
|
|
||
| For example, to download only the Advanced Placement data for 2015-16, run the following command: | ||
|
|
||
| ```bash | ||
| python3 statvar_imports/us_urban_school/ap_ib_gt_enrollment/download_2015_16.py --ap | ||
| ``` | ||
|
|
||
| ## Output Files | ||
|
|
||
| The scripts will download the data into the following directories: | ||
|
|
||
| * `statvar_imports/us_urban_school/ap_ib_gt_enrollment/advanced_placements/input_files` | ||
| * `statvar_imports/us_urban_school/ap_ib_gt_enrollment/international_baccalaureate/input_files` | ||
| * `statvar_imports/us_urban_school/ap_ib_gt_enrollment/gifted_and_talented/input_files` | ||
|
|
||
| The output files will be in CSV or XLSX format. | ||
|
|
||
| ## Data Source | ||
|
|
||
| The data is downloaded from the Civil Rights Data Collection (CRDC). | ||
|
|
||
| ## Processing Steps | ||
|
|
||
|
|
||
|
|
||
| The scripts perform the following processing steps: | ||
|
|
||
|
|
||
|
|
||
| 1. Download the data from the CRDC website. | ||
|
|
||
| 2. Extract the relevant file from the downloaded zip file. | ||
|
|
||
| 3. Add `YEAR` and `ncesid` columns to the data. | ||
|
|
||
| 4. Save the processed data as a CSV or XLSX file. | ||
|
|
||
|
|
||
|
|
||
|
|
||
| ## Processing the downloaded data | ||
|
|
||
|
|
||
|
|
||
| After downloading the data, you can process it by running the `run_process.sh` script in each of the data directories. | ||
|
|
||
|
|
||
|
|
||
| For example, to process the Advanced Placement data, run the following command: | ||
|
|
||
|
|
||
|
|
||
| ```bash | ||
|
|
||
| bash statvar_imports/us_urban_school/ap_ib_gt_enrollment/advanced_placements/run_process.sh | ||
|
|
||
| ``` | ||
|
|
||
|
|
||
|
|
||
|
|
||
| You can also download and process the data in one step by using the `--download` flag: | ||
|
|
||
|
|
||
|
|
||
| ```bash | ||
|
|
||
| bash statvar_imports/us_urban_school/ap_ib_gt_enrollment/advanced_placements/run_process.sh --download | ||
|
|
||
|
|
||
| ``` | ||
|
|
||
|
|
||
|
|
||
| The processing script will: | ||
|
|
||
|
|
||
|
|
||
| 1. Create an `output_files` directory. | ||
|
|
||
| 2. Process the downloaded data for each year. | ||
|
|
||
| 3. Generate statistical variables using the `stat_var_processor.py` script. | ||
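
The download steps listed in the README (extract a file from the zip, add `YEAR` and `ncesid` columns, save as CSV) could look roughly like the sketch below. The download scripts themselves are not shown in this view, so everything here is an assumption: the function names, the member name `Advanced Placement.csv`, and in particular the idea that `ncesid` is copied from `COMBOKEY` (the two columns hold the same value in the sample input of this PR). The network download is left out and replaced by an in-memory zip.

```python
import csv
import io
import zipfile


def extract_and_tag(zip_bytes, member_name, year):
    """Read one CSV member from a zip archive and prepend YEAR and ncesid."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(member_name) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            rows = list(csv.DictReader(text))
    tagged = []
    for row in rows:
        # Assumption: ncesid mirrors COMBOKEY (district id + school id).
        tagged.append({"YEAR": str(year), "ncesid": row["COMBOKEY"], **row})
    return tagged


def to_csv_text(rows):
    """Serialize the tagged rows back to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def build_demo_zip():
    """Stand-in for the downloaded archive (hypothetical member name)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "Advanced Placement.csv",
            "COMBOKEY,TOT_APENR_M\n010000500871,82\n")
    return buffer.getvalue()


demo_rows = extract_and_tag(build_demo_zip(), "Advanced Placement.csv", 2022)
print(to_csv_text(demo_rows))
```

Reading the identifier columns as strings keeps the leading zero in values such as `010000500871`, which a numeric parse would drop.
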
6 changes: 6 additions & 0 deletions
statvar_imports/us_urban_school/ap_ib_gt_enrollment/advanced_placements/common_output.tmcf
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,6 @@ | ||
| Node: E:common->E0 | ||
| observationDate: C:common_output->observationDate | ||
| observationAbout: C:common_output->observationAbout | ||
| variableMeasured: C:common_output->variableMeasured | ||
| value: C:common_output->value | ||
| typeOf: dcs:StatVarObservation |
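
To make the template above concrete, the sketch below fills it from one row of a cleaned CSV: each `C:common_output->column` reference is replaced by the value in that column, and fixed values such as `typeOf` are copied through. This is only an illustration, not the Data Commons import tool. The `render_node` helper, the node id `obs_0`, and the row values (including the place and variable identifiers) are hypothetical.

```python
import re

TMCF = """Node: E:common->E0
observationDate: C:common_output->observationDate
observationAbout: C:common_output->observationAbout
variableMeasured: C:common_output->variableMeasured
value: C:common_output->value
typeOf: dcs:StatVarObservation"""

# Matches a column reference such as "C:common_output->value".
COLUMN_REF = re.compile(r"C:\w+->(\w+)")


def render_node(tmcf_text, row, node_id):
    """Substitute column references in a single-node template with row values."""
    lines = []
    for line in tmcf_text.splitlines():
        prop, _, value = line.partition(": ")
        if prop == "Node":
            # Hypothetical local id; the real tool assigns its own.
            lines.append(f"Node: {node_id}")
            continue
        match = COLUMN_REF.fullmatch(value.strip())
        if match:
            value = row[match.group(1)]
        lines.append(f"{prop}: {value}")
    return "\n".join(lines)


# Hypothetical cleaned-CSV row; the identifiers are placeholders.
example_row = {
    "observationDate": "2022",
    "observationAbout": "dcid:nces/010000500871",
    "variableMeasured": "dcid:Hypothetical_StatVar",
    "value": "82",
}

rendered = render_node(TMCF, example_row, "obs_0")
print(rendered)
```
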
31 changes: 31 additions & 0 deletions
statvar_imports/us_urban_school/ap_ib_gt_enrollment/advanced_placements/manifest.json
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,31 @@ | ||
| { | ||
| "import_specifications": [ | ||
| { | ||
| "import_name": "US_AP_ENROLLMENT", | ||
| "curator_emails": [ | ||
| "support@datacommons.org" | ||
| ], | ||
| "provenance_url": "https://civilrightsdata.ed.gov/data", | ||
| "provenance_description": "This dataset contains enrollment in Advanced Placement (AP) for each school by students' race and sex, Limited English Proficiency status and sex, and disability status and sex.", | ||
| "scripts": [ | ||
| "run_process.sh" | ||
| ], | ||
| "source_files": [ | ||
| "input_files/*.xlsx", | ||
| "input_files/*.csv" | ||
| ], | ||
| "import_inputs": [ | ||
| { | ||
| "template_mcf": "common_output.tmcf", | ||
| "cleaned_csv": "output_files/output_*.csv" | ||
| } | ||
| ], | ||
| "cron_schedule": "0 05 8,23 * *", | ||
| "resource_limits": { | ||
| "cpu": 16, | ||
| "memory": 256, | ||
| "disk": 500 | ||
| } | ||
| } | ||
| ] | ||
| } |
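
The `cron_schedule` value `0 05 8,23 * *` follows the standard five-field cron layout (minute, hour, day of month, month, day of week), so it reads as 05:00 on the 8th and 23rd of every month. The sketch below splits the expression into named fields to show this. It is a deliberately small parser written for this note: it handles only plain numbers, comma lists, and `*`, and the `parse_cron` name is not part of any import tooling. The time zone the scheduler applies is not stated in the manifest.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def parse_cron(expression):
    """Split a five-field cron expression into named fields.

    Supports numbers, comma-separated lists, and "*" only (no ranges or steps).
    A field set to "*" is returned as None, meaning "every value".
    """
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}")
    parsed = {}
    for name, part in zip(CRON_FIELDS, parts):
        if part == "*":
            parsed[name] = None
        else:
            parsed[name] = [int(item) for item in part.split(",")]
    return parsed


schedule = parse_cron("0 05 8,23 * *")
print(schedule)
```
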
90 changes: 90 additions & 0 deletions
statvar_imports/us_urban_school/ap_ib_gt_enrollment/advanced_placements/run_process.sh
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,90 @@ | ||
| #!/bin/bash | ||
|
|
||
| # Exit immediately if a command exits with a non-zero status. | ||
| set -e | ||
|
|
||
| # Navigate to the script's directory to ensure relative paths work correctly. | ||
| SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )" | ||
| cd "$SCRIPT_DIR" | ||
|
|
||
| # Default to not downloading | ||
| DOWNLOAD=false | ||
|
|
||
| # Parse command line arguments | ||
| while [[ "$#" -gt 0 ]]; do | ||
| case $1 in | ||
| --download) DOWNLOAD=true ;; | ||
| *) echo "Unknown parameter passed: $1"; exit 1 ;; | ||
| esac | ||
| shift | ||
| done | ||
|
|
||
| # Function to process each downloaded data file. | ||
| process_files() { | ||
| # Create the output directory if it doesn't exist. | ||
| mkdir -p output_files | ||
|
|
||
| declare -A processed_years | ||
|
|
||
| # Loop through all AP Enrollment files in the input directory to identify unique years | ||
| for input_file in input_files/*_AP_Enrollment.*; do | ||
| # Check if any file exists to avoid errors when no files are found. | ||
| [ -e "$input_file" ] || continue | ||
|
|
||
| filename=$(basename "$input_file") | ||
| year=$(echo "$filename" | cut -d'_' -f1) | ||
| extension="${filename##*.}" | ||
|
|
||
| # Determine expected extension based on year | ||
| expected_extension="" | ||
| if [[ "$year" == "2010" || "$year" == "2012" || "$year" == "2014" ]]; then | ||
| expected_extension="xlsx" | ||
| else | ||
| expected_extension="csv" | ||
| fi | ||
|
|
||
| # Skip if the extension does not match the expected one | ||
| if [[ "$extension" != "$expected_extension" ]]; then | ||
| echo "Skipping $input_file: Expected .$expected_extension, but found .$extension." | ||
| continue | ||
| fi | ||
|
|
||
| # If this year hasn't been processed yet, process it | ||
| if [[ -z ${processed_years[$year]} ]]; then | ||
| processed_years[$year]=true | ||
| echo "Processing year: $year" | ||
|
|
||
| # Define the glob pattern for input files for this year | ||
| input_data_glob="input_files/${year}_AP_Enrollment_*.${expected_extension}" | ||
|
|
||
| # Define the output path based on the year. | ||
| output_path="output_files/output_${year}_ap" | ||
|
|
||
| # Construct the command for the current year | ||
| CMD="python3 ../../../tools/statvar_importer/stat_var_processor.py \ | ||
| --input_data=${input_data_glob} \ | ||
| --pv_map=../config/ap_enrollment_pvmap.csv \ | ||
| --config_file=../config/common_metadata.csv \ | ||
| --output_path=${output_path} \ | ||
| --existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf" | ||
|
|
||
| # Print and execute the command. | ||
| echo "Executing command for year ${year}:" | ||
| echo "$CMD" | ||
| eval "$CMD" | ||
|
|
||
| echo "--- Finished processing for year ${year} ---" | ||
| fi | ||
| done | ||
| } | ||
|
|
||
| if [ "$DOWNLOAD" = true ]; then | ||
| echo "--- Starting download of AP data ---" | ||
| python3 ../download_ap_ib_gt.py --ap | ||
| python3 ../download_2015_16.py --ap | ||
| echo "--- Download complete ---" | ||
| fi | ||
|
|
||
| echo "--- Starting processing of files ---" | ||
| process_files | ||
| echo "--- All processing complete ---" | ||
|
|
||
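
The file-selection logic in `process_files` can be restated in a few lines of Python, which may help when reviewing it: take the year from the text before the first underscore, accept `.xlsx` only for 2010, 2012, and 2014 and `.csv` for every other year, and handle each year once. This is a sketch of that logic only, not a replacement for the script. The function names and the example filenames are hypothetical.

```python
XLSX_YEARS = {"2010", "2012", "2014"}


def expected_extension(year):
    """Mirror the script's rule: xlsx for three early years, csv otherwise."""
    return "xlsx" if year in XLSX_YEARS else "csv"


def years_to_process(filenames):
    """Return each year once, in first-seen order, skipping mismatched extensions."""
    years = []
    for name in filenames:
        year = name.split("_", 1)[0]          # like: cut -d'_' -f1
        extension = name.rsplit(".", 1)[-1]   # like: ${filename##*.}
        if extension != expected_extension(year):
            continue
        if year not in years:
            years.append(year)
    return years


# Hypothetical filenames, for illustration only.
example_files = [
    "2010_AP_Enrollment.xlsx",
    "2010_AP_Enrollment.csv",
    "2022_AP_Enrollment.csv",
    "2022_AP_Enrollment.xlsx",
]
selected_years = years_to_process(example_files)
print(selected_years)
```
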
10 changes: 10 additions & 0 deletions
...mports/us_urban_school/ap_ib_gt_enrollment/advanced_placements/test_data/sample_input.csv
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| YEAR,ncesid,JJ,LEA_STATE,LEA_STATE_NAME,LEAID,LEA_NAME,SCHID,SCH_NAME,COMBOKEY,SCH_APSCIENR_IND,SCH_APSCIENR_HI_M,SCH_APSCIENR_HI_F,SCH_APSCIENR_AM_M,SCH_APSCIENR_AM_F,SCH_APSCIENR_AS_M,SCH_APSCIENR_AS_F,SCH_APSCIENR_HP_M,SCH_APSCIENR_HP_F,SCH_APSCIENR_BL_M,SCH_APSCIENR_BL_F,SCH_APSCIENR_WH_M,SCH_APSCIENR_WH_F,SCH_APSCIENR_TR_M,SCH_APSCIENR_TR_F,TOT_APSCIENR_M,TOT_APSCIENR_F,SCH_APSCIENR_EL_M,SCH_APSCIENR_EL_F,SCH_APSCIENR_IDEA_M,SCH_APSCIENR_IDEA_F,SCH_APCOMPENR_IND,SCH_APCOMPENR_HI_M,SCH_APCOMPENR_HI_F,SCH_APCOMPENR_AM_M,SCH_APCOMPENR_AM_F,SCH_APCOMPENR_AS_M,SCH_APCOMPENR_AS_F,SCH_APCOMPENR_HP_M,SCH_APCOMPENR_HP_F,SCH_APCOMPENR_BL_M,SCH_APCOMPENR_BL_F,SCH_APCOMPENR_WH_M,SCH_APCOMPENR_WH_F,SCH_APCOMPENR_TR_M,SCH_APCOMPENR_TR_F,TOT_APCOMPENR_M,TOT_APCOMPENR_F,SCH_APCOMPENR_EL_M,SCH_APCOMPENR_EL_F,SCH_APCOMPENR_IDEA_M,SCH_APCOMPENR_IDEA_F,TOT_APOTHENR_M,TOT_APOTHENR_F,SCH_APENR_IND,SCH_APCOURSES,SCH_APSEL,SCH_APENR_HI_M,SCH_APENR_HI_F,SCH_APENR_AM_M,SCH_APENR_AM_F,SCH_APENR_AS_M,SCH_APENR_AS_F,SCH_APENR_HP_M,SCH_APENR_HP_F,SCH_APENR_BL_M,SCH_APENR_BL_F,SCH_APENR_WH_M,SCH_APENR_WH_F,SCH_APENR_TR_M,SCH_APENR_TR_F,TOT_APENR_M,TOT_APENR_F,SCH_APENR_EL_M,SCH_APENR_EL_F,SCH_APENR_IDEA_M,SCH_APENR_IDEA_F,SCH_APENR_504_M,SCH_APENR_504_F,SCH_APMATHENR_IND,SCH_APMATHENR_HI_M,SCH_APMATHENR_HI_F,SCH_APMATHENR_AM_M,SCH_APMATHENR_AM_F,SCH_APMATHENR_AS_M,SCH_APMATHENR_AS_F,SCH_APMATHENR_HP_M,SCH_APMATHENR_HP_F,SCH_APMATHENR_BL_M,SCH_APMATHENR_BL_F,SCH_APMATHENR_WH_M,SCH_APMATHENR_WH_F,SCH_APMATHENR_TR_M,SCH_APMATHENR_TR_F,TOT_APMATHENR_M,TOT_APMATHENR_F,SCH_APMATHENR_EL_M,SCH_APMATHENR_EL_F,SCH_APMATHENR_IDEA_M,SCH_APMATHENR_IDEA_F | ||
| 2022,010000299995,Yes,AL,ALABAMA,0100002,Alabama Youth Services,99995,AUTAUGA CAMPUS,010000299995,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9 | ||
| 2022,010000500870,No,AL,ALABAMA,0100005,Albertville City,00870,Albertville Middle School,010000500870,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9 | ||
| 2022,010000500871,No,AL,ALABAMA,0100005,Albertville City,00871,Albertville High School,010000500871,Yes,12,13,0,0,2,0,0,0,1,3,14,13,2,1,31,30,0,0,0,0,Yes,2,4,0,0,0,0,0,0,1,0,11,5,0,0,14,9,0,0,0,0,66,86,Yes,10,Yes,27,45,0,0,2,0,0,0,3,7,48,57,2,3,82,112,0,1,0,0,0,0,Yes,9,16,0,0,2,0,0,0,1,2,12,19,1,0,25,37,0,0,0,0 | ||
| 2022,010000500879,No,AL,ALABAMA,0100005,Albertville City,00879,Albertville Intermediate School,010000500879,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9 | ||
| 2022,010000500889,No,AL,ALABAMA,0100005,Albertville City,00889,Albertville Elementary School,010000500889,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9 | ||
| 2022,010000501616,No,AL,ALABAMA,0100005,Albertville City,01616,Albertville Kindergarten and PreK,010000501616,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9 | ||
| 2022,010000502150,No,AL,ALABAMA,0100005,Albertville City,02150,Albertville Primary School,010000502150,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9 | ||
| 2022,010000600193,No,AL,ALABAMA,0100006,Marshall County,00193,Kate Duncan Smith DAR Middle,010000600193,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9 | ||
| 2022,010000600872,No,AL,ALABAMA,0100006,Marshall County,00872,Asbury High School,010000600872,No,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,Yes,0,0,0,0,1,0,0,0,0,1,3,4,0,0,4,5,1,0,0,0,0,0,Yes,3,Yes,0,0,0,0,1,0,0,1,0,1,23,12,0,0,24,14,1,0,0,0,0,0,No,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9 |
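
Most rows in the sample carry `-9` in every enrollment column, while schools that report AP enrollment carry non-negative counts. Assuming negative values are reserve codes rather than real counts, a reader of this file would want to drop them before aggregating. The sketch below does that on a two-row excerpt trimmed from the sample above. How the import itself treats these codes depends on the pvmap and config files, which are not shown here, so the `reported_counts` helper is purely illustrative.

```python
import csv
import io

# Two schools and two columns trimmed from the sample input above.
SAMPLE = """ncesid,TOT_APSCIENR_M,TOT_APSCIENR_F
010000500870,-9,-9
010000500871,31,30
"""


def reported_counts(csv_text, columns):
    """Keep only non-negative counts, keyed by school id.

    Assumption: any negative value is a reserve code, not an enrollment count.
    """
    results = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        counts = {}
        for column in columns:
            value = int(row[column])
            if value >= 0:
                counts[column] = value
        if counts:
            results[row["ncesid"]] = counts
    return results


counts_by_school = reported_counts(SAMPLE, ["TOT_APSCIENR_M", "TOT_APSCIENR_F"])
print(counts_by_school)
```
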