100 changes: 100 additions & 0 deletions statvar_imports/us_urban_school/ap_ib_gt_enrollment/README.md
@@ -0,0 +1,100 @@
# AP, IB, and GT Enrollment Data Downloader

These scripts download and process Advanced Placement (AP), International Baccalaureate (IB), and Gifted and Talented (GT) enrollment data from the Civil Rights Data Collection (CRDC).

## Scripts

There are two Python scripts available:

* `download_ap_ib_gt.py`: This is the main script that downloads and processes AP, IB, and GT enrollment data for multiple years.
* `download_2015_16.py`: This script is specifically for downloading and processing the 2015-16 data, which has a different structure.

## Usage

To run the scripts, execute the following commands from the root of the `data` repository:

### Main Script

```bash
python3 statvar_imports/us_urban_school/ap_ib_gt_enrollment/download_ap_ib_gt.py
```

You can also specify which data to download by using the following flags:

* `--ap`: Download Advanced Placement data only.
* `--ib`: Download International Baccalaureate data only.
* `--gt`: Download Gifted and Talented data only.

For example, to download only the Advanced Placement data, run the following command:

```bash
python3 statvar_imports/us_urban_school/ap_ib_gt_enrollment/download_ap_ib_gt.py --ap
```
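The flags above each select a single dataset. The sketch below shows one way such flags could be parsed with Python's `argparse`. It is not taken from `download_ap_ib_gt.py`: the `select_datasets` helper is hypothetical, and so is the assumption that running with no flag selects all three datasets.

```python
import argparse

# Hypothetical sketch of dataset flags like those documented for
# download_ap_ib_gt.py; the real script's argument handling may differ.
DATASETS = ("ap", "ib", "gt")


def select_datasets(argv=None):
    """Return the dataset keys to download, in a fixed order."""
    parser = argparse.ArgumentParser(description="Download CRDC AP/IB/GT data.")
    parser.add_argument("--ap", action="store_true", help="Advanced Placement only")
    parser.add_argument("--ib", action="store_true", help="International Baccalaureate only")
    parser.add_argument("--gt", action="store_true", help="Gifted and Talented only")
    args = parser.parse_args(argv)
    chosen = [name for name in DATASETS if getattr(args, name)]
    # Assumption: with no flags at all, every dataset is downloaded.
    return chosen or list(DATASETS)


if __name__ == "__main__":
    print(select_datasets(["--ap"]))  # ['ap']
    print(select_datasets([]))        # ['ap', 'ib', 'gt']
```

Passing an explicit list to `parse_args` makes the helper easy to test; passing `None` makes `argparse` read `sys.argv` as usual.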

### 2015-16 Script

```bash
python3 statvar_imports/us_urban_school/ap_ib_gt_enrollment/download_2015_16.py
```

You can also specify which data to download by using the following flags:

* `--ap`: Download Advanced Placement data for 2015-16 only.
* `--ib`: Download International Baccalaureate data for 2015-16 only.
* `--gt`: Download Gifted and Talented data for 2015-16 only.

For example, to download only the Advanced Placement data for 2015-16, run the following command:

```bash
python3 statvar_imports/us_urban_school/ap_ib_gt_enrollment/download_2015_16.py --ap
```

## Output Files

The scripts will download the data into the following directories:

* `statvar_imports/us_urban_school/ap_ib_gt_enrollment/advanced_placements/input_files`
* `statvar_imports/us_urban_school/ap_ib_gt_enrollment/international_baccalaureate/input_files`
* `statvar_imports/us_urban_school/ap_ib_gt_enrollment/gifted_and_talented/input_files`

The downloaded files are saved in CSV or XLSX format.
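To locate the downloaded files programmatically, a small helper can map each dataset to its `input_files` directory listed above. This is a minimal sketch: the short keys `ap`, `ib`, and `gt` and the helper names `input_dir` and `downloaded_files` are hypothetical and do not come from the scripts.

```python
from pathlib import Path

# Directory names as listed above; the dataset keys are hypothetical labels.
BASE_DIR = Path("statvar_imports/us_urban_school/ap_ib_gt_enrollment")
DATASET_DIRS = {
    "ap": "advanced_placements",
    "ib": "international_baccalaureate",
    "gt": "gifted_and_talented",
}


def input_dir(dataset: str, base: Path = BASE_DIR) -> Path:
    """Return the input_files directory for a dataset key."""
    return base / DATASET_DIRS[dataset] / "input_files"


def downloaded_files(dataset: str, base: Path = BASE_DIR) -> list[Path]:
    """List the downloaded CSV and XLSX files for a dataset, sorted by path."""
    folder = input_dir(dataset, base)
    files = list(folder.glob("*.csv")) + list(folder.glob("*.xlsx"))
    return sorted(files)
```

Files with other extensions in the same directory are ignored, mirroring the `input_files/*.xlsx` and `input_files/*.csv` patterns in the import manifest.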

## Data Source

The data is downloaded from the Civil Rights Data Collection (CRDC) at https://civilrightsdata.ed.gov/data.

## Processing Steps

The scripts perform the following processing steps:

1. Download the data from the CRDC website.

2. Extract the relevant file from the downloaded zip file.

3. Add `YEAR` and `ncesid` columns to the data.

4. Save the processed data as a CSV or XLSX file.
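Steps 2 to 4 can be sketched with the Python standard library as follows. This is an illustration under stated assumptions, not the scripts' actual code: the archive member is matched by a filename suffix, the file is assumed to be UTF-8 CSV, and `ncesid` is assumed to be the 7-digit `LEAID` followed by the 5-digit `SCHID` (in the sample 2022 rows in this PR, that value equals `COMBOKEY`). Only CSV output is shown, since writing XLSX needs a third-party library. The helper names are hypothetical.

```python
import csv
import io
import zipfile


def extract_member(zip_bytes: bytes, suffix: str) -> str:
    """Step 2: return the text of the first archive member ending with `suffix`."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith(suffix):
                # Assumption: UTF-8 text; utf-8-sig also strips a byte-order mark.
                return archive.read(name).decode("utf-8-sig")
    raise FileNotFoundError(f"no archive member ends with {suffix!r}")


def add_year_and_ncesid(csv_text: str, year: int) -> list[dict]:
    """Step 3: prepend YEAR and ncesid columns to every row.

    Assumption: ncesid is the 7-digit LEAID followed by the 5-digit SCHID.
    Values are read as strings, so leading zeros are kept.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        ncesid = row["LEAID"].zfill(7) + row["SCHID"].zfill(5)
        rows.append({"YEAR": str(year), "ncesid": ncesid, **row})
    return rows


def write_csv(rows: list[dict], path: str) -> None:
    """Step 4: save the rows as CSV, keeping the column order of the first row."""
    if not rows:
        raise ValueError("nothing to write")
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
```

Prepending the two columns puts `YEAR` and `ncesid` first, which matches the column order of the sample data file in this PR.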


## Processing the downloaded data

After downloading the data, you can process it by running the `run_process.sh` script in each of the data directories.
For example, to process the Advanced Placement data, run the following command:

```bash
bash statvar_imports/us_urban_school/ap_ib_gt_enrollment/advanced_placements/run_process.sh
```
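To process all three datasets in one pass, the per-directory scripts can be called in sequence. A minimal sketch in Python, assuming each of the three directories listed under Output Files contains a `run_process.sh` (only the Advanced Placement script appears in this PR) and that it is run from the root of the `data` repository; `process_commands` and `run_all` are hypothetical names:

```python
import subprocess
from pathlib import Path

BASE_DIR = Path("statvar_imports/us_urban_school/ap_ib_gt_enrollment")
# Assumption: each data directory has its own run_process.sh.
DATA_DIRS = ("advanced_placements", "international_baccalaureate", "gifted_and_talented")


def process_commands(base: Path = BASE_DIR) -> list[list[str]]:
    """Build one `bash <dir>/run_process.sh` command per data directory."""
    return [["bash", (base / name / "run_process.sh").as_posix()] for name in DATA_DIRS]


def run_all(base: Path = BASE_DIR, dry_run: bool = True) -> None:
    """Print each command, and run it as well when dry_run is False."""
    for cmd in process_commands(base):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on the first failing script.
            subprocess.run(cmd, check=True)
```

`run_all()` only prints the commands by default; pass `dry_run=False` to execute them.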

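The Advanced Placement `run_process.sh` always runs the download scripts before processing and does not read any command-line arguments, so the `--download` flag has no additional effect. The following command downloads and processes the data in one step, exactly like the command above: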

```bash
bash statvar_imports/us_urban_school/ap_ib_gt_enrollment/advanced_placements/run_process.sh --download
```

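The processing script will:

1. Download the data (the Advanced Placement script runs `download_ap_ib_gt.py --ap` and `download_2015_16.py --ap`).

2. Create an `output_files` directory.

3. Process the downloaded data for each year.

4. Generate statistical variables using the `stat_var_processor.py` script.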
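For reference, the per-file loop in the Advanced Placement `run_process.sh` in this PR can be expressed in Python as below. It is a sketch for readers who prefer to drive the processor from Python: the paths and flags are copied from that script and are relative to the `advanced_placements` directory, while the helper names are hypothetical. It only builds the argument lists; pass each one to `subprocess.run` to execute it.

```python
import glob
import os
from pathlib import Path

# Paths and flags copied from the AP run_process.sh; they are relative to the
# advanced_placements directory.
PROCESSOR = "../../../../../data/tools/statvar_importer/stat_var_processor.py"
PV_MAP = "../config/ap_enrollment_pvmap.csv"
CONFIG_FILE = "../config/common_metadata.csv"
EXISTING_MCF = "gs://unresolved_mcf/scripts/statvar/stat_vars.mcf"


def year_from_filename(path: str) -> str:
    """Return the text before the first underscore, like `cut -d'_' -f1`."""
    return Path(path).name.split("_", 1)[0]


def processor_command(input_file: str, output_dir: str = "output_files") -> list[str]:
    """Build the stat_var_processor.py arguments for one AP input file."""
    year = year_from_filename(input_file)
    return [
        "python3",
        PROCESSOR,
        f"--input_data={input_file}",
        f"--pv_map={PV_MAP}",
        f"--config_file={CONFIG_FILE}",
        f"--output_path={output_dir}/output_{year}_ap",
        f"--existing_statvar_mcf={EXISTING_MCF}",
    ]


def plan_processing(input_dir: str = "input_files", output_dir: str = "output_files") -> list[list[str]]:
    """Create the output directory and return one command per AP enrollment file."""
    os.makedirs(output_dir, exist_ok=True)
    pattern = os.path.join(input_dir, "*_AP_Enrollment.*")
    return [processor_command(path, output_dir) for path in sorted(glob.glob(pattern))]
```

Building an argument list instead of a single command string avoids shell quoting issues when a path contains spaces.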
@@ -0,0 +1,6 @@
Node: E:common->E0
observationDate: C:common_output->observationDate
observationAbout: C:common_output->observationAbout
variableMeasured: C:common_output->variableMeasured
value: C:common_output->value
typeOf: dcs:StatVarObservation
@@ -0,0 +1,31 @@
{
  "import_specifications": [
    {
      "import_name": "US_AP_ENROLLMENT",
      "curator_emails": [
        "support@datacommons.org"
      ],
      "provenance_url": "https://civilrightsdata.ed.gov/data",
      "provenance_description": "This dataset contains enrollment in Advanced Placement (AP) for each school by students' race and sex, Limited English Proficiency status and sex, and disability status and sex.",
      "scripts": [
        "run_process.sh"
      ],
      "source_files": [
        "input_files/*.xlsx",
        "input_files/*.csv"
      ],
      "import_inputs": [
        {
          "template_mcf": "common_output.tmcf",
          "cleaned_csv": "output_files/output_*.csv"
        }
      ],
      "cron_schedule": "0 05 8,23 * *",
      "resource_limits": {
        "cpu": 16,
        "memory": 256,
        "disk": 500
      }
    }
  ]
}
@@ -0,0 +1,53 @@
#!/bin/bash

# Exit immediately if a command exits with a non-zero status.
set -e

# Navigate to the script's directory to ensure relative paths work correctly.
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
cd "$SCRIPT_DIR"

# Always download data
echo "--- Starting download of AP data ---"
python3 ../download_ap_ib_gt.py --ap
python3 ../download_2015_16.py --ap
echo "--- Download complete ---"

# Function to process each downloaded data file.
process_files() {
  # Create the output directory if it doesn't exist.
  mkdir -p output_files

  # Loop through all Advanced Placement Enrollment files in the input directory.
  for input_file in input_files/*_AP_Enrollment.*; do
    # Skip the literal pattern when no files match the glob.
    [ -e "$input_file" ] || continue

    echo "Processing file: $input_file"

    # Extract the year from the filename (e.g., "2014" from "2014_AP_Enrollment.xlsx").
    filename=$(basename "$input_file")
    year=$(echo "$filename" | cut -d'_' -f1)

    # Define the output path based on the year.
    output_path="output_files/output_${year}_ap"

    # Build the stat_var_processor.py command as an array so that each
    # argument is passed intact without needing eval.
    cmd=(
      python3 ../../../../../data/tools/statvar_importer/stat_var_processor.py
      --input_data="${input_file}"
      --pv_map=../config/ap_enrollment_pvmap.csv
      --config_file=../config/common_metadata.csv
      --output_path="${output_path}"
      --existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf
    )

    # Print and execute the command.
    echo "Executing command for year ${year}:"
    echo "${cmd[*]}"
    "${cmd[@]}"
    echo "--- Finished processing for year ${year} ---"
  done
}

echo "--- Starting processing of files ---"
process_files
echo "--- All processing complete ---"
@@ -0,0 +1,10 @@
YEAR,ncesid,JJ,LEA_STATE,LEA_STATE_NAME,LEAID,LEA_NAME,SCHID,SCH_NAME,COMBOKEY,SCH_APSCIENR_IND,SCH_APSCIENR_HI_M,SCH_APSCIENR_HI_F,SCH_APSCIENR_AM_M,SCH_APSCIENR_AM_F,SCH_APSCIENR_AS_M,SCH_APSCIENR_AS_F,SCH_APSCIENR_HP_M,SCH_APSCIENR_HP_F,SCH_APSCIENR_BL_M,SCH_APSCIENR_BL_F,SCH_APSCIENR_WH_M,SCH_APSCIENR_WH_F,SCH_APSCIENR_TR_M,SCH_APSCIENR_TR_F,TOT_APSCIENR_M,TOT_APSCIENR_F,SCH_APSCIENR_EL_M,SCH_APSCIENR_EL_F,SCH_APSCIENR_IDEA_M,SCH_APSCIENR_IDEA_F,SCH_APCOMPENR_IND,SCH_APCOMPENR_HI_M,SCH_APCOMPENR_HI_F,SCH_APCOMPENR_AM_M,SCH_APCOMPENR_AM_F,SCH_APCOMPENR_AS_M,SCH_APCOMPENR_AS_F,SCH_APCOMPENR_HP_M,SCH_APCOMPENR_HP_F,SCH_APCOMPENR_BL_M,SCH_APCOMPENR_BL_F,SCH_APCOMPENR_WH_M,SCH_APCOMPENR_WH_F,SCH_APCOMPENR_TR_M,SCH_APCOMPENR_TR_F,TOT_APCOMPENR_M,TOT_APCOMPENR_F,SCH_APCOMPENR_EL_M,SCH_APCOMPENR_EL_F,SCH_APCOMPENR_IDEA_M,SCH_APCOMPENR_IDEA_F,TOT_APOTHENR_M,TOT_APOTHENR_F,SCH_APENR_IND,SCH_APCOURSES,SCH_APSEL,SCH_APENR_HI_M,SCH_APENR_HI_F,SCH_APENR_AM_M,SCH_APENR_AM_F,SCH_APENR_AS_M,SCH_APENR_AS_F,SCH_APENR_HP_M,SCH_APENR_HP_F,SCH_APENR_BL_M,SCH_APENR_BL_F,SCH_APENR_WH_M,SCH_APENR_WH_F,SCH_APENR_TR_M,SCH_APENR_TR_F,TOT_APENR_M,TOT_APENR_F,SCH_APENR_EL_M,SCH_APENR_EL_F,SCH_APENR_IDEA_M,SCH_APENR_IDEA_F,SCH_APENR_504_M,SCH_APENR_504_F,SCH_APMATHENR_IND,SCH_APMATHENR_HI_M,SCH_APMATHENR_HI_F,SCH_APMATHENR_AM_M,SCH_APMATHENR_AM_F,SCH_APMATHENR_AS_M,SCH_APMATHENR_AS_F,SCH_APMATHENR_HP_M,SCH_APMATHENR_HP_F,SCH_APMATHENR_BL_M,SCH_APMATHENR_BL_F,SCH_APMATHENR_WH_M,SCH_APMATHENR_WH_F,SCH_APMATHENR_TR_M,SCH_APMATHENR_TR_F,TOT_APMATHENR_M,TOT_APMATHENR_F,SCH_APMATHENR_EL_M,SCH_APMATHENR_EL_F,SCH_APMATHENR_IDEA_M,SCH_APMATHENR_IDEA_F
2022,010000299995,Yes,AL,ALABAMA,0100002,Alabama Youth Services,99995,AUTAUGA CAMPUS,010000299995,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9
2022,010000500870,No,AL,ALABAMA,0100005,Albertville City,00870,Albertville Middle School,010000500870,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9
2022,010000500871,No,AL,ALABAMA,0100005,Albertville City,00871,Albertville High School,010000500871,Yes,12,13,0,0,2,0,0,0,1,3,14,13,2,1,31,30,0,0,0,0,Yes,2,4,0,0,0,0,0,0,1,0,11,5,0,0,14,9,0,0,0,0,66,86,Yes,10,Yes,27,45,0,0,2,0,0,0,3,7,48,57,2,3,82,112,0,1,0,0,0,0,Yes,9,16,0,0,2,0,0,0,1,2,12,19,1,0,25,37,0,0,0,0
2022,010000500879,No,AL,ALABAMA,0100005,Albertville City,00879,Albertville Intermediate School,010000500879,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9
2022,010000500889,No,AL,ALABAMA,0100005,Albertville City,00889,Albertville Elementary School,010000500889,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9
2022,010000501616,No,AL,ALABAMA,0100005,Albertville City,01616,Albertville Kindergarten and PreK,010000501616,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9
2022,010000502150,No,AL,ALABAMA,0100005,Albertville City,02150,Albertville Primary School,010000502150,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9
2022,010000600193,No,AL,ALABAMA,0100006,Marshall County,00193,Kate Duncan Smith DAR Middle,010000600193,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9
2022,010000600872,No,AL,ALABAMA,0100006,Marshall County,00872,Asbury High School,010000600872,No,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,Yes,0,0,0,0,1,0,0,0,0,1,3,4,0,0,4,5,1,0,0,0,0,0,Yes,3,Yes,0,0,0,0,1,0,0,1,0,1,23,12,0,0,24,14,1,0,0,0,0,0,No,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9