124 changes: 124 additions & 0 deletions statvar_imports/us_urban_school/ap_ib_gt_enrollment/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,124 @@
# AP, IB, and GT Enrollment Data Downloader

These scripts download and process Advanced Placement (AP), International Baccalaureate (IB), and Gifted and Talented (GT) enrollment data from the Civil Rights Data Collection (CRDC).

## Scripts

Two Python scripts are available:

* `download_ap_ib_gt.py`: The main script. It downloads and processes AP, IB, and GT enrollment data for multiple years.
* `download_2015_16.py`: Downloads and processes the 2015-16 data, which has a different structure from the other years.

## Usage

To run the scripts, execute the following commands from the `data` directory:

### Main Script

```bash
python3 statvar_imports/us_urban_school/ap_ib_gt_enrollment/download_ap_ib_gt.py
```

You can also specify which data to download by using the following flags:

* `--ap`: Download Advanced Placement data only.
* `--ib`: Download International Baccalaureate data only.
* `--gt`: Download Gifted and Talented data only.

For example, to download only the Advanced Placement data, run the following command:

```bash
python3 statvar_imports/us_urban_school/ap_ib_gt_enrollment/download_ap_ib_gt.py --ap
```
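
As a rough illustration of how the three flags could behave, here is a minimal sketch using `argparse`. The function name `selected_datasets` is hypothetical, and the real script may parse its flags differently; the sketch also assumes that running with no flag selects all three datasets and that flags may be combined.

```python
import argparse


def selected_datasets(argv):
    """Return the dataset keys chosen on the command line (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Select CRDC datasets to download.")
    parser.add_argument("--ap", action="store_true", help="Advanced Placement only")
    parser.add_argument("--ib", action="store_true", help="International Baccalaureate only")
    parser.add_argument("--gt", action="store_true", help="Gifted and Talented only")
    args = parser.parse_args(argv)

    chosen = [name for name in ("ap", "ib", "gt") if getattr(args, name)]
    # Assumption: with no flag given, all three datasets are downloaded.
    return chosen or ["ap", "ib", "gt"]


print(selected_datasets(["--ap"]))
print(selected_datasets([]))
```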

### 2015-16 Script

```bash
python3 statvar_imports/us_urban_school/ap_ib_gt_enrollment/download_2015_16.py
```

You can also specify which data to download by using the following flags:

* `--ap`: Download Advanced Placement data for 2015-16 only.
* `--ib`: Download International Baccalaureate data for 2015-16 only.
* `--gt`: Download Gifted and Talented data for 2015-16 only.

For example, to download only the Advanced Placement data for 2015-16, run the following command:

```bash
python3 statvar_imports/us_urban_school/ap_ib_gt_enrollment/download_2015_16.py --ap
```

## Output Files

The scripts will download the data into the following directories:

* `statvar_imports/us_urban_school/ap_ib_gt_enrollment/advanced_placements/input_files`
* `statvar_imports/us_urban_school/ap_ib_gt_enrollment/international_baccalaureate/input_files`
* `statvar_imports/us_urban_school/ap_ib_gt_enrollment/gifted_and_talented/input_files`

The downloaded files are saved in CSV or XLSX format, depending on the year.

## Data Source

The data is downloaded from the Civil Rights Data Collection (CRDC): https://civilrightsdata.ed.gov/data

## Processing Steps

The scripts perform the following processing steps:

1. Download the data from the CRDC website.
2. Extract the relevant file from the downloaded zip file.
3. Add `YEAR` and `ncesid` columns to the data.
4. Save the processed data as a CSV or XLSX file.
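
Steps 2 and 3 above can be sketched with the Python standard library. This is an illustration only, not the download scripts' actual code: the helper names are hypothetical, and the sketch assumes `ncesid` is copied from the `COMBOKEY` column, which is what the sample test rows in this change suggest.

```python
import csv
import io
import zipfile


def extract_member(zip_bytes, member_name):
    """Return the text of one file stored inside an in-memory zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return archive.read(member_name).decode("utf-8")


def add_year_and_ncesid(csv_text, year):
    """Return the rows with YEAR and ncesid placed in front of the source columns."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumption: ncesid mirrors COMBOKEY (LEAID followed by SCHID).
        rows.append({"YEAR": str(year), "ncesid": row["COMBOKEY"], **row})
    return rows


# Build a tiny archive in memory so the sketch runs without network access.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "Advanced Placement.csv",
        "LEAID,SCHID,COMBOKEY\n0100005,00871,010000500871\n",
    )

text = extract_member(buffer.getvalue(), "Advanced Placement.csv")
for processed_row in add_year_and_ncesid(text, 2022):
    print(processed_row)
```

The member name `Advanced Placement.csv` is a placeholder; the file names inside the real CRDC archives may differ by year.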



## Processing the Downloaded Data

After downloading the data, you can process it by running the `run_process.sh` script in each of the data directories.

For example, to process the Advanced Placement data, run the following command:

```bash
bash statvar_imports/us_urban_school/ap_ib_gt_enrollment/advanced_placements/run_process.sh
```



You can also download and process the data in one step by using the `--download` flag:

```bash
bash statvar_imports/us_urban_school/ap_ib_gt_enrollment/advanced_placements/run_process.sh --download
```



The processing script will:

1. Create an `output_files` directory.
2. Process the downloaded data for each year.
3. Generate statistical variables using the `stat_var_processor.py` script.
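
The per-year selection in step 2 can be sketched in Python. This mirrors the logic of the AP `run_process.sh` in this change (the year is the filename prefix before the first underscore; 2010, 2012, and 2014 are expected as `.xlsx`, all other years as `.csv`). The function names are hypothetical and the shell script remains the source of truth.

```python
from pathlib import PurePosixPath

# Years whose input files are expected in XLSX format, per run_process.sh.
XLSX_YEARS = {"2010", "2012", "2014"}


def expected_extension(year):
    """Return the file extension the processing script expects for a year."""
    return "xlsx" if year in XLSX_YEARS else "csv"


def years_to_process(filenames):
    """Return each year once, in first-seen order, skipping unexpected extensions."""
    years = []
    for name in filenames:
        base = PurePosixPath(name).name
        year = base.split("_", 1)[0]
        extension = base.rsplit(".", 1)[-1]
        if extension != expected_extension(year):
            continue  # e.g. a stray .csv for a year that should be .xlsx
        if year not in years:
            years.append(year)
    return years


print(years_to_process([
    "input_files/2012_AP_Enrollment.csv",
    "input_files/2012_AP_Enrollment.xlsx",
    "input_files/2022_AP_Enrollment.csv",
]))
```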
@@ -0,0 +1,6 @@
Node: E:common->E0
observationDate: C:common_output->observationDate
observationAbout: C:common_output->observationAbout
variableMeasured: C:common_output->variableMeasured
value: C:common_output->value
typeOf: dcs:StatVarObservation
@@ -0,0 +1,31 @@
{
  "import_specifications": [
    {
      "import_name": "US_AP_ENROLLMENT",
      "curator_emails": [
        "support@datacommons.org"
      ],
      "provenance_url": "https://civilrightsdata.ed.gov/data",
      "provenance_description": "This dataset contains enrollment in Advanced Placement (AP) for each school by students' race and sex, Limited English Proficiency status and sex, and disability status and sex.",
      "scripts": [
        "run_process.sh"
      ],
      "source_files": [
        "input_files/*.xlsx",
        "input_files/*.csv"
      ],
      "import_inputs": [
        {
          "template_mcf": "common_output.tmcf",
          "cleaned_csv": "output_files/output_*.csv"
        }
      ],
      "cron_schedule": "0 05 8,23 * *",
      "resource_limits": {
        "cpu": 16,
        "memory": 256,
        "disk": 500
      }
    }
  ]
}
@@ -0,0 +1,90 @@
#!/bin/bash

# Exit immediately if a command exits with a non-zero status.
set -e

# Navigate to the script's directory to ensure relative paths work correctly.
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
cd "$SCRIPT_DIR"

# Default to not downloading
DOWNLOAD=false

# Parse command line arguments
while [[ "$#" -gt 0 ]]; do
  case $1 in
    --download) DOWNLOAD=true ;;
    *) echo "Unknown parameter passed: $1"; exit 1 ;;
  esac
  shift
done

# Function to process each downloaded data file.
process_files() {
  # Create the output directory if it doesn't exist.
  mkdir -p output_files

  declare -A processed_years

  # Loop through all AP Enrollment files in the input directory to identify unique years.
  for input_file in input_files/*_AP_Enrollment.*; do
    # Skip the unexpanded pattern when no files match.
    [ -e "$input_file" ] || continue

    filename=$(basename "$input_file")
    year=$(echo "$filename" | cut -d'_' -f1)
    extension="${filename##*.}"

    # Determine the expected extension based on the year.
    expected_extension=""
    if [[ "$year" == "2010" || "$year" == "2012" || "$year" == "2014" ]]; then
      expected_extension="xlsx"
    else
      expected_extension="csv"
    fi

    # Skip if the extension does not match the expected one.
    if [[ "$extension" != "$expected_extension" ]]; then
      echo "Skipping $input_file: Expected .$expected_extension, but found .$extension."
      continue
    fi

    # If this year hasn't been processed yet, process it.
    if [[ -z "${processed_years[$year]:-}" ]]; then
      processed_years[$year]=true
      echo "Processing year: $year"

      # Define the glob pattern for input files for this year.
      # Note: unlike the loop pattern above, this glob requires an underscore
      # after "AP_Enrollment".
      input_data_glob="input_files/${year}_AP_Enrollment_*.${expected_extension}"

      # Define the output path based on the year.
      output_path="output_files/output_${year}_ap"

      # Build the command as an array so each argument is passed verbatim,
      # without the re-parsing that eval would perform.
      cmd=(
        python3 ../../../tools/statvar_importer/stat_var_processor.py
        "--input_data=${input_data_glob}"
        --pv_map=../config/ap_enrollment_pvmap.csv
        --config_file=../config/common_metadata.csv
        "--output_path=${output_path}"
        --existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf
      )

      # Print and execute the command.
      echo "Executing command for year ${year}:"
      echo "${cmd[*]}"
      "${cmd[@]}"
      echo "--- Finished processing for year ${year} ---"
    fi
  done
}

if [ "$DOWNLOAD" = true ]; then
  echo "--- Starting download of AP data ---"
  python3 ../download_ap_ib_gt.py --ap
  python3 ../download_2015_16.py --ap
  echo "--- Download complete ---"
fi

echo "--- Starting processing of files ---"
process_files
echo "--- All processing complete ---"

@@ -0,0 +1,10 @@
YEAR,ncesid,JJ,LEA_STATE,LEA_STATE_NAME,LEAID,LEA_NAME,SCHID,SCH_NAME,COMBOKEY,SCH_APSCIENR_IND,SCH_APSCIENR_HI_M,SCH_APSCIENR_HI_F,SCH_APSCIENR_AM_M,SCH_APSCIENR_AM_F,SCH_APSCIENR_AS_M,SCH_APSCIENR_AS_F,SCH_APSCIENR_HP_M,SCH_APSCIENR_HP_F,SCH_APSCIENR_BL_M,SCH_APSCIENR_BL_F,SCH_APSCIENR_WH_M,SCH_APSCIENR_WH_F,SCH_APSCIENR_TR_M,SCH_APSCIENR_TR_F,TOT_APSCIENR_M,TOT_APSCIENR_F,SCH_APSCIENR_EL_M,SCH_APSCIENR_EL_F,SCH_APSCIENR_IDEA_M,SCH_APSCIENR_IDEA_F,SCH_APCOMPENR_IND,SCH_APCOMPENR_HI_M,SCH_APCOMPENR_HI_F,SCH_APCOMPENR_AM_M,SCH_APCOMPENR_AM_F,SCH_APCOMPENR_AS_M,SCH_APCOMPENR_AS_F,SCH_APCOMPENR_HP_M,SCH_APCOMPENR_HP_F,SCH_APCOMPENR_BL_M,SCH_APCOMPENR_BL_F,SCH_APCOMPENR_WH_M,SCH_APCOMPENR_WH_F,SCH_APCOMPENR_TR_M,SCH_APCOMPENR_TR_F,TOT_APCOMPENR_M,TOT_APCOMPENR_F,SCH_APCOMPENR_EL_M,SCH_APCOMPENR_EL_F,SCH_APCOMPENR_IDEA_M,SCH_APCOMPENR_IDEA_F,TOT_APOTHENR_M,TOT_APOTHENR_F,SCH_APENR_IND,SCH_APCOURSES,SCH_APSEL,SCH_APENR_HI_M,SCH_APENR_HI_F,SCH_APENR_AM_M,SCH_APENR_AM_F,SCH_APENR_AS_M,SCH_APENR_AS_F,SCH_APENR_HP_M,SCH_APENR_HP_F,SCH_APENR_BL_M,SCH_APENR_BL_F,SCH_APENR_WH_M,SCH_APENR_WH_F,SCH_APENR_TR_M,SCH_APENR_TR_F,TOT_APENR_M,TOT_APENR_F,SCH_APENR_EL_M,SCH_APENR_EL_F,SCH_APENR_IDEA_M,SCH_APENR_IDEA_F,SCH_APENR_504_M,SCH_APENR_504_F,SCH_APMATHENR_IND,SCH_APMATHENR_HI_M,SCH_APMATHENR_HI_F,SCH_APMATHENR_AM_M,SCH_APMATHENR_AM_F,SCH_APMATHENR_AS_M,SCH_APMATHENR_AS_F,SCH_APMATHENR_HP_M,SCH_APMATHENR_HP_F,SCH_APMATHENR_BL_M,SCH_APMATHENR_BL_F,SCH_APMATHENR_WH_M,SCH_APMATHENR_WH_F,SCH_APMATHENR_TR_M,SCH_APMATHENR_TR_F,TOT_APMATHENR_M,TOT_APMATHENR_F,SCH_APMATHENR_EL_M,SCH_APMATHENR_EL_F,SCH_APMATHENR_IDEA_M,SCH_APMATHENR_IDEA_F
2022,010000299995,Yes,AL,ALABAMA,0100002,Alabama Youth Services,99995,AUTAUGA CAMPUS,010000299995,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9
2022,010000500870,No,AL,ALABAMA,0100005,Albertville City,00870,Albertville Middle School,010000500870,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9
2022,010000500871,No,AL,ALABAMA,0100005,Albertville City,00871,Albertville High School,010000500871,Yes,12,13,0,0,2,0,0,0,1,3,14,13,2,1,31,30,0,0,0,0,Yes,2,4,0,0,0,0,0,0,1,0,11,5,0,0,14,9,0,0,0,0,66,86,Yes,10,Yes,27,45,0,0,2,0,0,0,3,7,48,57,2,3,82,112,0,1,0,0,0,0,Yes,9,16,0,0,2,0,0,0,1,2,12,19,1,0,25,37,0,0,0,0
2022,010000500879,No,AL,ALABAMA,0100005,Albertville City,00879,Albertville Intermediate School,010000500879,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9
2022,010000500889,No,AL,ALABAMA,0100005,Albertville City,00889,Albertville Elementary School,010000500889,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9
2022,010000501616,No,AL,ALABAMA,0100005,Albertville City,01616,Albertville Kindergarten and PreK,010000501616,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9
2022,010000502150,No,AL,ALABAMA,0100005,Albertville City,02150,Albertville Primary School,010000502150,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9
2022,010000600193,No,AL,ALABAMA,0100006,Marshall County,00193,Kate Duncan Smith DAR Middle,010000600193,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9
2022,010000600872,No,AL,ALABAMA,0100006,Marshall County,00872,Asbury High School,010000600872,No,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,Yes,0,0,0,0,1,0,0,0,0,1,3,4,0,0,4,5,1,0,0,0,0,0,Yes,3,Yes,0,0,0,0,1,0,0,1,0,1,23,12,0,0,24,14,1,0,0,0,0,0,No,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9