Data collection and enhancement efforts for the USDOT Complete Streets Artificial Intelligence (CSAI) Initiative — Phase I. This project focuses on generating geospatial datasets across infrastructure, traveler behavior/safety, and contextual domains to support decision-making tools for multimodal transportation planning.

VIDA-NYU/OSCUR-data

OSCUR data

HuggingFace: oscur | Python: 3.8+


Overview

This repository presents a comprehensive framework for collecting, documenting, and analyzing urban transportation datasets, with a focus on New York City data sources. It integrates spatial data science theory with practical implementation aligned with U.S. Department of Transportation (DOT) data standards.

This repository tracks the progress of data generation and enhancement. The goal is to generate geospatial datasets across three key categories: Infrastructure, Traveler Behavior/Safety, and Context, to support decision-making tools for multimodal transportation planning. For detailed progress tracking, refer to the GitHub issue.

🤗 All uploaded datasets are available on our HuggingFace Hub.

Repository Structure

OSCUR-data/
├── metadata/                 # YAML specifications describing each data source
├── code/                     # Scripts to download, process, and upload data
│   ├── metadata_generators/     # Generate standardized metadata YAML files
│   ├── downloaders/             # Raw data acquisition from various APIs
│   ├── processors/              # Data cleaning, transformation, and validation
│   └── upload_to_hugging_face/  # Utilities for uploading datasets to Hugging Face
├── data_profiles/            # JSON summaries/statistics of datasets
└── examples/                 # Jupyter notebooks demonstrating dataset usage

Add a New Dataset

To contribute a new dataset to this repository, follow these steps:

1. Metadata

  • Store dataset metadata as individual YAML files in the metadata/ directory.
  • Refer to the guide in code/metadata_generators for how to create or generate a metadata file.
  • Ensure all required metadata fields (e.g., title, description, source, license) are completed.
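As a minimal sketch of the completeness check described above, the snippet below reports which required fields are missing or blank in a metadata record. It assumes the record has already been parsed into a dictionary (for example with PyYAML's `yaml.safe_load`), and it treats the four fields named above as the full required set, which is an assumption since the README lists them only as examples. The function name and the sample record are hypothetical.

```python
from typing import Dict, List

# Assumption: the README names these four fields as examples only;
# the actual required set may be larger.
REQUIRED_FIELDS = ("title", "description", "source", "license")


def missing_required_fields(metadata: Dict[str, object]) -> List[str]:
    """Return the required fields that are absent or blank in `metadata`."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = metadata.get(field)
        # A field counts as missing if it is absent, None, or an empty string.
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(field)
    return missing


if __name__ == "__main__":
    # Hypothetical record, for illustration only.
    example = {"title": "Example dataset", "description": "", "source": "NYC Open Data"}
    print(missing_required_fields(example))
```

An empty list means every field in the assumed required set is present and non-blank.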

2. Code

  • Metadata Generators:

    • Add or modify scripts in code/metadata_generators to generate standardized YAML metadata files.
    • These scripts can use NYC Open Data APIs or other APIs to extract metadata and save it in the metadata/ directory.
  • Downloaders:

    • Add a Python script that collects raw data from the source to code/downloaders.
    • If multiple scripts are needed, create a subdirectory named after the dataset ID (e.g., code/downloaders/your_dataset_id/).
  • Processors:

    • Add a Python script for cleaning, transforming, and validating the data to code/processors.
    • If necessary, group related scripts under a folder named after the dataset ID.
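  • Uploader:

    • Use the utilities in code/upload_to_hugging_face to upload the processed dataset to Hugging Face.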
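To make the downloader step more concrete, here is a rough sketch of what a script in code/downloaders might look like. It assumes the source is a Socrata-backed NYC Open Data dataset reachable at `https://data.cityofnewyork.us/resource/<dataset_id>.json`; the function names, the output layout, and the row limit are hypothetical, and other sources would need a different client.

```python
import json
import urllib.parse
import urllib.request
from pathlib import Path

# Assumption: the dataset is served by the NYC Open Data (Socrata) API.
BASE_URL = "https://data.cityofnewyork.us/resource"


def build_url(dataset_id: str, limit: int = 1000) -> str:
    """Build the JSON endpoint URL for a dataset, capped at `limit` rows."""
    query = urllib.parse.urlencode({"$limit": limit})
    return f"{BASE_URL}/{dataset_id}.json?{query}"


def download(dataset_id: str, out_dir: Path, limit: int = 1000) -> Path:
    """Fetch rows and write them to out_dir/<dataset_id>.json (hypothetical layout)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(build_url(dataset_id, limit), timeout=30) as resp:
        rows = json.load(resp)
    out_path = out_dir / f"{dataset_id}.json"
    out_path.write_text(json.dumps(rows, indent=2), encoding="utf-8")
    return out_path


# Example call (performs a network request; replace the placeholder ID):
# download("your_dataset_id", Path("raw_data"))
```

Keeping URL construction separate from the network call makes the script easier to test without contacting the API.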

3. Data Profile

Generate a profile summary of the dataset (recommended: use datamart-profiler) and save it as a .json file in data_profiles/.
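The sketch below shows the general shape of this step using only the standard library: it computes a few per-column counts from a CSV file and saves them as JSON. It is a simplified stand-in, not datamart-profiler, and its output fields (`row_count`, `columns`, `non_empty`) are hypothetical and do not match that tool's format. The `<dataset_id>.json` file name is also an assumption.

```python
import csv
import json
from pathlib import Path
from typing import Dict


def simple_profile(csv_path: Path) -> Dict[str, object]:
    """Count rows and non-empty values per column in a CSV file."""
    with csv_path.open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        columns = list(reader.fieldnames or [])
        non_empty = {name: 0 for name in columns}
        row_count = 0
        for row in reader:
            row_count += 1
            for name in columns:
                if (row.get(name) or "").strip():
                    non_empty[name] += 1
    return {
        "row_count": row_count,
        "columns": [{"name": n, "non_empty": non_empty[n]} for n in columns],
    }


def save_profile(profile: Dict[str, object], dataset_id: str,
                 out_dir: Path = Path("data_profiles")) -> Path:
    """Write the profile to out_dir/<dataset_id>.json (assumed naming)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{dataset_id}.json"
    out_path.write_text(json.dumps(profile, indent=2), encoding="utf-8")
    return out_path
```

For real contributions, the recommended datamart-profiler produces a much richer summary, including inferred column types.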

4. Usage Example

  • Provide a Jupyter notebook demonstrating how to use or visualize the dataset.
  • Save it to examples/.
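Before opening a pull request, it can help to confirm that the artifacts from the steps above are in place. The following sketch checks for a metadata file, a data profile, and an example notebook. The file names (`<dataset_id>.yaml`, `<dataset_id>.json`, `<dataset_id>.ipynb`) are assumptions made for illustration; the README fixes only the directories, not the naming scheme.

```python
from pathlib import Path
from typing import Dict


def contribution_status(repo_root: Path, dataset_id: str) -> Dict[str, bool]:
    """Report which expected files exist for a dataset (assumed file names)."""
    expected = {
        "metadata": repo_root / "metadata" / f"{dataset_id}.yaml",
        "data_profile": repo_root / "data_profiles" / f"{dataset_id}.json",
        "example": repo_root / "examples" / f"{dataset_id}.ipynb",
    }
    return {name: path.is_file() for name, path in expected.items()}


if __name__ == "__main__":
    # "your_dataset_id" is a placeholder, as in the downloader path above.
    for name, ok in contribution_status(Path("."), "your_dataset_id").items():
        print(f"{name}: {'found' if ok else 'missing'}")
```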

Contributing

We welcome contributions to enhance the dataset collection and improve the tools! Please:

  • Check the GitHub issue for current progress and to avoid duplicating efforts.
  • Submit pull requests with new datasets, scripts, or documentation updates.

License

This project is licensed under the Apache License, Version 2.0. See the LICENSE file for details.
