Dallas Deed OCR

Overview

This project processes public records from Dallas to extract mortgage principal amounts using OCR (Optical Character Recognition). It includes scripts for data scraping, image processing, and visualization.
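This README does not show how the principal amount is located in the OCR output. As an illustration only, here is a minimal sketch. It assumes the OCR text contains the word "principal" followed within a short span by a dollar figure such as "$245,600.00", which is an assumption about deed wording. `extract_principal` is a hypothetical name, not a function from `scripts/`.

```python
import re
from decimal import Decimal
from typing import Optional

# Hypothetical pattern (assumption about deed wording): the word "principal",
# then up to 80 non-"$" characters, then a dollar amount such as
# "$245,600.00", "$ 98000" or "$1,000".
_PRINCIPAL_RE = re.compile(
    r"principal\b[^$]{0,80}?\$\s*([0-9]{1,3}(?:,[0-9]{3})+|[0-9]+)(\.[0-9]{2})?",
    re.IGNORECASE,
)


def extract_principal(ocr_text: str) -> Optional[Decimal]:
    """Return the first dollar amount following 'principal', or None."""
    match = _PRINCIPAL_RE.search(ocr_text)
    if match is None:
        return None
    whole = match.group(1).replace(",", "")  # drop thousands separators
    cents = match.group(2) or ""             # optional ".NN" part
    return Decimal(whole + cents)
```

`Decimal` is used instead of `float` so that amounts like `245600.00` are kept exactly. Real OCR output often contains misread characters (for example `S` for `$`), so a production parser would likely need a more tolerant pattern.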

Directory Structure

data/: Contains parcel shapefiles and the main database.
images/: Downloaded images used for OCR.
scripts/: Python scripts for data processing.
results/: Output files including CSVs and videos.
tests/: Test scripts to verify functionality.
audio/: Audio files included in the repository.

Setup

To set up the environment, follow these steps:

Create the conda environment:
```
conda env create -f environment.yml
```
Activate the environment:
```
conda activate deed_ocr_dallas
```
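Scripts run outside the environment may fail on missing packages. A minimal sanity check is sketched below. It relies on the fact that `conda activate` typically sets the `CONDA_DEFAULT_ENV` environment variable to the activated environment's name (when activating by path, it holds the path instead). `conda_env_ok` is a hypothetical helper, not part of this repository.

```python
import os
from typing import Mapping, Optional

EXPECTED_ENV = "deed_ocr_dallas"  # name passed to `conda activate` above


def conda_env_ok(environ: Optional[Mapping[str, str]] = None) -> bool:
    """True if CONDA_DEFAULT_ENV names the project's environment.

    `environ` defaults to the real process environment; passing a mapping
    makes the check easy to test.
    """
    env = os.environ if environ is None else environ
    return env.get("CONDA_DEFAULT_ENV") == EXPECTED_ENV
```

A script could call `conda_env_ok()` at startup and exit with a message such as "run `conda activate deed_ocr_dallas` first" when it returns `False`.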

Usage

Initialize the database:
```
python scripts/check_db.py
```
Run the main script:
```
python scripts/find_principal_dallas.py
```
Visualize the results:
```
python scripts/visualize_principal.py
```
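The three usage steps can also be chained from a single Python driver. This is a minimal sketch that assumes it is run from the repository root. `pipeline_commands` and `run_pipeline` are hypothetical names, not scripts shipped with the project.

```python
import subprocess
import sys
from typing import List

# The three usage steps, in the order listed above.
STEPS = [
    "scripts/check_db.py",
    "scripts/find_principal_dallas.py",
    "scripts/visualize_principal.py",
]


def pipeline_commands(python: str = sys.executable) -> List[List[str]]:
    """Build one argv list per step, using the current Python interpreter."""
    return [[python, script] for script in STEPS]


def run_pipeline(dry_run: bool = False) -> List[List[str]]:
    """Run each step in order; check=True stops at the first non-zero exit."""
    commands = pipeline_commands()
    for cmd in commands:
        print("$", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands
```

`run_pipeline(dry_run=True)` only prints the commands. Using `sys.executable` ensures each step runs with the interpreter of the active conda environment. Keep in mind that the main script can run for a very long time (see About).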

Contributing

Contributions are welcome. Please fork the repository and submit a pull request.

About

This script collects Dallas' parcel-level mortgage data in reverse chronological order (newest records first). Collecting all data since 2020 would require roughly $3,000 in cloud compute costs, or about one month of compute time on a single computer with the specs listed under System specs.
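The collection code itself is not shown here. Below is a minimal sketch of walking dates newest-first. It assumes records are queried one calendar day at a time, which is an assumption. `days_newest_first` is a hypothetical helper.

```python
from datetime import date, timedelta
from typing import Iterator


def days_newest_first(start: date, end: date) -> Iterator[date]:
    """Yield each calendar day from `end` back to `start`, inclusive."""
    day = end
    while day >= start:
        yield day
        day -= timedelta(days=1)
```

For example, `for day in days_newest_first(date(2020, 1, 1), date.today()): ...` visits today first and 2020-01-01 last. Because the loop starts at the most recent day, stopping it early still leaves the newest records collected.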

License

This project is licensed under the GPLv3+ license; see the LICENSE file for details.

System specs

CPU: x86_64
RAM: 15.46 GB
Storage: 1006.85 GB
OS: Linux 5.15.146.1-microsoft-standard-WSL2
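To compare another machine against the specs above, comparable figures can be gathered with the standard library. This is a sketch, not necessarily how the figures above were produced. The README does not say whether "GB" means 10^9 or 2^30 bytes, and this version uses 2^30. `os.sysconf` is available on Linux (including WSL2) but not on Windows.

```python
import os
import platform
import shutil


def system_specs(path: str = "/") -> dict:
    """Collect CPU architecture, RAM, storage and OS in the format above."""
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    disk_bytes = shutil.disk_usage(path).total  # size of the filesystem at `path`
    gib = 1024 ** 3
    return {
        "CPU": platform.machine(),                        # e.g. x86_64
        "RAM": f"{ram_bytes / gib:.2f} GB",
        "Storage": f"{disk_bytes / gib:.2f} GB",
        "OS": f"{platform.system()} {platform.release()}",  # kernel name + release
    }


for key, value in system_specs().items():
    print(f"{key}: {value}")
```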
