brandomr
Collaborator

Data Gatherer Package Refactoring

This PR transforms the repository into a proper pip-installable package with a clean API, making it easier to use, distribute, and maintain.

Key Changes

  • Package Structure: Reorganized code into a proper Python package with core modules, utilities, and configuration
  • Clean API: Created a simple API that allows users to process URLs directly or from files
  • Improved Configuration: Centralized configuration with proper path resolution
  • Command-line Interface: Added CLI for easy usage from the terminal
  • Error Handling: Added better error handling and graceful fallbacks
  • Documentation: Updated README with installation and usage instructions
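As a rough illustration of how a command-line entry point could sit on top of the API, here is a minimal sketch using argparse. The program name, the positional `urls` argument, and the `--file` flag are assumptions made for this example and are not taken from the PR; only the `process_urls` and `process_file` method names come from the usage example in this description.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical CLI parser (argument names are illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="data-gatherer",  # assumed program name
        description="Process URLs given directly or listed in a file.",
    )
    # Zero or more URLs passed directly on the command line.
    parser.add_argument("urls", nargs="*", help="URLs to process")
    # Alternatively, a path to a text file containing URLs.
    parser.add_argument("--file", help="path to a file of URLs")
    return parser


def dispatch(args: argparse.Namespace, gatherer):
    """Route parsed arguments to the gatherer's file or URL method."""
    if args.file is not None:
        return gatherer.process_file(args.file)
    if args.urls:
        return gatherer.process_urls(args.urls)
    raise ValueError("provide at least one URL or --file PATH")
```

Passing the gatherer into `dispatch` keeps the argument handling testable without network access; a real entry point would construct `DataGatherer()` and call `dispatch(build_parser().parse_args(), gatherer)`.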

Usage Example

  from data_gatherer import DataGatherer

  # Initialize with default configuration
  gatherer = DataGatherer()

  # Process a list of URLs
  urls = ["https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10113009"]
  results = gatherer.process_urls(urls)

  # Process URLs from a file
  results = gatherer.process_file("input/urls.txt")
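The PR does not spell out the format of the input file. A reasonable guess is one URL per line, and the sketch below shows how such a file could be read under that assumption. The `read_urls` helper is hypothetical and not part of the package; skipping blank lines and lines starting with `#` is likewise an assumption.

```python
from pathlib import Path


def read_urls(path):
    """Return the URLs listed in a text file, assuming one URL per line."""
    urls = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Assumed convention: ignore blank lines and '#' comment lines.
        if not line or line.startswith("#"):
            continue
        urls.append(line)
    return urls
```

If the file does follow this layout, `gatherer.process_urls(read_urls("input/urls.txt"))` would be an alternative to calling `process_file` directly.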

Installation

  pip install git+https://github.com/yourusername/data-gatherer.git

This refactoring maintains all existing functionality while making the package more user-friendly and easier to integrate into other projects.

For local testing and development, run pip install -e . from the repository root to install the package in editable mode.

@brandomr brandomr requested a review from pietromarini00 March 25, 2025 18:55
2 participants