# Automated Testing for onefilellm.py
This README provides information about the automated testing script for the `onefilellm.py` data aggregation tool.
## Purpose
The purpose of the automated testing script is to ensure that the `onefilellm.py` tool functions as expected and produces the desired output for various input sources. The script includes test cases for the following input types:
- GitHub repository
- GitHub pull request
- GitHub issue
- arXiv PDF
- Local folder
- YouTube transcript
- Webpage crawling
- Sci-Hub paper using DOI
- Sci-Hub paper using PMID
## Test Suite
The test suite is implemented using Python's built-in `unittest` module. It consists of the following test cases:
1. `test_github_repo`: Tests the processing of a GitHub repository using the URL `https://github.com/jimmc414/onefilellm`.
2. `test_github_pull_request`: Tests the processing of a GitHub pull request using the URL `https://github.com/dear-github/dear-github/pull/102`.
3. `test_github_issue`: Tests the processing of a GitHub issue using the URL `https://github.com/isaacs/github/issues/1191`.
4. `test_arxiv_pdf`: Tests the processing of an arXiv PDF using the URL `https://arxiv.org/abs/2401.14295`.
5. `test_local_folder`: Tests the processing of a local folder using the path `C:\python\1filellm`.
6. `test_youtube_transcript`: Tests the fetching of a YouTube video transcript using the URL `https://www.youtube.com/watch?v=KZ_NlnmPQYk`.
7. `test_webpage_crawl`: Tests the crawling and text extraction from a webpage using the URL `https://llm.datasette.io/en/stable/`.
8. `test_process_doi`: Tests the processing of a Sci-Hub paper using the DOI `10.1053/j.ajkd.2017.08.002`.
9. `test_process_pmid`: Tests the processing of a Sci-Hub paper using the PMID `29203127`.
Each test case verifies that the corresponding function in `onefilellm.py` produces the expected output files and that the content of the output files is not empty.
## Running the Tests
To run the automated tests, follow these steps:
1. Make sure you have the required dependencies installed. You can install them using the following command:
pip install -U -r requirements.txt
2. Open a terminal or command prompt and navigate to the directory containing the `test_onefilellm.py` script.
3. Run the following command to execute the tests:
python test_onefilellm.py
The test suite will run, and you will see the test results in the console output.
## Test Results
The test results will indicate whether each test case passed or failed. If a test case fails, the console output will provide information about the failure, including the specific assertion that failed and the line number where the failure occurred.
A successful test run will output something similar to:
Testing GitHub repository processing... GitHub repository processing test passed.
Testing GitHub pull request processing... GitHub pull request processing test passed.
Testing GitHub issue processing... GitHub issue processing test passed.
Testing arXiv PDF processing... arXiv PDF processing test passed.
Testing local folder processing... Local folder processing test passed.
Testing YouTube transcript fetching... YouTube transcript fetching test passed.
Testing webpage crawling and text extraction... Webpage crawling and text extraction test passed.
Testing DOI processing... DOI processing test passed.
Testing PMID processing... PMID processing test passed.
Ran 9 tests in X.XXXs
OK
If any tests fail, the output will indicate which tests failed and provide details about the failures.
## Modifying the Tests
If you make changes to the `onefilellm.py` script or want to add more test cases, you can modify the `test_onefilellm.py` script accordingly. Add new test methods or update the existing ones to cover different scenarios or input sources.
When adding or modifying test cases, make sure to follow the naming convention `test_*` for the test methods so that the `unittest` module can discover and run them automatically.
If any tests fail, the output will indicate which tests failed and provide details about the failures.
## Modifying the Tests
If you make changes to the `onefilellm.py` script or want to add more test cases, you can modify the `test_onefilellm.py` script accordingly. Add new test methods or update the existing ones to cover different scenarios or input sources.
When adding or modifying test cases, make sure to follow the naming convention `test_*` for the test methods so that the `unittest` module can discover and run them automatically.
## Troubleshooting
If you encounter any issues while running the tests, consider the following:
- Make sure you have the latest version of the required dependencies installed.
- Ensure that the input sources used in the tests (e.g., GitHub repository, arXiv PDF, local folder) are accessible and contain the expected content.
- Double-check that the file paths and URLs used in the tests are correct and valid.
- Ensure you have a stable internet connection for tests that involve downloading content from external sources like Github.
- Ensure you have a valid Personal Token from Github for tests that involve downloading repos from Github (see readme.md for more detail).
- If you encounter failures related to DOI or PMID processing, it may be due to temporary inaccessibility to Sci-Hub or the specific DOI or PMID request. Try running the tests again at a later time when Sci-Hub is accessible or trying another DOI/PMID.