build: Bump Python version in project setup
Follow-on from #135.

I've also tweaked the README content and formatting.
jesse-c committed Jul 17, 2024
1 parent 74ac9d9 commit d7b8022
Showing 3 changed files with 66 additions and 103 deletions.
12 changes: 6 additions & 6 deletions README.md
@@ -2,10 +2,10 @@

## Overview

-This repo contains a CLI application used for extracting text from pdf and html documents before translating these to a target language (default is English) if they are in a different language to the target language.
+This repo contains a CLI application used for extracting text from pdf and html documents before translating these to a target language (default is English) if they are in a different language to the target language.

-**HTML Text Extraction:**
-- HTML webpages are processed by making a request to the webpage and extracting text from the html content using a combination of the `news-please` and `readability` python packages.
+**HTML Text Extraction:**
+- HTML webpages are processed by making a request to the webpage and extracting text from the html content using a combination of the `news-please` and `readability` python packages.

**PDF Text Extraction:**
- PDF documents are processed by downloading the pdf from the cdn (Content Delivery Network accessible via an endpoint) and using the `Azure` form recognizer API to extract text from the pdf.
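
The flow described in the Overview (pick an extractor by document type, then translate only when the detected language differs from the target) could be sketched roughly as below. This is a minimal illustration, not the repository's actual code: `Extracted`, `process_document`, the `extractors` mapping and the `translate` callable are all hypothetical names, and the real extraction and translation backends are passed in as plain functions rather than called directly.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Extracted:
    """Hypothetical container for extracted text and its detected language."""
    text: str
    language: str  # language code such as "en" or "fr"


def process_document(
    url: str,
    content_type: str,
    extractors: Dict[str, Callable[[str], Extracted]],
    translate: Callable[[str, str, str], str],
    target_language: str = "en",
) -> Extracted:
    """Extract text with the extractor registered for this content type,
    then translate it only if it is not already in the target language."""
    if content_type not in extractors:
        raise ValueError(f"unsupported content type: {content_type}")
    doc = extractors[content_type](url)
    if doc.language == target_language:
        return doc  # already in the target language, nothing to translate
    translated = translate(doc.text, doc.language, target_language)
    return Extracted(text=translated, language=target_language)
```

Passing the extractors and translator in as callables keeps the sketch independent of any particular HTML, PDF or translation service, so stand-in functions can be used to try it out.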
@@ -18,14 +18,14 @@ This repo contains a CLI application used for extracting text from pdf and html
To operate and run the CLI the repo provides useful commands in the `Makefile`. This reads environment variables from a `.env` file. Create this locally by running the following command and then enter the relevant values.

``` bash
-cp .env.example .env
+make setup
```

-Once this is done we can then run the commands in the `Makefile`. These split into two main groups, running `locally` or in a `docker container`.
+Once this is done we can then run the commands in the "Makefile". These split into two main groups, running directly on your machine, or in a Docker container.

To run locally run the following commands to install dependencies using Poetry and set up playwright and pre-commit. Then run the CLI locally.

-Note that you will need a python version in your virtual environment of `3.9` or greater. It is also recommended to run within a virtual environment.
+Note that you will need a Python version in your virtual environment that matches the project version. It is also recommended to run within a virtual environment.

``` bash
make install