I thought this would be simpler to implement than with Django, so I took the opportunity to use FastAPI for a slightly larger project.
The data is stored in memory, so this could be the basis of a sharded application, although that might call for dropping to a lower-level language. Otherwise, look into scalable databases, e.g. Redis, ES or Aurora.
For a fully structured source package layout etc., see the Django implementation.
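As a rough illustration of the sharding idea, here is a minimal single-process sketch that routes each policy ID to one of several in-memory dictionaries. `ShardedStore`, `shard_for` and the `title` field are hypothetical names, not part of this project; in a real deployment each shard would live in its own process or machine.

```python
from __future__ import annotations


def shard_for(policy_id: int, num_shards: int) -> int:
    """Pick a shard by taking the policy ID modulo the shard count."""
    return policy_id % num_shards


class ShardedStore:
    """Hypothetical in-memory store split across several dicts (one per shard)."""

    def __init__(self, num_shards: int) -> None:
        self.shards: list[dict[int, dict]] = [{} for _ in range(num_shards)]

    def put(self, policy_id: int, document: dict) -> None:
        shard = self.shards[shard_for(policy_id, len(self.shards))]
        shard[policy_id] = document

    def get(self, policy_id: int) -> dict | None:
        shard = self.shards[shard_for(policy_id, len(self.shards))]
        return shard.get(policy_id)


store = ShardedStore(num_shards=4)
store.put(10088, {"title": "Example policy"})  # made-up document
found = store.get(10088)
missing = store.get(1)
```

Modulo routing is the simplest choice; it reshuffles most keys when the shard count changes, so something like consistent hashing may suit a system that needs to grow.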
pip install virtualenvwrapper
mkvirtualenv cpr
pip install --upgrade pip
pip install -r requirements.txt
Optional: download the spaCy model data
python -m spacy download en_core_web_md
Launch the server!
uvicorn main:app --reload
OR
docker build -t cpr2 . && docker run -p 8000:8000 -v %cd%:/usr/src/app cpr2
That's it!
To stop the docker container
docker ps
docker stop *dockerid*
The first few characters of the container ID are enough.
Obligatory! http://127.0.0.1:8000/
http://127.0.0.1:8000/policy/10088
http://localhost:8000/policy/search/environment%20policy returns documents that each include a jaccard_similarity value, ordered by descending similarity.
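For readers unfamiliar with Jaccard similarity, the sketch below shows one way such a ranking could work: the score is the size of the intersection of two token sets divided by the size of their union. This is not necessarily how the project tokenizes or scores; the `title` field, the whitespace tokenizer and the sample documents are assumptions made for illustration. Only the `jaccard_similarity` key comes from the description above.

```python
from __future__ import annotations


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split on whitespace (an assumed, naive tokenizer)."""
    return set(text.lower().split())


def jaccard_similarity(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|, defined here as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def search(query: str, documents: list[dict]) -> list[dict]:
    """Attach a jaccard_similarity score to each document and sort descending."""
    query_tokens = tokenize(query)
    results = [
        {**doc, "jaccard_similarity": jaccard_similarity(query_tokens, tokenize(doc["title"]))}
        for doc in documents
    ]
    results.sort(key=lambda d: d["jaccard_similarity"], reverse=True)
    return results


# Made-up sample data, not taken from the project's dataset.
docs = [
    {"policy_id": 1, "title": "National environment policy"},
    {"policy_id": 2, "title": "Transport strategy"},
    {"policy_id": 3, "title": "Environment and energy policy framework"},
]
ranked = search("environment policy", docs)
```

With this sample data the documents come back in the order 1, 3, 2, with scores of 2/3, 2/5 and 0.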
If you're not on Windows! http://localhost:8000/policy/spacy_search/environment%20policy The NLP results for the documents are not preloaded at the moment, but the NLP processing is cached, so searches get faster with use.
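The "gets faster with use" behaviour can be illustrated with a small caching sketch. How the project actually caches its spaCy output is not stated here, so this assumes `functools.lru_cache` and uses a hypothetical `process_text` function as a stand-in for the expensive NLP call.

```python
from functools import lru_cache

call_count = 0  # counts how often the expensive step really runs


@lru_cache(maxsize=None)
def process_text(text: str) -> frozenset:
    """Stand-in for an expensive NLP step (hypothetical, not the project's code)."""
    global call_count
    call_count += 1
    return frozenset(text.lower().split())


first = process_text("environment policy")   # computed on the first call
second = process_text("environment policy")  # served from the cache
```

The first request for a given text pays the full processing cost, and repeat requests return the stored result. Preloading would amount to calling the function for every document at startup.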
Autogenerated with FastAPI http://127.0.0.1:8000/docs http://127.0.0.1:8000/redoc
Install pytest
pip install pytest
Then run tests
pytest tests/tests.py
- Complete tests
- Put/Patch/Delete
- Pydantic/ORM etc
- tfidf maybe?
- Store vectors
- NLP models, Bert/Google Universal Sentence Encoder?
- Preload nlp for documents
- Use scalable DB e.g. ES, redis, serverless DB etc.