Input data:
- a folder containing a list of legal articles
- a JSON file containing all documents
- Build TF-IDF vectors from corpus
echo "=========================================="
echo "Indexing TFIDF ..."
INPUT=data/all_articles.json
OUTPUT=output/model_TFIDF.pkl
python document-indexing.py --index_type tfidf --file_type json --input "$INPUT" --output "$OUTPUT"
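For readers unfamiliar with TF-IDF, the weighting can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: it uses whitespace tokenization, the unsmoothed formula `tf * log(N / df)`, and a toy corpus. The function name `build_tfidf` is hypothetical, and `document-indexing.py` may tokenize, weight, and store its vectors differently.

```python
import math
from collections import Counter


def build_tfidf(docs):
    """Return (vocabulary, vectors) for a list of raw text documents.

    Each vector is a dict mapping term -> tf * idf, where
    tf = term count / document length and idf = log(N / df).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    idf = {term: math.log(n_docs / count) for term, count in df.items()}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero on empty documents
        vectors.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return sorted(df), vectors


# Toy corpus standing in for the legal articles (illustrative only).
corpus = [
    "a demand for payment",
    "payment shall be made",
    "a demand shall not have effect",
]
vocab, vectors = build_tfidf(corpus)
```

Terms that occur in fewer documents receive a larger weight, which is what lets rare legal terms dominate the similarity score.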
- Build TF-IDF vectors from the corpus, then use MDS to reduce the dimensionality of the space
echo "=========================================="
echo "Indexing MDS ..."
INPUT=data/all_articles.json
OUTPUT=output/model_TFIDF_MDS.pkl
python document-indexing.py --index_type mds --file_type json --input "$INPUT" --output "$OUTPUT"
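The idea behind MDS (multidimensional scaling) is to place points in a lower-dimensional space so that pairwise distances are preserved as well as possible. The sketch below implements classical (Torgerson) MDS with power iteration, using only the standard library. It is an assumption made for illustration: the script may well use a different MDS variant (for example an iterative, stress-minimizing one), and the names `classical_mds` and `matvec` are hypothetical.

```python
import math
import random


def matvec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in m]


def classical_mds(points, n_components=2, n_iter=500, seed=0):
    """Embed `points` into n_components dimensions with classical MDS."""
    n = len(points)
    # Squared Euclidean distances between every pair of points.
    d2 = [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points]
          for p in points]

    # Double centering: B = -1/2 * J * D2 * J with J = I - (1/n) * ones.
    row_mean = [sum(row) / n for row in d2]
    grand_mean = sum(row_mean) / n
    b = [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    rng = random.Random(seed)
    coords = [[0.0] * n_components for _ in range(n)]
    for k in range(n_components):
        # Power iteration for the dominant eigenvector of the current matrix.
        v = [rng.random() for _ in range(n)]
        for _ in range(n_iter):
            w = matvec(b, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left to extract
                v = [0.0] * n
                break
            v = [x / norm for x in w]
        eigval = sum(x * y for x, y in zip(v, matvec(b, v)))
        scale = math.sqrt(max(eigval, 0.0))
        for i in range(n):
            coords[i][k] = v[i] * scale
        # Deflation: remove the component just found before the next pass.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigval * v[i] * v[j]
    return coords


# Four 3-D points that lie in a plane, so a 2-D embedding loses nothing.
points = [[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 4.0, 0.0], [3.0, 4.0, 0.0]]
embedded = classical_mds(points, n_components=2)
```

For real TF-IDF vectors the embedding is only approximate, and the power-iteration approach here is far slower than a library eigen-solver; it is meant to show the steps, not to be used on a full corpus.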
- Build topic model file from corpus
echo "=========================================="
echo "Creating topic vectors ..."
INPUT=data/all_articles.json
OUTPUT=output/topic.pickle
python create-topic-model.py --file_type json --input "$INPUT" --output "$OUTPUT"
Three types of query are supported:
1) query on the TF-IDF space
2) query on the MDS space (MDS is used to reduce the dimensionality)
3) query on the TF-IDF space with injection of topic vectors
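Whichever space is used, answering a query typically means comparing the query vector against every document vector and returning the closest ones. The sketch below assumes cosine similarity over sparse vectors stored as dicts; the similarity measure actually used by the search API is not stated here, and the names `cosine` and `rank` as well as the term weights are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank(query_vec, doc_vecs, top_k=3):
    """Return (doc_index, score) pairs sorted by decreasing similarity."""
    scored = [(i, cosine(query_vec, d)) for i, d in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Hypothetical term weights; real vectors would come from the index file.
doc_vecs = [
    {"demand": 0.5, "payment": 0.5},
    {"contract": 0.7, "party": 0.3},
    {"payment": 0.9, "effect": 0.1},
]
query_vec = {"demand": 1.0, "payment": 1.0}
results = rank(query_vec, doc_vecs)
```

Here document 0 ranks first because it shares both query terms in the same proportion, document 2 second because it shares only "payment", and document 1 last with a score of zero.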
- Start the search API (default port: 8081)
python search-api.py --port 8081
Note: use "_" instead of spaces in the query.
Query: A demand for payment shall not have the effect
URL:
http://0.0.0.0:8081/api/search/A_demand_for_payment_shall_not_have_the_effect
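The query-to-URL convention above can be sketched as a small helper. The function name `build_search_url` is hypothetical, and percent-encoding the remaining characters is an assumption, since how the API treats punctuation or non-ASCII text is not described here.

```python
from urllib.parse import quote


def build_search_url(query, host="0.0.0.0", port=8081):
    """Join the query words with "_" and append them to the search endpoint."""
    slug = "_".join(query.split())
    # quote() leaves letters, digits and "_" untouched and escapes the rest.
    return f"http://{host}:{port}/api/search/{quote(slug)}"


url = build_search_url("A demand for payment shall not have the effect")
```

With the default host and port this reproduces the example URL shown above.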
- Upload to server
git checkout master
git add -A . && git commit -m "Upload"
git push origin master
- Download from server
git checkout master
git pull origin master