The implementation source code is based on the following projects
Run the following command to preprocess the biomedical dataset:
python preprocess_bio.py --data_type 0
STAR: use the provided STAR model to compute query/passage embeddings and perform similarity search on the biomedical dataset.
python inference.py --data_type doc --max_doc_length 512 --mode bio-train
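To illustrate what the similarity search step does with the embeddings, here is a minimal sketch in plain Python. It assumes inner-product scoring between a query embedding and passage embeddings. The function names `inner_product` and `top_k` and the toy vectors are hypothetical and are not taken from `inference.py`, which may use a different scoring function or an approximate index.

```python
import heapq

def inner_product(u, v):
    """Inner product of two equal-length vectors (assumed scoring function)."""
    return sum(a * b for a, b in zip(u, v))

def top_k(query, passages, k):
    """Return the k (passage_id, score) pairs with the highest inner product.

    Hypothetical helper for illustration; passage ids are list positions.
    """
    scored = ((pid, inner_product(query, emb)) for pid, emb in enumerate(passages))
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])

# Toy embeddings: three passages in a 3-dimensional space.
passages = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
query = [0.1, 0.9, 0.5]

results = top_k(query, passages, k=2)
print(results)  # [(1, 0.9), (2, 0.5)]
```

On real data the same ranking is normally computed with a vectorized or approximate nearest-neighbor library rather than a Python loop; the brute-force version above only shows the logic.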
Tree Initialization
After embedding the documents and queries, we can initialize the tree using recursive k-means.
Run the following command in the JTR repo:
python construct_tree.py
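The idea of recursive k-means can be sketched as follows: cluster the document embeddings into a few groups, then cluster each group again until every leaf is small enough. This is only an illustrative sketch under stated assumptions. The names `kmeans` and `build_tree`, the parameters `branching` and `max_leaf_size`, and the dictionary layout of the tree are hypothetical and do not reflect the actual options or data structures of `construct_tree.py`.

```python
import random

def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per vector."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)  # initialize from k distinct positions
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid for every vector.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                  for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[c] = mean(members)
    return labels

def build_tree(doc_ids, embeddings, branching=2, max_leaf_size=2):
    """Recursively split documents until each leaf holds at most max_leaf_size."""
    if len(doc_ids) <= max_leaf_size:
        return {"docs": list(doc_ids)}
    vectors = [embeddings[d] for d in doc_ids]
    labels = kmeans(vectors, min(branching, len(doc_ids)))
    groups = [[d for d, l in zip(doc_ids, labels) if l == c]
              for c in range(branching)]
    groups = [g for g in groups if g]
    if len(groups) < 2:  # k-means could not split (e.g. duplicate vectors)
        return {"docs": list(doc_ids)}
    return {"children": [build_tree(g, embeddings, branching, max_leaf_size)
                         for g in groups]}

# Toy example: four documents forming two well-separated clusters.
embeddings = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
tree = build_tree([0, 1, 2, 3], embeddings)
print(tree)
```

In this toy example the root has two leaf children, one holding documents 0 and 1 and the other holding documents 2 and 3. The order of the children depends on the random initialization.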