Code, models, and data for "Compositionality of Complex Graphemes in the Undeciphered Proto-Elamite Script using Image and Text Embedding Models", published in Findings of ACL 2021.
To build all models from scratch and generate the results from the paper, run `make all` from the root directory.
Alternatively, each model has its own directory with a `run.sh` file that trains all versions of that model used in the paper. You must generate the input files with `make .data` before training any models.
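The two-step order above (input files first, then each model's `run.sh`) can be sketched as a small helper. This is an illustration rather than part of the repository: `find_run_scripts` and `train_all` are hypothetical names, and the sketch assumes that every model directory sits directly under the root and that each `run.sh` is meant to be run from its own directory.

```python
import subprocess
from pathlib import Path


def find_run_scripts(root):
    """Return every <model_dir>/run.sh directly under root, sorted by path."""
    return sorted(Path(root).glob("*/run.sh"))


def train_all(root=".", dry_run=True):
    """Plan (and optionally run) input generation followed by each run.sh.

    With dry_run=True nothing is executed; the planned commands are
    returned as (argv, working_directory) pairs so they can be inspected.
    """
    root = Path(root)
    # Step 1: the input files must exist before any model is trained.
    commands = [(["make", ".data"], root)]
    # Step 2: one run.sh per model directory, run from that directory
    # (assumption: the scripts expect their own directory as the cwd).
    for script in find_run_scripts(root):
        commands.append((["bash", script.name], script.parent))
    if not dry_run:
        for argv, cwd in commands:
            # check=True stops the whole run at the first failing command.
            subprocess.run(argv, cwd=cwd, check=True)
    return commands
```

Calling `train_all(".")` from the root directory returns the planned commands without running anything; passing `dry_run=False` executes them in order.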
Pretrained models are included in `pretrained/models`.
Embeddings from all models used in the paper are included in `pretrained/embeddings`.
All statistics and analysis scripts are located in `ocs_pcs`.
`python metrics.py && python stats.py` computes PCS for every sign in every model and summarizes the resulting scores.
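The `&&` in that command means `stats.py` only runs if `metrics.py` succeeds. When driving the scripts from Python instead of a shell, the same behaviour can be sketched as follows. `run_in_order` is a hypothetical helper, not something shipped with the repository, and the sketch assumes the scripts are run with `ocs_pcs` as the working directory.

```python
import subprocess
import sys
from pathlib import Path


def run_in_order(scripts, cwd):
    """Run each script with the current interpreter, stopping at the first failure.

    This mirrors the shell's `a && b`: a later script only runs if every
    earlier one exited with status 0.
    """
    completed = []
    for script in scripts:
        # check=True raises CalledProcessError on a non-zero exit status,
        # so the loop never reaches the scripts after a failure.
        subprocess.run([sys.executable, script], cwd=Path(cwd), check=True)
        completed.append(script)
    return completed
```

From the root directory, `run_in_order(["metrics.py", "stats.py"], cwd="ocs_pcs")` would then correspond to the command above.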
`python analogy.py` computes the number of compositional signs and analogies in each model and outputs the results cited in the paper. For each value of k, a CSV file is saved to `ocs_pcs/csvs`, listing which signs are compositional, to what degree, and in which models.
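To get a quick look at the generated CSV files without assuming anything about their columns, a minimal sketch like the one below reads only the first row and a few data rows of each file. `preview_csvs` is a hypothetical helper; it assumes the files are UTF-8 encoded and that the first row of each file is a header.

```python
import csv
from pathlib import Path


def preview_csvs(csv_dir, max_rows=3):
    """Map each CSV file name to (header, first max_rows data rows)."""
    previews = {}
    for path in sorted(Path(csv_dir).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            # Assumption: the first row holds the column names.
            header = next(reader, [])  # empty list if the file has no rows
            # zip stops after max_rows rows or at the end of the file.
            rows = [row for _, row in zip(range(max_rows), reader)]
        previews[path.name] = (header, rows)
    return previews
```

Run from the root directory, `preview_csvs("ocs_pcs/csvs")` returns one entry per CSV file, which is enough to see the actual column names before loading the data into an analysis tool.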
These scripts use the pretrained embeddings included with the repository, so they can be run without retraining the models from scratch.