API reference: Main V0.5a
This is the API reference for the Aligner main programme (src/aligner.py).
The description here is of main version 0.5a.
- support for models v0.4a
- added options to load and save trained models
- support for the new Dataset data format; removed the old bitext and tritext formats
Run
> python aligner.py -h
to see all options.
A sample config file is provided in src\sample_config_file.ini.
A config file specifies which training and testing data to use, so that you do not have to type all the options on the console.
The config file is divided into 3 sections: General, TrainData, and TestData.
[General]
DataDirectory = ~/Data/
TargetLanguageSuffix = cn
SourceLanguageSuffix = en
[TrainData]
TextFilePrefix = train
TagFilePrefix = train.tags
AlignmentFileSuffix = wa
[TestData]
TextFilePrefix = test
TagFilePrefix = test.tags
Reference = FULLPATHTOFILE.WA
The aligner will search the DataDirectory for files that match the prefixes and suffixes given above. Please note that Reference currently has to be a full path.
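As an illustration of how such a config file can be read, here is a minimal sketch using Python's standard configparser module. It is not taken from aligner.py: the helper names read_config and candidate_path are hypothetical, and the "<prefix>.<suffix>" file naming is only an assumption about how prefixes and suffixes are combined.

```python
import configparser
import os

# Contents mirror the sample config shown above.
SAMPLE = """
[General]
DataDirectory = ~/Data/
TargetLanguageSuffix = cn
SourceLanguageSuffix = en

[TrainData]
TextFilePrefix = train
TagFilePrefix = train.tags
AlignmentFileSuffix = wa

[TestData]
TextFilePrefix = test
TagFilePrefix = test.tags
Reference = FULLPATHTOFILE.WA
"""


def read_config(text):
    """Parse config text into a ConfigParser (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


def candidate_path(data_dir, prefix, suffix):
    """Build a path inside the data directory.

    ASSUMPTION: files are named "<prefix>.<suffix>"; the aligner's
    actual matching rule may differ.
    """
    return os.path.join(os.path.expanduser(data_dir), prefix + "." + suffix)


config = read_config(SAMPLE)
general = config["General"]

# Training text for the source language, e.g. <DataDirectory>/train.en
train_source = candidate_path(
    general["DataDirectory"],
    config["TrainData"]["TextFilePrefix"],
    general["SourceLanguageSuffix"],
)

# Reference is used as given, since it must already be a full path.
reference = config["TestData"]["Reference"]
```

To read an actual file rather than a string, parser.read(path) can be used in place of parser.read_string(text).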
The descriptions of file formats supported by this version are here.
Saved model files are in .pkl and .pklz formats. The latter is a compressed version of the former: it is smaller in size but usually takes longer to save and load.
Please note that when loading a saved file, the model checks the file's modelName (and version if applicable; see the API reference for Alignment Models for more detail) to prevent accidentally loading a file for a different model (or an unsupported version of the current model).
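The save and load behaviour described above could look roughly like the following sketch. It is an illustration only, not the aligner's actual code: save_model and load_model are hypothetical names, the model state is assumed to be a plain dictionary with "modelName" and "version" keys, and .pklz is assumed to be a gzip-compressed pickle (the compression scheme actually used may differ).

```python
import gzip
import pickle


def _opener(path):
    # ASSUMPTION: .pklz files are gzip-compressed pickles.
    return gzip.open if path.endswith(".pklz") else open


def save_model(state, path):
    """Pickle a model state dict to path (.pkl or .pklz)."""
    with _opener(path)(path, "wb") as f:
        pickle.dump(state, f)


def load_model(path, expected_name, expected_version=None):
    """Load a state dict and refuse files saved by another model."""
    with _opener(path)(path, "rb") as f:
        state = pickle.load(f)
    if state.get("modelName") != expected_name:
        raise ValueError(
            "file is for model %r, expected %r"
            % (state.get("modelName"), expected_name)
        )
    if expected_version is not None and state.get("version") != expected_version:
        raise ValueError(
            "unsupported model version %r" % (state.get("version"),)
        )
    return state
```

A call such as load_model("model.pklz", "IBM1") would then raise ValueError if the file had been saved by a model with a different modelName; the file name and model name here are placeholders. As with any pickle-based format, saved model files should only be loaded from trusted sources.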