API reference: Models V0.5a
The Aligner ships with several models that can be used right away, but you can also write your own model to run with the Aligner.
Here are the descriptions of the models.
API: if you write your own model and wish to use it with the Aligner, it needs to meet the API requirements described below.
The description here is for module version 0.5a.
- Models of older versions might not work correctly.
- A lot of changes with internal APIs, making them easier for people to use.
- Minor optimisations.
- Added some extra features like displaying figures.
In src/models/.
In your MyModel.py, the only thing required is a class called AlignmentModel, which is a subclass of models.modelBase.AlignmentModelBase and contains the following methods. Before loading the model, the Aligner calls the checkAlignmentModel function in src/models/modelChecker.py to check that the model meets the requirements. If you want your model's API to look a bit different (for example, more parameters), you'll need to modify checkAlignmentModel as well.
Parameters:
- dataset: Dataset (detail of this format).
- iterations: int, number of iterations to run.
This is the method that will be called to do the training. Currently the parameters have to be these two and these two only. If you wish otherwise, you'll need to modify checkAlignmentModel in src/models/modelChecker.py.
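As a rough illustration of the training entry point described above, the sketch below defines a model whose training method takes exactly dataset and iterations. It is not the Aligner's own code: AlignmentModelBase here is a local stand-in for models.modelBase.AlignmentModelBase so the block runs on its own, the method name train is an assumption, and the dataset is assumed to be a list of (source tokens, target tokens) pairs, which may differ from the real Dataset format. The update rule is a plain IBM Model 1 EM step.

```python
from collections import defaultdict


class AlignmentModelBase:
    """Local stand-in so the sketch runs without the Aligner sources."""


class AlignmentModel(AlignmentModelBase):
    def __init__(self):
        # t[(f, e)]: probability of source word f given target word e.
        self.t = defaultdict(float)
        self.modelComponents = ["t"]

    def train(self, dataset, iterations):
        # Assumed dataset format: list of (source_tokens, target_tokens).
        # Start from a uniform score for every co-occurring word pair.
        for f_sent, e_sent in dataset:
            for f in f_sent:
                for e in e_sent:
                    self.t[(f, e)] = 1.0

        for _ in range(iterations):
            count = defaultdict(float)
            total = defaultdict(float)
            # E-step: collect expected counts for each word pair.
            for f_sent, e_sent in dataset:
                for f in f_sent:
                    norm = sum(self.t[(f, e)] for e in e_sent)
                    for e in e_sent:
                        c = self.t[(f, e)] / norm
                        count[(f, e)] += c
                        total[e] += c
            # M-step: renormalise so that sum over f of t(f | e) is 1.
            for (f, e), c in count.items():
                self.t[(f, e)] = c / total[e]
```

Calling AlignmentModel().train(dataset, 5) on a small bitext fills self.t, which is the only component the IBM1 example later on lists for saving.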
Parameters:
- sentence: Sentence (detail of this format).
Return:
- Alignment, SentenceAlignment: the alignment of the Sentence generated by the current model (detail of this format).
This is the method that will be called to decode a single sentence. Currently the parameter has to be this one and this one only. If you wish otherwise, you'll need to modify checkAlignmentModel in src/models/modelChecker.py.
This method is inherited from models.modelBase.AlignmentModelBase.
Parameters:
- dataset: Dataset (detail of this format).
Return:
- Alignment: the alignment of the dataset generated by the current model (detail of this format).
This is the method that will be called to decode a whole dataset. Currently the parameter has to be this one and this one only. If you wish otherwise, you'll need to modify checkAlignmentModel in src/models/modelChecker.py.
This method is inherited from models.modelBase.AlignmentModelBase. It saves the model to the specified fileName.
Saved files come in two formats: .pkl and .pklz, the latter being the compressed version of the former, which is smaller but slower to save and load. If the fileName doesn't contain either of the suffixes above, the default suffix is .pkl.
This method is inherited from models.modelBase.AlignmentModelBase. It loads the model from fileName.
Saved files come in two formats: .pkl and .pklz, the latter being the compressed version of the former, which is smaller but slower to save and load. If the fileName doesn't contain either of the suffixes above, the method assumes that the file is in .pkl format.
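The suffix handling described for saving and loading can be sketched as follows. This is only an illustration, not the code in models.modelBase: the function names are hypothetical, and it assumes that .pkl is a plain pickle file and that .pklz is the same pickle written through gzip, which the description above does not confirm.

```python
import gzip
import pickle


def resolve_file_name(file_name):
    # Fall back to the uncompressed format when no known suffix is given.
    if file_name.endswith(".pkl") or file_name.endswith(".pklz"):
        return file_name
    return file_name + ".pkl"


def _opener(file_name):
    # Assumption: .pklz means a gzip-compressed pickle.
    return gzip.open if file_name.endswith(".pklz") else open


def save_entity(entity, file_name):
    """Pickle entity to file_name and return the name actually used."""
    file_name = resolve_file_name(file_name)
    with _opener(file_name)(file_name, "wb") as f:
        pickle.dump(entity, f)
    return file_name


def load_entity(file_name):
    """Load a pickled entity, applying the same suffix rule as saving."""
    file_name = resolve_file_name(file_name)
    with _opener(file_name)(file_name, "rb") as f:
        return pickle.load(f)
```

With these helpers, save_entity(model_data, "myModel") writes myModel.pkl, while save_entity(model_data, "myModel.pklz") trades slower saving and loading for a smaller file.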
There are also a few variables that should be included if you wish to write your own model classes.
You should come up with a unique name for your own model. When loading and saving trained models, this value will be checked to prevent accidentally loading the wrong file.
This is a list of the names of all the instance variables that need to be saved in a trained model file. For example, in the IBM1 model the only one that needs saving is self.t, so in any IBM1 AlignmentModel instance,
self.modelComponents = ["t"]
Also, if the type of the variable is defaultdict or dict, the self.saveModel method will remove entries with 0 values to minimise output size.
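The effect of modelComponents and the zero-entry pruning can be pictured with the small sketch below. The helper collect_components and the ToyModel class are hypothetical and only mimic the behaviour described above; the real self.saveModel may handle more cases (nested dictionaries, for instance, are not treated here).

```python
from collections import defaultdict


def collect_components(model):
    """Gather the attributes named in model.modelComponents for saving."""
    output = {}
    for name in model.modelComponents:
        value = getattr(model, name)
        if isinstance(value, dict):  # defaultdict is a dict subclass
            # Drop zero-valued entries to keep the saved file small.
            value = {k: v for k, v in value.items() if v != 0}
        output[name] = value
    return output


class ToyModel:
    def __init__(self):
        self.t = defaultdict(float, {("das", "the"): 0.9, ("haus", "the"): 0.0})
        self.scratch = [1, 2, 3]  # not listed below, so it is not saved
        self.modelComponents = ["t"]
```

Here collect_components(ToyModel()) keeps only the non-zero entry of t and ignores scratch entirely.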
This value is stored in saved trained models. It marks the version of the trained model.
This is the list of all the versions of saved models that the current class supports. When loading, if such a list exists, the self.loadModel method checks whether the version of the model file is among the supported versions in this list.
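The name and version checks performed at load time could look roughly like the sketch below. The function check_saved_model and the dictionary keys "modelName" and "version" are hypothetical names chosen for illustration; the layout of a real saved model file is not specified here.

```python
def check_saved_model(saved, model_name, supported_versions=None):
    """Raise ValueError if a saved-model record does not fit this model.

    saved: dict with (assumed) keys "modelName" and "version".
    supported_versions: list of accepted versions, or None to skip the check.
    """
    if saved.get("modelName") != model_name:
        raise ValueError(
            "saved file is for model %r, not %r"
            % (saved.get("modelName"), model_name)
        )
    # The version is only checked when the class declares a supported list.
    if supported_versions is not None:
        if saved.get("version") not in supported_versions:
            raise ValueError("unsupported model version: %r" % saved.get("version"))
    return True
```

A loader would call this before copying any saved components back onto the instance, so that a file written by a different model or an unsupported version is rejected early.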
The checkAlignmentModel function exists to make sure the Aligner can at least call the model to run training and decoding without modification; it also gives hints on what doesn't fit in a model. If successful, it returns the type of the model; otherwise it returns -1.
If you have multiple models to add, you can add their names to the supportedModels list in modelChecker.py and run
python modelChecker.py
to check the APIs of all of the models at once.
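To give an idea of what such an API check involves, here is a small sketch that inspects the parameter list of a training method. It is not the project's checkAlignmentModel: the function name check_training_method, the method name train, and the choice to return the class name on success are all assumptions made for the example.

```python
import inspect


def check_training_method(model_class):
    """Return the class name if the training method looks right, else -1."""
    train = getattr(model_class, "train", None)  # "train" is an assumed name
    if not callable(train):
        print("missing method: train")
        return -1
    params = list(inspect.signature(train).parameters)
    if params != ["self", "dataset", "iterations"]:
        # Hint at what does not fit, as the description above suggests.
        print("unexpected parameters for train:", params)
        return -1
    return model_class.__name__


class GoodModel:
    def train(self, dataset, iterations):
        pass


class BadModel:
    def train(self, dataset):
        pass
```

check_training_method(GoodModel) passes, whereas BadModel is rejected because its training method lacks the iterations parameter.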
In addition, you can add an evaluator of your choosing should you wish to evaluate your model.
The provided evaluators are under src/evaluators/. (You can also run them directly; see all options with python EVALUATOR.py -h.)
To do so, simply add the following line to your class:
class AlignmentModel():
    def __init__(self):
        ... ...
        self.evaluate = myEvaluationFunction
        return
The requirements of myEvaluationFunction:
Parameters:
- result: Alignment, the alignment of the bitext generated by the current model (detail of this format).
- reference: GoldAlignment, the gold alignment used for reference (detail of this format).
Return:
- dict: containing the results (scores etc.).
For example:
return {
    "Precision": precision,
    "Recall": recall,
    "AER": aer,
    "F-score": fScore
}
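A complete evaluator with this shape might look like the sketch below. The data layouts are assumptions, since the Alignment and GoldAlignment formats are not spelled out here: result is taken to be a list with one set of (i, j) links per sentence, and reference a list of dicts with "certain" and "probable" link sets. The scores follow the usual word-alignment definitions, with precision measured against certain plus probable links and recall against certain links only.

```python
def myEvaluationFunction(result, reference):
    # Assumed formats (not confirmed by the Aligner documentation):
    #   result:    list of sets of (i, j) links, one set per sentence
    #   reference: list of dicts with "certain" and "probable" link sets
    size_a = size_s = a_and_s = a_and_p = 0
    for links, gold in zip(result, reference):
        links = set(links)
        sure = set(gold["certain"])
        possible = sure | set(gold["probable"])
        size_a += len(links)
        size_s += len(sure)
        a_and_s += len(links & sure)
        a_and_p += len(links & possible)

    precision = a_and_p / size_a if size_a else 0.0
    recall = a_and_s / size_s if size_s else 0.0
    if size_a + size_s:
        aer = 1.0 - (a_and_s + a_and_p) / (size_a + size_s)
    else:
        aer = 0.0
    if precision + recall:
        fScore = 2 * precision * recall / (precision + recall)
    else:
        fScore = 0.0

    return {
        "Precision": precision,
        "Recall": recall,
        "AER": aer,
        "F-score": fScore
    }
```

Assigning this function to self.evaluate in the model's __init__, as shown earlier, is all that is needed to attach it.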