API reference: Models V0.5a

Introduction

The Aligner comes with several models that can be used straight away, but you can also write your own model to run with the Aligner.

This page describes the models API. If you write your own model and wish to use it with the Aligner, it will need to meet the API requirements described below.

The description here is of module version 0.5a.

Dependency

Compatibility

Models of older versions might not work correctly.

Changes in v0.5a

Many changes to the internal APIs, making them easier to use.
Minor optimisations.
Added some extra features, such as displaying figures.

Where do I place the model file (MyModel.py)?

In src/models/.

API requirements

In your MyModel.py, the only thing required is a class called AlignmentModel, which is a subclass of the models.modelBase.AlignmentModelBase class and contains the methods described below. Before loading the model, the Aligner calls the checkAlignmentModel function in src/models/modelChecker.py to check whether the model meets the requirements. If you wish your model's API to look a bit different (for example, to take more parameters), you'll need to modify checkAlignmentModel as well.

Model Methods

def train(self, dataset, iterations)

Parameters:

dataset: Dataset, detail of this format.
iterations: int, number of iterations to run.

This is the method that will be called to do the training. Currently the parameters have to be these two and these two only. If you wish otherwise, you'll need to modify checkAlignmentModel in src/models/modelChecker.py.

def decodeSentence(self, sentence)

Parameters:

sentence: Sentence, detail of this format.

Return:

Alignment, SentenceAlignment, the alignment of the Sentence generated by current model. detail of this format.

This is the method that will be called to decode a single sentence. Currently it has to take this one parameter and this one only. If you wish otherwise, you'll need to modify checkAlignmentModel in src/models/modelChecker.py.

def decode(self, dataset)

This method is inherited from models.modelBase.AlignmentModelBase.

Parameters:

dataset: Dataset, detail of this format.

Return:

Alignment, the alignment of the dataset generated by current model. detail of this format.

This is the method that will be called to decode a whole dataset. Currently it has to take this one parameter and this one only. If you wish otherwise, you'll need to modify checkAlignmentModel in src/models/modelChecker.py.
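The train, decodeSentence and decode methods described above can be sketched as follows. This is only an illustration under stated assumptions: the base class here is a stand-in so that the snippet runs on its own (in the project you would subclass models.modelBase.AlignmentModelBase), the dataset is assumed to be a list of (source words, target words) pairs, and the co-occurrence counting is a placeholder, not a real alignment model.

```python
class AlignmentModelBase:
    """Stand-in for models.modelBase.AlignmentModelBase (assumption:
    the real base class supplies decode, loadModel and saveModel)."""

    def decode(self, dataset):
        # Assumed behaviour: decode the dataset one sentence at a time.
        return [self.decodeSentence(sentence) for sentence in dataset]


class AlignmentModel(AlignmentModelBase):
    def __init__(self):
        self.modelName = "ToyCooccurrence"  # hypothetical name
        self.modelComponents = ["t"]
        self.t = {}

    def train(self, dataset, iterations):
        # Placeholder "training": count how often two words co-occur.
        for _ in range(iterations):
            for source, target in dataset:
                for f in source:
                    for e in target:
                        self.t[(f, e)] = self.t.get((f, e), 0) + 1

    def decodeSentence(self, sentence):
        source, target = sentence
        alignment = []
        for i, f in enumerate(source):
            # Link each source word to its highest-scoring target word.
            best = max(range(len(target)),
                       key=lambda k: self.t.get((f, target[k]), 0))
            alignment.append((i + 1, best + 1))  # 1-based positions (assumption)
        return alignment


# Toy data, made up for this example.
dataset = [(["das", "haus"], ["the", "house"]),
           (["das", "buch"], ["the", "book"])]
model = AlignmentModel()
model.train(dataset, 1)
sentenceAlignment = model.decodeSentence(dataset[0])
datasetAlignment = model.decode(dataset)
```

The point of the sketch is the shape of the class: train takes exactly dataset and iterations, decodeSentence takes exactly one sentence, and decode is left to the base class.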

def loadModel(self, fileName=None)

This method is inherited from models.modelBase.AlignmentModelBase. It loads the model from fileName.

There are two formats when it comes to saved files: .pkl and .pklz, the latter being the compressed version of the former, which is smaller but slower to save and load. If the fileName doesn't contain either of the suffixes above, the method will assume that the file is in .pkl format.

def saveModel(self, fileName=None)

This method is inherited from models.modelBase.AlignmentModelBase. It saves the model to the specified fileName.

There are two formats when it comes to saved files: .pkl and .pklz, the latter being the compressed version of the former, which is smaller but slower to save and load. If the fileName doesn't contain either of the suffixes above, the default suffix is .pkl.
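The suffix handling described for the two formats can be illustrated with the standard pickle and gzip modules. This is a sketch only: saveEntity and loadEntity are hypothetical helper names, and using gzip for .pklz is an assumption about how the compressed format might be produced, not a description of the inherited methods.

```python
import gzip
import os
import pickle
import tempfile


def saveEntity(entity, fileName):
    # Hypothetical helper: fall back to .pkl when no known suffix is given.
    if not fileName.endswith((".pkl", ".pklz")):
        fileName += ".pkl"
    # Assumption: .pklz is a gzip-compressed pickle.
    opener = gzip.open if fileName.endswith(".pklz") else open
    with opener(fileName, "wb") as f:
        pickle.dump(entity, f)
    return fileName


def loadEntity(fileName):
    # Anything that is not .pklz is read as a plain pickle.
    opener = gzip.open if fileName.endswith(".pklz") else open
    with opener(fileName, "rb") as f:
        return pickle.load(f)


data = {"t": {("das", "the"): 0.9}}
with tempfile.TemporaryDirectory() as folder:
    plainPath = saveEntity(data, os.path.join(folder, "ibm1"))
    packedPath = saveEntity(data, os.path.join(folder, "ibm1.pklz"))
    restoredPlain = loadEntity(plainPath)
    restoredPacked = loadEntity(packedPath)
```

Both files restore to the same object; the compressed one trades extra time spent saving and loading for a smaller file.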

There are also a few variables that should be included if one wishes to write their own model classes.

self.modelName, str

One should come up with a unique name for one's own model. When loading and saving trained models, this value will be checked to prevent accidentally loading the wrong file.

self.modelComponents, list(of str)

This is a list of the names of all the instance variables that need to be saved in a trained model file. For example, in the IBM1 model the only variable that needs saving is self.t, so in any IBM1 AlignmentModel instance,

self.modelComponents = ["t"]

Also, if the type of the variable is defaultdict or dict, the self.saveModel method will remove entries with 0 values to minimise output size.
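To make the role of self.modelComponents concrete, here is a rough sketch of how the listed variables might be gathered and how zero-valued entries could be dropped before saving. collectComponents and ToyModel are hypothetical names, and the sketch only prunes the top level of each dictionary; the real saveModel may behave differently.

```python
from collections import defaultdict


def collectComponents(model):
    # Hypothetical helper: gather only the variables named in modelComponents.
    components = {}
    for name in model.modelComponents:
        value = getattr(model, name)
        if isinstance(value, dict):  # defaultdict is a dict subclass
            # Drop entries whose value is 0 to keep the output small.
            value = {key: v for key, v in value.items() if v != 0}
        components[name] = value
    return components


class ToyModel:
    def __init__(self):
        self.modelComponents = ["t"]
        self.t = defaultdict(float)
        self.t[("das", "the")] = 0.9
        self.t[("das", "house")] = 0.0  # zero entry, pruned on save
        self.scratch = [1, 2, 3]        # not listed, so not saved


saved = collectComponents(ToyModel())
```

In this toy case only the non-zero entry of t survives, and scratch is ignored because it is not named in modelComponents.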

optional: self.version, str

This value will be saved in saved trained models. It marks the version of the trained model.

optional: self.supportedVersion, list(of str)

This is the list of all the versions of saved models that the current class supports. When loading, if such a list exists, the self.loadModel method checks whether the version of the model file is among the supported versions stated in this list.
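The name and version checks described for these variables might look roughly like the following. canLoad is a hypothetical function written for illustration, and the model names and version strings are made up; the actual checks live inside the inherited loadModel.

```python
def canLoad(model, savedName, savedVersion):
    # The saved file must belong to the same model.
    if savedName != model.modelName:
        return False
    # The version check only applies when supportedVersion is defined.
    supported = getattr(model, "supportedVersion", None)
    if supported is not None and savedVersion not in supported:
        return False
    return True


class VersionedModel:
    def __init__(self):
        self.modelName = "IBM1"
        self.version = "0.5a"
        self.supportedVersion = ["0.4a", "0.5a"]


class UnversionedModel:
    def __init__(self):
        self.modelName = "IBM1"


sameVersion = canLoad(VersionedModel(), "IBM1", "0.5a")
wrongModel = canLoad(VersionedModel(), "HMM", "0.5a")
oldVersion = canLoad(VersionedModel(), "IBM1", "0.1a")
noVersionList = canLoad(UnversionedModel(), "IBM1", "0.1a")
```

Leaving out supportedVersion skips the version check entirely in this sketch, which matches the "if such a list exists" wording.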

Use the checkAlignmentModel

checkAlignmentModel exists to make sure the Aligner can at least call the model to run training and decoding without modification. It also gives hints on what doesn't fit in a model. If successful, it returns the type of the model; otherwise it returns -1.

If you have multiple models to add, you can add their names to the supportedModels list in modelChecker.py and run

python modelChecker.py

to check the APIs of all of the models at once.
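For a feel of what such a check involves, the sketch below uses inspect.signature to compare method parameters against the expected lists. It is not the project's checkAlignmentModel: findSignatureProblems, REQUIRED and the two toy classes are hypothetical, and it returns a list of problems rather than a model type or -1.

```python
import inspect

# Expected parameter names, taken from the method descriptions on this page.
REQUIRED = {
    "train": ["self", "dataset", "iterations"],
    "decodeSentence": ["self", "sentence"],
}


def findSignatureProblems(modelClass):
    problems = []
    for name, expected in REQUIRED.items():
        method = getattr(modelClass, name, None)
        if not callable(method):
            problems.append("missing method: " + name)
            continue
        actual = list(inspect.signature(method).parameters)
        if actual != expected:
            problems.append("unexpected parameters for " + name)
    return problems


class GoodModel:
    def train(self, dataset, iterations):
        pass

    def decodeSentence(self, sentence):
        pass


class BadModel:
    def train(self, dataset):  # iterations is missing
        pass


goodProblems = findSignatureProblems(GoodModel)
badProblems = findSignatureProblems(BadModel)
```

An empty list means the class can at least be called the way the Aligner expects; each string in a non-empty list is a hint about what does not fit.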

Evaluator

In addition, you can also add an evaluator of your choosing should you wish to evaluate your model.

The provided evaluators are under src/evaluators/. (You can also run them directly; to see all options, run python EVALUATOR.py -h.)

To do so, assign your evaluation function to self.evaluate in your class's __init__:

from models.modelBase import AlignmentModelBase

class AlignmentModel(AlignmentModelBase):
    def __init__(self):
        # ... the rest of your initialisation ...
        self.evaluate = myEvaluationFunction

The requirements for myEvaluationFunction:

Parameters:

result: Alignment, the alignment of the bitext generated by current model. detail of this format.
reference: GoldAlignment, the gold alignment used for reference. detail of this format.

Return:

dict: containing the results (scores etc.).

For example:

return {
    "Precision": precision,
    "Recall": recall,
    "AER": aer,
    "F-score": fScore
}
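A possible evaluation function returning such a dictionary is sketched below, using the usual definitions of precision, recall and alignment error rate over sure and possible links. The data shapes are assumptions made for this example, not the project's Alignment and GoldAlignment formats: result is taken to be a list with one collection of (i, j) links per sentence, and reference a list of dictionaries with "sure" and "possible" link sets.

```python
def myEvaluationFunction(result, reference):
    sizeA = sizeS = hitsSure = hitsPossible = 0
    for links, gold in zip(result, reference):
        links = set(links)
        sure = set(gold["sure"])
        # Sure links also count as possible links.
        possible = sure | set(gold["possible"])
        sizeA += len(links)
        sizeS += len(sure)
        hitsSure += len(links & sure)
        hitsPossible += len(links & possible)

    precision = hitsPossible / sizeA if sizeA else 0.0
    recall = hitsSure / sizeS if sizeS else 0.0
    total = sizeA + sizeS
    aer = 1 - (hitsSure + hitsPossible) / total if total else 0.0
    denominator = precision + recall
    fScore = 2 * precision * recall / denominator if denominator else 0.0

    return {
        "Precision": precision,
        "Recall": recall,
        "AER": aer,
        "F-score": fScore
    }


# Toy data, made up for this example.
result = [[(1, 1), (2, 2), (3, 1)]]
reference = [{"sure": {(1, 1)}, "possible": {(2, 2)}}]
scores = myEvaluationFunction(result, reference)
```

With this toy input, two of the three predicted links are at least possible and the single sure link is found, so the precision is 2/3, the recall is 1 and the AER is 0.25.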
