joshua-decoder/thrax: Hadoop-based tool for extracting large-scale synchronous grammars for paraphrasing and machine translation

forked from jweese/thrax

joshua-decoder.org

Thrax uses Apache Hadoop (an open-source implementation of MapReduce) to
efficiently extract a synchronous context-free grammar translation model
for use in modern machine translation systems.

Thrax currently supports both Hiero-style grammars (with a single
non-terminal symbol) and SAMT-style grammars (where non-terminal symbols are
derived by projecting a target-side parse tree onto each span).

COMPILING:

First, you need to set two environment variables:
  $HADOOP should point to the directory where Hadoop is installed.
  $AWS_SDK should point to the directory where the Amazon Web Services SDK is installed.
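A minimal bash sketch for setting and sanity-checking these variables before running ant. The install paths are hypothetical placeholders, and check_env is an illustrative helper, not something shipped with Thrax.

```shell
#!/usr/bin/env bash
# Hypothetical install locations; replace them with the real paths on your machine.
# Values already present in the environment are kept.
export HADOOP="${HADOOP:-/usr/local/hadoop}"
export AWS_SDK="${AWS_SDK:-/usr/local/aws-java-sdk}"

# check_env: complain about each required variable that is empty or does not
# name an existing directory. Returns non-zero if any check fails.
check_env() {
  local status=0 var
  for var in HADOOP AWS_SDK; do
    # ${!var} is bash indirect expansion: the value of the variable named by $var.
    if [ -z "${!var}" ]; then
      echo "error: \$$var is not set" >&2
      status=1
    elif [ ! -d "${!var}" ]; then
      echo "error: \$$var=${!var} is not a directory" >&2
      status=1
    fi
  done
  return $status
}
```

Calling check_env right before ant turns a mistyped path into an immediate, readable error instead of a compiler failure about missing classes.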

To compile, type

    ant

This will compile all classes and package them into a jar for use on a
Hadoop cluster.

When compilation finishes, ant should report BUILD SUCCESSFUL.
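The compile step can be wrapped so that a failed or incomplete build stops right away. This sketch assumes, based on the hadoop jar command under RUNNING THRAX, that the build leaves the jar at bin/thrax.jar; build_thrax is a hypothetical helper.

```shell
#!/usr/bin/env bash
# build_thrax: run ant from a Thrax checkout and confirm that the jar used by
# "hadoop jar" exists afterwards. The directory argument defaults to $THRAX,
# then to the current directory.
build_thrax() {
  local root="${1:-${THRAX:-.}}"
  # ant exits non-zero when the build fails, so stop early in that case.
  ( cd "$root" && ant ) || { echo "error: ant build failed in $root" >&2; return 1; }
  # Assumed output location, taken from the invocation in RUNNING THRAX.
  if [ ! -f "$root/bin/thrax.jar" ]; then
    echo "error: $root/bin/thrax.jar was not produced" >&2
    return 1
  fi
  echo "built $root/bin/thrax.jar"
}
```

For example, build_thrax "$HOME/thrax" builds that checkout and prints the jar path on success.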

RUNNING THRAX:

Thrax can be invoked with

    hadoop jar $THRAX/bin/thrax.jar <configuration file>

where $THRAX is the top-level Thrax directory (the one in which you ran ant).

Some example configuration files have been included with this distribution:

    example/hiero.conf
    example/samt.conf
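A small bash wrapper around this invocation that checks the configuration file exists before submitting the job. It assumes the hadoop launcher is on PATH and that $THRAX points at the Thrax directory (falling back to the current directory); run_thrax is a hypothetical name.

```shell
#!/usr/bin/env bash
# run_thrax: submit a Thrax extraction job for one configuration file.
run_thrax() {
  local conf="${1:-}"
  local jar="${THRAX:-.}/bin/thrax.jar"
  if [ -z "$conf" ]; then
    echo "usage: run_thrax <configuration file>" >&2
    return 2
  fi
  if [ ! -f "$conf" ]; then
    echo "error: configuration file not found: $conf" >&2
    return 1
  fi
  # Same command as above: hadoop jar <thrax jar> <configuration file>.
  hadoop jar "$jar" "$conf"
}
```

For example, run_thrax "$THRAX/example/hiero.conf" extracts a Hiero-style grammar using the bundled example configuration.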

COPYRIGHT AND LICENSE:

Copyright (c) 2010-13 by the Thrax team:
    Jonny Weese
    Juri Ganitkevitch

See LICENSE.txt (included with this distribution) for the complete terms.