Copyright (C) 2012 Nguyen Viet Cuong, Ye Nan, Sumit Bhagwani
This is the README file for HOSemiCRF version 1.0
HOSemiCRF is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
HOSemiCRF is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with HOSemiCRF. If not, see http://www.gnu.org/licenses/.
=== WARNING ===
HOSemiCRF requires a lot of memory usage. It is best to run the program in parallel on a computing cluster with lots of memory.
=== COMPILATION STEPS ===
Requirement: Apache Ant (http://ant.apache.org/)
-
Download the HOSemiCRF repository as a zip file: HOSemiCRF-master.zip
-
Unzip the file:
unzip HOSemiCRF-master.zip
-
Compile the program:
cd HOSemiCRF-master
ant
=== RUN THE PUNCTUATION PREDICTION PROGRAM ===
cp dist/lib/HOSemiCRF.jar run/punc/
cd run/punc
java -cp "HOSemiCRF.jar" Applications.PunctuationPredictor all punc.conf
=== RUN THE REFERENCE PREDICTION PROGRAM ===
cp dist/lib/HOSemiCRF.jar run/ref/
cd run/ref
java -cp "HOSemiCRF.jar" Applications.ReferenceTagger all ref.conf
=== RUN THE OCR PROGRAM ===
Download data from http://www.seas.upenn.edu/~taskar/ocr/ to the folder run/ocr/
cp dist/lib/HOSemiCRF.jar run/ocr/
cd run/ocr
java -cp "HOSemiCRF.jar" OCR.OCR all ocr.conf 0
=== MORE INFO ===
Please visit: https://github.com/nvcuong/HOSemiCRF/wiki