Releases: bootphon/wordseg

wordseg-0.6.2

20 Apr 11:42
  • Improved documentation and algorithm descriptions.

  • Docker image now uses Python 3.6 from Anaconda.

  • New tests to ensure replication of scores from CDSwordSeg
    (https://github.com/alecristia/CDSwordSeg) to wordseg for puddle,
    tp, dibs and dpseg.

  • In wordseg-ag, the <grammar> and <segment-category>
    parameters are now optional. When they are omitted, a default colloc0
    grammar is generated from the input text.

  • In wordseg-dpseg

    • fixed forwarding of some arguments from Python to C++,
    • implemented the dpseg bugfix for the case where the first line of
      a fold contains a single character,
    • use the original random number generator to replicate
      CDSwordSeg exactly,
    • fixed the documented default ngram: the default was already bigram
      but was documented as unigram.
  • In wordseg-dibs

    • fixed a bug when loading train text at syllable level (new
      --unit option),
    • safer use of train text (ensures it contains word separators
      and ignores empty lines).
  • In wordseg-eval

    • when called from bash, the scores are now displayed in a fixed
      order. New test to ensure bash and python calls to wordseg lead to
      identical results. See #31.
    • boundary scoring now distinguishes between the edge and no-edge
      cases. See #21.
  • In wordseg-stats the scores are now displayed in a fixed order.

  • In wordseg-syll

    • the --tolerant option ignores utterances for which
      syllabification failed (the default is to exit the program on the
      first error). See #36.
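
To illustrate the edge/no edge distinction in boundary scoring noted for wordseg-eval above, here is a minimal sketch. It is not wordseg's implementation: the helper names `boundaries` and `boundary_scores` are hypothetical, and it assumes that "edge" refers to the boundaries at the start and end of each utterance, which every segmentation gets right for free.

```python
def boundaries(words, with_edges):
    """Return the set of boundary positions (character offsets) in one utterance.

    Hypothetical helper: `words` is the list of words of the utterance.
    """
    positions = set()
    offset = 0
    for word in words:
        offset += len(word)
        positions.add(offset)  # a boundary after each word, including the last
    if with_edges:
        positions.add(0)  # also count the utterance-initial boundary
    else:
        positions.discard(offset)  # drop the utterance-final boundary
    return positions


def boundary_scores(gold, segmented, with_edges):
    """Compute boundary (precision, recall) over paired utterances.

    Assumes both inputs are lists of space-separated utterances that
    contain the same characters once spaces are removed.
    """
    hits = n_gold = n_seg = 0
    for gold_utt, seg_utt in zip(gold, segmented):
        gold_b = boundaries(gold_utt.split(), with_edges)
        seg_b = boundaries(seg_utt.split(), with_edges)
        hits += len(gold_b & seg_b)
        n_gold += len(gold_b)
        n_seg += len(seg_b)
    precision = hits / n_seg if n_seg else 0.0
    recall = hits / n_gold if n_gold else 0.0
    return precision, recall


gold = ["the dog barks"]
segmented = ["thedog barks"]
print(boundary_scores(gold, segmented, with_edges=True))   # (1.0, 0.75)
print(boundary_scores(gold, segmented, with_edges=False))  # (1.0, 0.5)
```

Under these assumptions, counting the utterance edges raises recall from 0.5 to 0.75 for the same segmentation, which is why reporting the two variants separately makes scores easier to compare.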

wordseg-0.5

20 Oct 13:43
  • Implementation of Adaptor Grammar as wordseg-ag,
  • Installation now relies on cmake (was python setuptools),
  • Improvements in tests and documentation,
  • Various bugfixes.

wordseg-0.4.1

16 Oct 21:33

First public release: a complete rewrite of the word segmentation pipeline from the CDSwordSeg project.

  • Four algorithms (tp, puddle, dpseg, dibs),
  • Segmentation preprocessing and evaluation,
  • Unit tests and documentation.