Add box in FST-Text #87

phmz · 2017-06-02T09:08:54Z

Adding new boxes to the FST-Text would let the user split a token. This is especially useful when analyzing social media text such as tweets, where people often omit the space character between words.

The new box must respect the bounds described in the manual (see 14.5 Text Automaton, p. 315). Also, if the text automaton is valid, the bounds of the other boxes should be updated where needed.
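
To make the bounds requirement concrete, here is a minimal sketch of splitting one box into two. It assumes a simplified, hypothetical model in which a box covers an inclusive range of character offsets inside a single token. The `Box` class and `split_box` function are illustrative names only; they are not taken from the Unitex code base, and the actual bounds format is the one described in the manual.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Box:
    """Hypothetical, simplified box: one token index plus character offsets."""
    content: str
    token: int       # index of the token the box belongs to
    start_char: int  # first character covered, inclusive
    end_char: int    # last character covered, inclusive


def split_box(box: Box, at: int) -> Tuple[Box, Box]:
    """Split `box` so that the second piece starts at character offset `at`."""
    # The split point must leave at least one character on each side.
    if not box.start_char < at <= box.end_char:
        raise ValueError("split point must fall strictly inside the box")
    cut = at - box.start_char
    # The two pieces stay in the same token and together cover the
    # original range with no gap and no overlap.
    left = Box(box.content[:cut], box.token, box.start_char, at - 1)
    right = Box(box.content[cut:], box.token, at, box.end_char)
    return left, right


# Example: a tweet-style token written without a space.
left, right = split_box(Box("goodmorning", token=0, start_char=0, end_char=10), at=4)
print(left)
print(right)
```

In this sketch the two new boxes inherit the outer bounds of the original box, so neighbouring boxes keep their own bounds. Whether that holds in the real text automaton depends on the rules in the manual.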

ghost · 2019-04-12T06:25:30Z

Maybe the following observation is useful for this issue.
A token can be composed of several boxes representing agglutinated segments (morphemes) of the same token; in this case, the paths between the segments are drawn as dashed lines.

Take varowaAti_haA as an example in Arabic:
"fiy varowaAti_haA"
{fiy,.PREP} _____________ {varowaAti,varowap.N:fpaG} -----------{haA,hu.PRO+Ppers:3fs}
fiy ___________________ varowaAti ----------------------------hA

martinec assigned martinec, aleksandrachasch, ghost and phmz and unassigned martinec, aleksandrachasch and ghost Jun 3, 2017

martinec added the type:feature-request label Jun 3, 2017

martinec added this to the v3.2-beta milestone Jun 3, 2017

martinec added the status:in-progress label Aug 2, 2017

martinec removed the status:in-progress label Mar 15, 2018
