Add box in FST-Text #87

Open
phmz opened this issue Jun 2, 2017 · 1 comment

Contributor

phmz commented Jun 2, 2017

Adding new boxes to the FST-Text would allow the user to split a token. This is especially useful when analyzing social media text such as tweets, where people often omit the space between words.

The new box must respect the bounds described in the manual (see 14.5 Text Automaton, p. 315). Also, if the text automaton is valid, the bounds of the other boxes should be updated as needed.
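
To make the bounds requirement concrete, here is a rough sketch of splitting one box into two while keeping character bounds consistent. It uses a simplified, hypothetical model (a `Box` with a token index and inclusive character offsets) and is not the actual Unitex data structure or bounds format; the names `Box` and `split_box` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Hypothetical, simplified box: NOT the real Unitex representation."""
    text: str   # surface form covered by the box
    token: int  # index of the token the box belongs to
    start: int  # offset of the first character within the token (inclusive)
    end: int    # offset of the last character within the token (inclusive)


def split_box(boxes: List[Box], index: int, cut: int) -> List[Box]:
    """Return a new path where boxes[index] is split after `cut` characters."""
    box = boxes[index]
    if not 0 < cut < len(box.text):
        raise ValueError("cut must fall strictly inside the box text")
    # The two halves share the original token and partition its character range.
    left = Box(box.text[:cut], box.token, box.start, box.start + cut - 1)
    right = Box(box.text[cut:], box.token, box.start + cut, box.end)
    # Boxes on other tokens keep their bounds unchanged in this simplified model.
    return boxes[:index] + [left, right] + boxes[index + 1:]


path = [Box("goodmorning", 0, 0, 10), Box("everyone", 2, 0, 7)]
new_path = split_box(path, 0, 4)
```

In this model the two new boxes stay inside the original token, so the neighbouring boxes need no change. A real implementation would also have to follow the bounds rules given in the manual.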


ghost commented Apr 12, 2019

Maybe the following observation is useful for this issue.
A token can be composed of several boxes representing agglutinated segments (morphemes) within the same token; in that case, the paths between the segments are drawn as dashed lines.

For example, varowaAti_haA in Arabic:
"fiy varowaAti_haA"
{fiy,.PREP} _____________ {varowaAti,varowap.N:fpaG} -----------{haA,hu.PRO+Ppers:3fs}
fiy ___________________ varowaAti ----------------------------hA
