<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE hyphenation-info SYSTEM "hyphenation.dtd">
<hyphenation-info>
<hyphen-char value="-"/>
<hyphen-min before="2" after="2"/>
<classes>
@!$%^&amp;*()_-+=~`{[}]:;'|&lt;,.>?/0123456789
aA
bB
cC
dD
eE
fF
gG
hH
iI
jJ
kK
lL
mM
nN
oO
pP
qQ
rR
sS
tT
uU
vV
wW
xX
yY
zZ
äÄ
åÅ
öÖ
üÜ
ß
</classes>
<exceptions>
</exceptions>
<!--
Numerals in the following patterns indicate the word positions
at which a split should occur.
The dots represent word start/end markers.
Note: an even digit means "avoid a split", an odd digit means "consider a split"; the value reflects the priority.
Because of a bug in Lucene, only digits from 1 to 6 should be used
(see https://issues.apache.org/jira/browse/LUCENE-8124).
Example:
For the pattern 5str. the hyphenation_decompounder would generate the following tokens:
Hauptst => Hauptst
Hauptstr => Hauptstr, Haupt, str
Hauptstra => Hauptstra
This filter assumes that it is applied after lowercasing and asciifolding/german_normalization.
-->
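<!--
A minimal sketch of how a single pattern is read. The input "waldweg"
is a hypothetical word chosen for illustration, not taken from test data.
  Word with markers:   .waldweg.
  Pattern 5weg.:       the letters "weg" followed by the end marker
  Digit 5 before "w":  odd, so a split in front of "weg" is considered
Assuming the filter behaves as in the 5str. example, the result would
presumably be:
waldweg => waldweg, wald, weg
The exact tokens depend on the filter settings (for example the minimum
subword size or a word list), so treat this as an illustration only.
-->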
<patterns>
5aue.
5allee.
5berg.
5blick.
5chaussee.
5damm.
5dorf.
5feld.
5felde.
5fleck.
5flecklein.
5gasse.
5garten.
5gebiet.
5graben.
5hain.
5heide.
5hoehe.
5hof.
5hofe.
5markt.
5park.
5platz.
5ring.
5stadt.
5str.
5strasse.
5tal.
5ufer.
5wald.
5weg.
5werk.
</patterns>
</hyphenation-info>