Issues for the smanob version 2011:
1. checking the entries:
1.1 well-formed entries
1.2 no xxx tags
1.3 no duplicate entries, not even as orthographic variants, which I generate using the pipeline
(for instance, musihke and musïhke ... or so in the latest entries from the income-dir)
1.4 no duplicate entries with both dashed and non-dashed compounds
(no optionally dashed word as lemma: for instance, øvtiedimmiesoptsestalleme _ utviklingssamtale vs. øvtiedimmie-soptsestalleme _ utviklingssamtale)
At the moment, there are
Todo_jan2011>grep '-' 20110113_001-200_noun_not_in_dict.csv | grep -v xxx | wc -l
45
dashed entries in the last income file (nouns).
2. coverage: please check the entries that did not get a paradigm during generation.
According to Trond, there are sub-variants, but then one has to decide what to do with them
(leave them out, take them marked accordingly, etc.)
3. decide upon the format of the dictionary, and upon its content depending on that format
(For instance, no proper nouns in the mobile version)
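Checks 1.3 and 1.4 above could be sketched roughly as follows. This is only a minimal sketch, not the pipeline's actual check: the function names are hypothetical, and the variant mapping contains only the ï/i pair taken from the musihke/musïhke example, so it would have to be extended for other variants.

```python
import unicodedata
from collections import defaultdict

# Assumption: only the i/ï pair from the example above is mapped here.
VARIANT_MAP = str.maketrans({"ï": "i"})


def normalize(lemma):
    """Collapse differences that should not produce separate entries."""
    # NFC first, so that 'i' + combining diaeresis becomes a single 'ï'.
    composed = unicodedata.normalize("NFC", lemma)
    return composed.lower().translate(VARIANT_MAP).replace("-", "")


def find_duplicates(lemmas):
    """Return {normalized form: spellings} for forms with several spellings."""
    groups = defaultdict(set)
    for lemma in lemmas:
        groups[normalize(lemma)].add(lemma)
    return {key: sorted(forms) for key, forms in groups.items() if len(forms) > 1}


if __name__ == "__main__":
    sample = [
        "musihke",
        "musïhke",
        "øvtiedimmiesoptsestalleme",
        "øvtiedimmie-soptsestalleme",
        "båetije",
    ]
    for key, forms in find_duplicates(sample).items():
        print(key, "<-", ", ".join(forms))
```

Grouping by a normalized key reports dashed/non-dashed pairs and orthographic variants in one pass; which spelling to keep is still a manual decision.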
====================
What to do with this? (no "dict" tag)
- Comment from Lene: it now has a dict tag
<e>
<lg>
<l pos="a">båetije</l>
</lg>
<mg>
<tg>
<t decl="4" pos="a">kommende</t>
</tg>
</mg>
</e>
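Entries like the one above could be found automatically with something like the following. This is a sketch under assumptions: the wrapping root element `<r>` and the function name are hypothetical, and the second entry in the sample (lemma "lemma2") is made up for illustration only.

```python
import xml.etree.ElementTree as ET


def entries_missing_dict(xml_text):
    """Return the lemmas of <e> entries with at least one <t> lacking 'dict'."""
    root = ET.fromstring(xml_text)
    missing = []
    for entry in root.iter("e"):
        lemma = entry.findtext("lg/l", default="?")
        if any("dict" not in t.attrib for t in entry.iter("t")):
            missing.append(lemma)
    return missing


# Hypothetical wrapper <r>; the first entry is the one quoted above,
# the second is an invented entry that already carries a dict attribute.
SAMPLE = """<r>
  <e>
    <lg><l pos="a">båetije</l></lg>
    <mg><tg><t decl="4" pos="a">kommende</t></tg></mg>
  </e>
  <e>
    <lg><l pos="adv">lemma2</l></lg>
    <mg><tg><t dict="yes" pos="adv">stille</t></tg></mg>
  </e>
</r>"""

if __name__ == "__main__":
    print(entries_missing_dict(SAMPLE))
```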
====================
general check of bracketed stuff, is it perhaps some re-element?
- Comment from Lene: These have been removed
pr_smanob.xml: <t dict="yes" oa="yes" pos="pr" tcomm="no">til (nær)</t>
pr_smanob.xml: <t dict="yes" oa="yes" pos="pr" tcomm="no">uten (flere)</t>
pr_smanob.xml: <t dict="yes" oa="yes" pos="pr" tcomm="no">hitenfor (flere)</t>
pr_smanob.xml: <t dict="yes" oa="yes" pos="pr" tcomm="no">vestafor (flere)</t>
pr_smanob.xml: <xt>Gå vestafor de snaufjellene (med krattskog omkring)!</xt>
pr_smanob.xml: <t dict="yes" oa="yes" pos="pr" tcomm="no">østafor (flere steder)</t>
pr_smanob.xml: <t dict="yes" oa="yes" pos="pr" tcomm="no">innenfor (flere)</t>
pronIndef_smanob.xml: <xt>mor og barn (fast uttrykk)</xt>
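The bracket check above could be repeated after each cleanup with a small script like this one. It is a sketch only: the root element `<r>` and the function name are assumptions, and it reports bracketed text in `<t>` and `<xt>` elements without deciding whether the bracket belongs in a re-element.

```python
import re
import xml.etree.ElementTree as ET

# Matches one parenthesised chunk such as "(flere)".
BRACKETED = re.compile(r"\([^)]*\)")


def bracketed_translations(xml_text, tags=("t", "xt")):
    """Return (tag, text) pairs for elements whose text contains brackets."""
    root = ET.fromstring(xml_text)
    hits = []
    for elem in root.iter():
        text = elem.text or ""
        if elem.tag in tags and BRACKETED.search(text):
            hits.append((elem.tag, text))
    return hits


# Hypothetical wrapper <r> around lines quoted in this file.
SAMPLE = """<r>
  <t dict="yes" oa="yes" pos="pr" tcomm="no">til (nær)</t>
  <xt>mor og barn (fast uttrykk)</xt>
  <t pos="adv">stille</t>
</r>"""

if __name__ == "__main__":
    for tag, text in bracketed_translations(SAMPLE):
        print(tag, ":", text)
```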
====================
I thought that the han/hun problem had been solved, Lene?
Comment from Lene: This no longer exists; it was added when we were trying out pronouns in Leksa.
pronPers_smanob.xml: <t dict="yes" oa="yes" pos="pron" stat="pref" tcomm="no">dere to (du og han/hun)</t>
pronPers_smanob.xml: <t dict="yes" oa="yes" pos="pron" stat="pref" tcomm="no">dere (du og han/hun)</t>
pronPers_smanob.xml: <t dict="yes" oa="yes" pos="pron" stat="pref" tcomm="no">dere to (du og han/hun)</t>
pronPers_smanob.xml: <t dict="yes" oa="yes" pos="pron" stat="pref" tcomm="no">dere to (deg og ham/henne)</t>
====================
Cleanup before new compilation autumn 2011:
1. Do we need the file names.xml for the dictionary? I doubt it. If not then delete it.
2. I will ignore the translationcomment attribute in the usikre file.
< <t pos="adv">stille</t>
< <t pos="adv">tyst</t>
---
> <t dict="yes" oa="yes" pos="adv" tcomm="no" translationcomment="no">stille</t>
> <t dict="yes" oa="yes" pos="adv" tcomm="no" translationcomment="no">tyst</t>
As far as I remember, it is not needed for the dictionary, only for Oahpa.
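One way to ignore the attribute when comparing files might be to drop it before diffing. The following is a rough sketch, not the actual compilation step: the function name is hypothetical, and only translationcomment is removed, so the other attribute differences shown in the diff above would still show up.

```python
import xml.etree.ElementTree as ET

# Attributes to drop before comparing; extend the set if needed.
IGNORED = {"translationcomment"}


def strip_attributes(xml_text, ignored=IGNORED):
    """Return the XML with the ignored attributes removed from every element."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        for name in ignored:
            elem.attrib.pop(name, None)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    line = ('<t dict="yes" oa="yes" pos="adv" tcomm="no" '
            'translationcomment="no">stille</t>')
    print(strip_attributes(line))
```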