warandpeace-stats.txt
=== RESULTS ===
File = warandpeace.txt
Longest word = hofs-kriegs-wurst-schnapps-rath (31)
Shortest word = i (1)
Mean word length /chars = 4.52492884926
Total words parsed = 565349
Total chars parsed = 2558164
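The summary figures above are internally consistent: 2558164 chars / 565349 words gives the reported mean of about 4.5249. The program that produced this report is not shown, so the following is only a minimal sketch of how such a summary could be computed. The function name `summarize` and the tokenization rule (lowercase, letters plus apostrophes and hyphens) are assumptions, not the original tool's behaviour.

```python
import re

def summarize(text):
    """Return longest/shortest word, mean word length and totals for text."""
    # Assumed tokenization: lowercase; a word starts with a letter and may
    # contain letters, apostrophes and hyphens.
    words = re.findall(r"[a-z][a-z'-]*", text.lower())
    total_chars = sum(len(w) for w in words)
    return {
        "longest": max(words, key=len),
        "shortest": min(words, key=len),
        "mean_length": total_chars / len(words),
        "total_words": len(words),
        "total_chars": total_chars,
    }

stats = summarize("I saw the Hofs-kriegs-wurst-schnapps-Rath today")
print(stats["longest"], len(stats["longest"]))
print(stats["shortest"], stats["mean_length"])

# Cross-check of the reported mean using the totals from the report.
print(2558164 / 565349)
```

Keeping hyphens inside a token is what lets a compound such as the longest word count as a single 31-character word.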
=== Commonest words ===
1 = the (34247 = 6.057%)
2 = and (21793 = 3.854%)
3 = to (16602 = 2.936%)
4 = of (14948 = 2.644%)
5 = a (10414 = 1.842%)
6 = he (9604 = 1.698%)
7 = in (8863 = 1.567%)
8 = his (7951 = 1.406%)
9 = that (7658 = 1.354%)
10 = was (7315 = 1.293%)
11 = with (5671 = 1.003%)
12 = had (5349 = 0.946%)
13 = it (4901 = 0.866%)
14 = her (4655 = 0.823%)
15 = not (4606 = 0.814%)
16 = at (4507 = 0.797%)
17 = him (4495 = 0.795%)
18 = on (3951 = 0.698%)
19 = as (3915 = 0.692%)
20 = but (3664 = 0.648%)
21 = for (3473 = 0.614%)
22 = she (3329 = 0.588%)
23 = i (3274 = 0.579%)
24 = is (3226 = 0.570%)
25 = you (3168 = 0.560%)
26 = said (2829 = 0.500%)
27 = all (2678 = 0.473%)
28 = from (2675 = 0.473%)
29 = be (2426 = 0.429%)
30 = were (2404 = 0.425%)
31 = by (2387 = 0.422%)
32 = they (2083 = 0.368%)
33 = who (2042 = 0.361%)
34 = this (2013 = 0.356%)
35 = one (1990 = 0.351%)
36 = what (1985 = 0.351%)
37 = which (1981 = 0.350%)
38 = have (1944 = 0.343%)
39 = prince (1837 = 0.324%)
40 = pierre (1767 = 0.312%)
41 = so (1756 = 0.310%)
42 = an (1613 = 0.285%)
43 = or (1570 = 0.277%)
44 = up (1540 = 0.272%)
45 = been (1474 = 0.260%)
46 = them (1465 = 0.259%)
47 = did (1465 = 0.259%)
48 = when (1453 = 0.257%)
49 = their (1434 = 0.253%)
50 = would (1347 = 0.238%)
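Each percentage in the list above matches count / total words parsed. For example, 34247 / 565349 is about 6.0577%, which the report prints as 6.057%, so the figures appear to be truncated rather than rounded. A minimal sketch of such a ranking, assuming the words are already tokenized; `commonest_words` is a hypothetical name and the output uses ordinary rounding:

```python
from collections import Counter

def commonest_words(words, top=50):
    """Return (rank, word, count, percent of all words) for the top words."""
    total = len(words)
    ranked = Counter(words).most_common(top)
    return [(rank, word, count, 100.0 * count / total)
            for rank, (word, count) in enumerate(ranked, start=1)]

sample = "the cat and the dog and the bird".split()
rows = commonest_words(sample, top=2)
for rank, word, count, pct in rows:
    # Same layout as the report lines, but rounded instead of truncated.
    print("%d = %s (%d = %.3f%%)" % (rank, word, count, pct))
```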
=== Commonest word-pairs ===
1 = of the (4048 = 1.62%)
2 = to the (2307 = 0.92%)
3 = in the (2305 = 0.92%)
4 = and the (1453 = 0.58%)
5 = at the (1339 = 0.53%)
6 = on the (1291 = 0.51%)
7 = he had (1199 = 0.48%)
8 = did not (1036 = 0.41%)
9 = with a (942 = 0.37%)
10 = he was (906 = 0.36%)
11 = from the (868 = 0.34%)
12 = it was (829 = 0.33%)
13 = with the (800 = 0.32%)
14 = of his (793 = 0.31%)
15 = by the (766 = 0.30%)
16 = had been (749 = 0.30%)
17 = to be (737 = 0.29%)
18 = in a (736 = 0.29%)
19 = prince andrew (673 = 0.26%)
20 = that the (672 = 0.26%)
21 = that he (629 = 0.25%)
22 = for the (620 = 0.24%)
23 = of a (594 = 0.23%)
24 = in his (578 = 0.23%)
25 = the same (542 = 0.21%)
26 = all the (483 = 0.19%)
27 = could not (483 = 0.19%)
28 = to his (477 = 0.19%)
29 = with his (471 = 0.18%)
30 = as if (471 = 0.18%)
31 = the french (455 = 0.18%)
32 = into the (453 = 0.18%)
33 = and that (440 = 0.17%)
34 = as he (432 = 0.17%)
35 = and he (428 = 0.17%)
36 = who had (418 = 0.16%)
37 = the old (417 = 0.16%)
38 = to him (400 = 0.16%)
39 = it is (397 = 0.15%)
40 = said the (384 = 0.15%)
41 = and a (377 = 0.15%)
42 = on his (376 = 0.15%)
43 = one of (373 = 0.14%)
44 = up to (363 = 0.14%)
45 = the first (353 = 0.14%)
46 = was a (351 = 0.14%)
47 = seemed to (343 = 0.13%)
48 = she had (342 = 0.13%)
49 = out of (340 = 0.13%)
50 = for a (338 = 0.13%)
=== Commonest word-triplets ===
1 = he did not (213 = 0.046%)
2 = one of the (184 = 0.040%)
3 = out of the (178 = 0.038%)
4 = that he was (152 = 0.033%)
5 = as soon as (143 = 0.031%)
6 = up to the (129 = 0.028%)
7 = that it was (124 = 0.027%)
8 = he could not (120 = 0.026%)
9 = which he had (114 = 0.024%)
10 = in front of (113 = 0.024%)
11 = that he had (101 = 0.022%)
12 = he had been (100 = 0.021%)
13 = the commander in (99 = 0.021%)
14 = she did not (98 = 0.021%)
15 = in the same (92 = 0.020%)
16 = and did not (92 = 0.020%)
17 = went up to (91 = 0.019%)
18 = it seemed to (90 = 0.019%)
19 = the fact that (89 = 0.019%)
20 = did not know (88 = 0.019%)
21 = the sound of (88 = 0.019%)
22 = as he had (84 = 0.018%)
23 = for a long (83 = 0.018%)
24 = at the same (83 = 0.018%)
25 = he had not (79 = 0.017%)
26 = for the first (78 = 0.016%)
27 = of the french (78 = 0.016%)
28 = seemed to him (77 = 0.016%)
29 = went to the (76 = 0.016%)
30 = the old prince (76 = 0.016%)
31 = and at the (76 = 0.016%)
32 = there was a (75 = 0.016%)
33 = it would be (75 = 0.016%)
34 = at that moment (72 = 0.015%)
35 = and in the (70 = 0.015%)
36 = the battle of (70 = 0.015%)
37 = that had been (69 = 0.015%)
38 = who had been (69 = 0.015%)
39 = in spite of (68 = 0.014%)
40 = the middle of (66 = 0.014%)
41 = commander in chief (66 = 0.014%)
42 = what he had (65 = 0.014%)
43 = she could not (65 = 0.014%)
44 = a long time (64 = 0.013%)
45 = it was not (63 = 0.013%)
46 = the will of (62 = 0.013%)
47 = there was no (62 = 0.013%)
48 = the end of (62 = 0.013%)
49 = prince andrew was (62 = 0.013%)
50 = of the russian (61 = 0.013%)
=== Commonest word-quads ===
1 = in the middle of (52 = 0.009%)
2 = for the first time (51 = 0.009%)
3 = the commander in chief (49 = 0.009%)
4 = the middle of the (48 = 0.008%)
5 = at the same time (45 = 0.008%)
6 = for a long time (42 = 0.007%)
7 = in front of the (41 = 0.007%)
8 = it was impossible to (35 = 0.006%)
9 = the end of the (33 = 0.006%)
10 = in the midst of (32 = 0.005%)
11 = at the end of (32 = 0.005%)
12 = went up to the (32 = 0.005%)
13 = as soon as he (31 = 0.005%)
14 = it was evident that (31 = 0.005%)
15 = it seemed to him (30 = 0.005%)
16 = up and down the (29 = 0.005%)
17 = by the fact that (29 = 0.005%)
18 = as soon as the (29 = 0.005%)
19 = the battle of borodino (29 = 0.005%)
20 = he did not know (27 = 0.004%)
21 = seemed to him that (27 = 0.004%)
22 = the movement of the (26 = 0.004%)
23 = and at the same (26 = 0.004%)
24 = the other side of (26 = 0.004%)
25 = went out of the (26 = 0.004%)
26 = at the head of (25 = 0.004%)
27 = the head of the (25 = 0.004%)
28 = but as soon as (25 = 0.004%)
29 = and went to the (24 = 0.004%)
30 = the will of the (24 = 0.004%)
31 = did not wish to (24 = 0.004%)
32 = the commander of the (23 = 0.004%)
33 = did not know how (22 = 0.004%)
34 = the cause of the (22 = 0.004%)
35 = and was about to (22 = 0.004%)
36 = that he could not (22 = 0.004%)
37 = the fact that the (21 = 0.003%)
38 = the rest of the (21 = 0.003%)
39 = other side of the (20 = 0.003%)
40 = that he did not (19 = 0.003%)
41 = at the beginning of (19 = 0.003%)
42 = will of the people (19 = 0.003%)
43 = what was going on (19 = 0.003%)
44 = for a long time, (19 = 0.003%)
45 = the door of the (19 = 0.003%)
46 = so as not to (19 = 0.003%)
47 = the commander in chief's (18 = 0.003%)
48 = as if he were (18 = 0.003%)
49 = the sight of the (18 = 0.003%)
50 = as he had done (18 = 0.003%)
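The pair, triplet and quad tables can all be produced by counting runs of n consecutive words. The report does not state what its percentages are taken against, so the sketch below returns raw counts only. `ngram_counts` is a hypothetical name, and the input is assumed to be an already tokenized word list.

```python
from collections import Counter

def ngram_counts(words, n):
    """Count every run of n consecutive words in the list."""
    # zip over n staggered copies of the list yields a sliding window.
    windows = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(window) for window in windows)

words = "in the middle of the room in the middle of the night".split()
pairs = ngram_counts(words, 2)
triplets = ngram_counts(words, 3)
quads = ngram_counts(words, 4)
print(pairs.most_common(3))
print(quads.most_common(3))
```

An entry such as "for a long time," in the quads table, listed separately from "for a long time", suggests the original tokenizer did not always strip trailing punctuation; a tokenizer that removes it would merge the two counts.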
=== FREQUENCY ANALYSIS ===
a |######## 8.02% (108.% deviation from random)
b |# 1.35% (64.8% deviation from random)
c |## 2.39% (37.7% deviation from random)
d |#### 4.62% (20.1% deviation from random)
e |############ 12.3% (219.% deviation from random)
f |## 2.14% (44.2% deviation from random)
g |## 2.00% (47.8% deviation from random)
h |###### 6.52% (69.7% deviation from random)
i |###### 6.79% (76.5% deviation from random)
j | 0.10% (97.3% deviation from random)
k | 0.79% (79.2% deviation from random)
l |### 3.77% (1.92% deviation from random)
m |## 2.40% (37.3% deviation from random)
n |####### 7.19% (87.1% deviation from random)
o |####### 7.53% (95.9% deviation from random)
p |# 1.76% (54.1% deviation from random)
q | 0.09% (97.6% deviation from random)
r |##### 5.78% (50.4% deviation from random)
s |###### 6.36% (65.5% deviation from random)
t |######## 8.83% (129.% deviation from random)
u |## 2.55% (33.5% deviation from random)
v |# 1.05% (72.6% deviation from random)
w |## 2.31% (39.8% deviation from random)
x | 0.15% (95.8% deviation from random)
y |# 1.80% (52.9% deviation from random)
z | 0.09% (97.5% deviation from random)
Total percentage deviation from random = 1878%
Average percentage deviation from random = 72.2%
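The "deviation from random" column is consistent with comparing each letter's share against a uniform share of 100/26, about 3.846%: for "a", |8.02 - 3.846| / 3.846 is about 108%. The average is the total divided by 26 (1878 / 26 is about 72.2), and each bar carries one "#" per whole percentage point. A minimal sketch under those assumptions; `letter_deviation` is a hypothetical name:

```python
from collections import Counter
import string

def letter_deviation(text):
    """Map each letter a-z to (percent of letters, percent deviation from uniform)."""
    letters = [c for c in text.lower() if c in string.ascii_lowercase]
    counts = Counter(letters)
    uniform = 100.0 / 26  # share of each letter if all were equally likely
    rows = {}
    for letter in string.ascii_lowercase:
        pct = 100.0 * counts[letter] / len(letters)
        rows[letter] = (pct, 100.0 * abs(pct - uniform) / uniform)
    return rows

rows = letter_deviation("aaab")
for letter, (pct, dev) in rows.items():
    # One '#' per whole percentage point, as in the report's bars.
    print("%s |%s %.2f%% (%.1f%% deviation from random)"
          % (letter, "#" * int(pct), pct, dev))
average = sum(dev for _, dev in rows.values()) / 26
print("Average percentage deviation from random = %.1f%%" % average)
```

The sketch divides by the number of letters found, so it needs at least one letter in the input.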