03-mt-russian.txt
ART
https://commons.wikimedia.org/wiki/File:Gartner_Hype_Cycle.svg
SFX
SOURCES
6 rules, 49 translations, sample lexicon:
http://www.hutchinsweb.me.uk/GU-IBM-2005.pdf
Quotes on failure, "The Trouble with Translation", etc:
http://www.hutchinsweb.me.uk/MTNI-11-1995.pdf
HOW COMPUTERS FAILED AT TRANSLATING RUSSIAN
-- ASSETS PART 0 --
X art: 80's fashion sense
X art: cheesy Russian bad guy
X pic: nuclear symbol
X pic: cold war tank
X art: cold war tank guy pops out
X art: cold war tank guy blink
X art: cold war tank guy eyes wide smirky
X art: cold war tank guy winter snow
X art: cold war tank winter snow
X art: sunshine
- Go back in time
X anim: timeline 1980->1948
X art: computers a thing (mainframebot)
X art: a new thing (bingkst)
X pic: a big room-sized computer (pops under knocks)
X pic: paper punchcard
- Mr Warren Weaver
X art: weaver - basic, blink, talk, raise brows
X art: coworkers - basic, urging him
X art: soviet/Russian symbol
X text: first memo Translation
- Early days
X art: college symbol UCLA colors
X art: college symbol Georgetown colors
X art: college symbol MIT colors
X art: Bar Hillel
X text: full time machine translation man (trademark)
X text: 1954
X art: biz symbol
X art: MT demo stage + crowd
X art: MT demo stage + crowd goes wild
X text: 49 whole Russian sentences
- How demo worked
X art: phone
X text: Russian sentence, break into chunks (roots and suffixes)
X text: rule 1 rule 2...rule 6
X anim: zoom rule 1, cross out -> rule 0
X anim: word-suffix word word
X anim: 110 pops out of last word -> 21 one before -> place 21 after
X text: “Magnitude of angle is determined by the relation of length of arc to radius”
X text: “The price of potatoes is determined by the demand”
X pic: potatoes
X art: typificated Russian
X art: impressed - newspaper
X art: skeyser - basic, whoa for stunned public
X art: tech demo computer sparkles
X text: carefully selected sentences
X text: Cyrillic into the Latin alphabet
X text: only third person verbs
X text: no negations or questions
X text: relation/the relation example
X art: that demo comp is partying, times are good
-- ASSETS PART 1 --
X art: reshow Cold War computers (Russian symbol mainframe)
X art: reshow tech demo
X art: reshow crowd went wild
X pic: money starts rolling in
X art: people take sides (perfectionist)
X art: people take sides (brute forcer)
X text: perfectionists (+ one side toon highlighted, other dark)
X text: brute force (+ other toon highlighted, one side dark)
X text: bold claim made after Georgetown
X pic: public skeyeser raises eyebrow
X text: The Trouble With Translation
X text: spirit, flesh italics in phrase
X art: liquor, meat
X art: reshow newspaper
X art: turn bad (headline now frownface)
X art: reshow Bar Hillel + first...
X text: "Stolen art found by tree."
X art: stolen art found by tree
X text: harder than rocket science quote
X pic: rocket science
X pic: US government - capitol
X text: ALPAC
X text: "ALPAC, assemble!"
X pic: llama
X anim: elite group 7 experts
X art: reshow demo one last time as promises beyond wildest dreams
X AI Hype Cycle graph
X pic: snowflakes / winter chill
X text: AI Winter
X text: "we do not have useful machine translation [and] there is no immediate or predictable prospect of useful machine translation"
X art: officials accusing MT
X art: defunding bar, background color bar
Ah...the 80's. Well known for great fashion sense, bad guys with cheesy Russian accents and fears of nuclear annihilation. Maybe it's the coder in me, but I've always wondered if the Cold War would've been a bit warmer if computers could have helped us understand each other. Turns out, we tried just that.
[intro]
Welcome to CompChomp, the only show on the internets where there's something colder than a Russian winter.
Let me take you back in time...to lovely 1948. Computers were a thing. A new thing. A big, room-sized thing that you had to feed with paper full of tiny little holes. These binary-loving behemoths were so successful that folks started to wonder if there was any problem they couldn't solve.
One of those folks who definitely couldn't ignore their power was Mr Warren Weaver. (Fear the powers!) He kept bugging all of his colleagues - "you know what machines should do? They should translate Russian." And they were like, "dude, stop nagging us. Write it down." So he sat down and wrote the first memo on computer translation. It really got the ball rolling.
First, UCLA and Georgetown got on board. MIT followed in 1951 when they gave Yehoshua Bar Hillel the position of full-time professional Machine Translation man, the first in the business. Then, in 1954, IBM and Georgetown staged the “first public demo” of Machine Translation. 49 whole Russian sentences. The crowd went wild! MT was the next big thing.
How did they do it? Start with a Russian sentence and break it into dictionary chunks (words and word pieces like roots and suffixes). Then apply one of just six rules to your chunks. For example, here’s the first rule. (Wait, we’re programmers. We count from zero!) This is the ZEROTH rule! A word was associated with a code (110), and that code told the machine to go look at the previous full word. When it found that previous word, it checked it for another code (21). If the word came up 21s, the machine placed it after the word carrying the 110. Sounds like a whole bunch of steps, but the result is simple: take two words and switch them around.
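For the coders following along, here's a minimal sketch of that swap in Python. Only the codes 110 and 21 come from the description above. The token format, the function name apply_rule_zero and the sample words are made up for illustration, and this is not the original IBM program.

```python
from typing import List, Optional, Tuple

# A token is (english_output, code). The code is None when the word
# carries no rearrangement code. This layout is an assumption.
Token = Tuple[str, Optional[int]]


def apply_rule_zero(tokens: List[Token]) -> List[Token]:
    """Swap a word coded 110 with the word before it, if that word is coded 21."""
    out = list(tokens)
    i = 1
    while i < len(out):
        current_code = out[i][1]
        previous_code = out[i - 1][1]
        if current_code == 110 and previous_code == 21:
            # The 21 word ends up after the 110 word.
            out[i - 1], out[i] = out[i], out[i - 1]
            i += 2  # skip past the pair we just swapped
        else:
            i += 1
    return out


# Hypothetical input: placeholder words, not real lexicon entries.
sample = [("first", 21), ("second", 110), ("third", None)]
swapped_words = [word for word, _ in apply_rule_zero(sample)]
print(swapped_words)
```

With the placeholder input, "first" and "second" trade places and "third" stays put. A sentence with no 110 code comes back unchanged.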
With only a fistful of rules like this, the machine translated amazing sentences like “Magnitude of angle is determined by the relation of length of arc to radius” and “The price of potatoes is determined by the demand”. Wow, machine, really? Potatoes!? Culturally sensitive much? Well, journalists were impressed, and the public was stunned. In a world that had never seen this kind of computer smarts before, this was magic.
But peering behind the curtain made things feel less magical. The sentences were carefully selected. Everything was transliterated from Cyrillic into the Latin alphabet. Grammatically, sentences included only third-person verbs and avoided negations or questions. Sometimes things we’d think of as grammar or syntax were stored as part of dictionary entries. That last one’s pretty cheap. Think about it: this machine associates each Russian word with up to two English equivalents. Here’s a word that translates to “relation”. But look at the second English choice - it’s “the relation”. Instead of training a computer to understand English articles, they just stored the word “the” as part of another dictionary choice.
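To see how cheap that trick is, here's a rough Python sketch of a dictionary entry with two English choices. The two English strings come from the example above. The placeholder key, the name translate_word and the use_second flag are assumptions for illustration; the script doesn't say how the real machine picked between the two.

```python
from typing import Dict, Tuple

# Each entry holds up to two English equivalents.
# The key is a placeholder, not a real transliterated Russian word.
LEXICON: Dict[str, Tuple[str, ...]] = {
    "<russian-word>": ("relation", "the relation"),
}


def translate_word(word: str, use_second: bool = False) -> str:
    """Return the first English choice, or the second one when asked."""
    choices = LEXICON[word]
    if use_second and len(choices) > 1:
        return choices[1]
    return choices[0]


print(translate_word("<russian-word>"))
print(translate_word("<russian-word>", use_second=True))
```

Notice that nothing in here knows what an article is. The word "the" is just extra characters baked into the second string.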
So, yeah, they might have been cheating. (I only cheated a little bit!) But, it didn't matter - people were super excited! If you were a translating machine, times were good.
For now. Because when we return to wrap up this story, things will get ugly: nerds fighting over how computers should be translating. The public making a mockery of flawed grammar. The very existence of machine translation will hang in the balance. Chomp!
--------
Last time, Cold War computers showed some real promise at translating Russian into English. Georgetown University staged a successful tech demo, the press was impressed, and times were good for machine translation.
So the money started rolling in and people began taking sides. (Round one - fight!) In this corner, you had your linguistically-minded “Perfectionists”, feeding computers all the detailed grammatical rules about each language. And in this corner - your brute force crowd. Brute Forcers were all about the evidence, setting computers loose on real sentences and training them to find patterns on their own.
Unfortunately, progress wasn't living up to the bold claims made after Georgetown. And the public started to notice. In 1962, Harper's ran an article titled “The Trouble with Translation”. With a title like that, it’s got to be fair and objective. The author recalled a demonstration of machine translation where the computer was given the scriptural nugget, "the spirit is willing but the flesh is weak". The output, we're told, was laughable: "The liquor is holding out all right, but the meat has spoiled."
I need to pause for a moment. The demo reported there didn’t actually happen. But it really illustrates how far Machine Translation had fallen. A few years earlier those same reporters were heaping praise on it...now, it was the butt of their jokes.
Scientists stepped in for damage control. Remember Bar Hillel? Our first full-time machine translation man? He argued that our expectations were unreasonable. Natural language could be ambiguous. (Computers can't handle that!) We should step back - get machines to help us translate, not expect them to speak exactly like we do. Translating languages was more complex than even the experts could have guessed. It was literally turning out to be harder than rocket science.
In the face of souring public opinion, the US government commissioned ALPAC. (A llama?!?!? No...not a llama.) An elite group of 7 experts, tasked with deciding the usefulness or the uselessness of MT and computational linguistics as a whole.
Their verdict? Oh, we’ll get to that. But first - there's something you need to understand.
Throughout the history of Artificial Intelligence, people got sold on promises beyond their wildest dreams. It happens so often that there’s a term for it: the AI hype cycle. When things go wrong, a chill sets in, a chill with lasting effects. An AI Winter. (Brrrr....No amount of jackets can protect you from that chill)
So, the verdict. The 1966 ALPAC report concluded that “we do not have useful machine translation [and] there is no immediate or predictable prospect of useful machine translation”. Computers were twice as expensive as human translators and performed much worse. Officials were convinced. Machine translation was defunded and the first AI winter set in.
Computers and language had a rough start. But I promise it gets better...eventually. Subscribe, go eat your borscht and write some code.
Chomp!