Adding Catalan to English models (#72)
* Adding Catalan to English models

* Update evaluation results [skip ci]

* Update model registry [skip ci]

---------

Co-authored-by: CircleCI evaluation job <ci-models-evaluation@firefox-translations>
andrenatal and CircleCI evaluation job authored Apr 20, 2023
1 parent c652b86 commit 89ef02f
Showing 75 changed files with 1,262 additions and 1,038 deletions.
119 changes: 65 additions & 54 deletions evaluation/dev/bleu-results.md
@@ -56,57 +56,68 @@ Both absolute and relative differences in BLEU scores between Bergamot and other

## avg

| Translator/Dataset | en-ru | ru-en | en-nl | fa-en | uk-en | en-fa | is-en | nl-en | en-uk |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| bergamot | 29.44 | 33.69 | 27.30 | 28.70 | 35.93 | 17.30 | 23.40 | 29.65 | 26.30 |
| google | 34.49 (+5.05, +17.15%) | 38.20 (+4.51, +13.38%) | 29.30 (+2.00, +7.33%) | 40.85 (+12.15, +42.33%) | 42.43 (+6.50, +18.09%) | 27.80 (+10.50, +60.69%) | 38.90 (+15.50, +66.24%) | 33.05 (+3.40, +11.47%) | 32.63 (+6.33, +24.08%) |
| microsoft | 33.62 (+4.18, +14.21%) | 38.38 (+4.68, +13.90%) | 28.80 (+1.50, +5.49%) | 36.15 (+7.45, +25.96%) | 42.30 (+6.37, +17.72%) | 20.50 (+3.20, +18.50%) | 38.17 (+14.77, +63.11%) | 32.60 (+2.95, +9.95%) | 32.03 (+5.73, +21.80%) |
| Translator/Dataset | ru-en | en-nl | en-ru | en-fa | nl-en | uk-en | fa-en | ca-en | en-uk | is-en |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| bergamot | 33.69 | 27.30 | 29.44 | 17.30 | 29.65 | 35.93 | 28.70 | 38.00 | 26.30 | 23.40 |
| google | 38.20 (+4.51, +13.38%) | 29.30 (+2.00, +7.33%) | 34.49 (+5.05, +17.15%) | 27.80 (+10.50, +60.69%) | 33.05 (+3.40, +11.47%) | 42.43 (+6.50, +18.09%) | 40.85 (+12.15, +42.33%) | 48.95 (+10.95, +28.82%) | 32.63 (+6.33, +24.08%) | 38.90 (+15.50, +66.24%) |
| microsoft | 38.38 (+4.68, +13.90%) | 28.80 (+1.50, +5.49%) | 33.62 (+4.18, +14.21%) | 20.50 (+3.20, +18.50%) | 32.60 (+2.95, +9.95%) | 42.30 (+6.37, +17.72%) | 36.15 (+7.45, +25.96%) | 46.50 (+8.50, +22.37%) | 32.03 (+5.73, +21.80%) | 38.17 (+14.77, +63.11%) |

![Results](img/avg-bleu.png)
---

## en-ru

| Translator/Dataset | wmt20 | wmt13 | flores-test | flores-dev | wmt21 | wmt19 | wmt17 | wmt16 | wmt15 | wmt14 | wmt22 | wmt18 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| bergamot | 22.00 | 26.20 | 29.20 | 29.90 | 25.50 | 31.40 | 33.60 | 30.90 | 31.40 | 38.20 | 26.50 | 28.50 |
| google | 27.20 (+5.20, +23.64%) | 28.00 (+1.80, +6.87%) | 34.40 (+5.20, +17.81%) | 34.90 (+5.00, +16.72%) | 30.00 (+4.50, +17.65%) | 32.90 (+1.50, +4.78%) | 38.90 (+5.30, +15.77%) | 35.00 (+4.10, +13.27%) | 36.90 (+5.50, +17.52%) | 45.70 (+7.50, +19.63%) | 35.00 (+8.50, +32.08%) | 35.00 (+6.50, +22.81%) |
| microsoft | 26.30 (+4.30, +19.55%) | 27.30 (+1.10, +4.20%) | 33.60 (+4.40, +15.07%) | 33.50 (+3.60, +12.04%) | 29.20 (+3.70, +14.51%) | 33.20 (+1.80, +5.73%) | 38.60 (+5.00, +14.88%) | 34.20 (+3.30, +10.68%) | 36.10 (+4.70, +14.97%) | 44.70 (+6.50, +17.02%) | 33.10 (+6.60, +24.91%) | 33.70 (+5.20, +18.25%) |

![Results](img/en-ru-bleu.png)
---

## ru-en

| Translator/Dataset | flores-dev | mtedx_test | wmt18 | wmt20 | wmt19 | wmt15 | wmt17 | wmt14 | wmt16 | wmt22 | wmt13 | flores-test | wmt21 |
| Translator/Dataset | mtedx_test | wmt19 | wmt17 | flores-dev | wmt22 | flores-test | wmt14 | wmt15 | wmt16 | wmt13 | wmt18 | wmt21 | wmt20 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| bergamot | 31.90 | 24.00 | 31.90 | 35.00 | 39.10 | 33.50 | 37.60 | 37.80 | 33.00 | 38.50 | 29.30 | 31.00 | 35.40 |
| google | 38.40 (+6.50, +20.38%) | 25.10 (+1.10, +4.58%) | 37.30 (+5.40, +16.93%) | 38.40 (+3.40, +9.71%) | 42.80 (+3.70, +9.46%) | 38.60 (+5.10, +15.22%) | 42.70 (+5.10, +13.56%) | 42.70 (+4.90, +12.96%) | 37.60 (+4.60, +13.94%) | 43.70 (+5.20, +13.51%) | 32.20 (+2.90, +9.90%) | 37.30 (+6.30, +20.32%) | 39.80 (+4.40, +12.43%) |
| microsoft | 36.50 (+4.60, +14.42%) | 26.20 (+2.20, +9.17%) | 37.40 (+5.50, +17.24%) | 38.80 (+3.80, +10.86%) | 43.80 (+4.70, +12.02%) | 38.50 (+5.00, +14.93%) | 43.70 (+6.10, +16.22%) | 44.10 (+6.30, +16.67%) | 38.40 (+5.40, +16.36%) | 43.90 (+5.40, +14.03%) | 32.50 (+3.20, +10.92%) | 36.10 (+5.10, +16.45%) | 39.00 (+3.60, +10.17%) |
| bergamot | 24.00 | 39.10 | 37.60 | 31.90 | 38.50 | 31.00 | 37.80 | 33.50 | 33.00 | 29.30 | 31.90 | 35.40 | 35.00 |
| google | 25.10 (+1.10, +4.58%) | 42.80 (+3.70, +9.46%) | 42.70 (+5.10, +13.56%) | 38.40 (+6.50, +20.38%) | 43.70 (+5.20, +13.51%) | 37.30 (+6.30, +20.32%) | 42.70 (+4.90, +12.96%) | 38.60 (+5.10, +15.22%) | 37.60 (+4.60, +13.94%) | 32.20 (+2.90, +9.90%) | 37.30 (+5.40, +16.93%) | 39.80 (+4.40, +12.43%) | 38.40 (+3.40, +9.71%) |
| microsoft | 26.20 (+2.20, +9.17%) | 43.80 (+4.70, +12.02%) | 43.70 (+6.10, +16.22%) | 36.50 (+4.60, +14.42%) | 43.90 (+5.40, +14.03%) | 36.10 (+5.10, +16.45%) | 44.10 (+6.30, +16.67%) | 38.50 (+5.00, +14.93%) | 38.40 (+5.40, +16.36%) | 32.50 (+3.20, +10.92%) | 37.40 (+5.50, +17.24%) | 39.00 (+3.60, +10.17%) | 38.80 (+3.80, +10.86%) |

![Results](img/ru-en-bleu.png)
---

## en-nl

| Translator/Dataset | flores-test | flores-dev |
| Translator/Dataset | flores-dev | flores-test |
| --- | --- | --- |
| bergamot | 27.00 | 27.60 |
| google | 29.20 (+2.20, +8.15%) | 29.40 (+1.80, +6.52%) |
| microsoft | 28.60 (+1.60, +5.93%) | 29.00 (+1.40, +5.07%) |
| bergamot | 27.60 | 27.00 |
| google | 29.40 (+1.80, +6.52%) | 29.20 (+2.20, +8.15%) |
| microsoft | 29.00 (+1.40, +5.07%) | 28.60 (+1.60, +5.93%) |

![Results](img/en-nl-bleu.png)
---

## fa-en
## en-ru

| Translator/Dataset | wmt16 | wmt15 | flores-dev | wmt22 | wmt18 | wmt14 | wmt17 | wmt20 | wmt13 | wmt21 | wmt19 | flores-test |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| bergamot | 30.90 | 31.40 | 29.90 | 26.50 | 28.50 | 38.20 | 33.60 | 22.00 | 26.20 | 25.50 | 31.40 | 29.20 |
| google | 35.00 (+4.10, +13.27%) | 36.90 (+5.50, +17.52%) | 34.90 (+5.00, +16.72%) | 35.00 (+8.50, +32.08%) | 35.00 (+6.50, +22.81%) | 45.70 (+7.50, +19.63%) | 38.90 (+5.30, +15.77%) | 27.20 (+5.20, +23.64%) | 28.00 (+1.80, +6.87%) | 30.00 (+4.50, +17.65%) | 32.90 (+1.50, +4.78%) | 34.40 (+5.20, +17.81%) |
| microsoft | 34.20 (+3.30, +10.68%) | 36.10 (+4.70, +14.97%) | 33.50 (+3.60, +12.04%) | 33.10 (+6.60, +24.91%) | 33.70 (+5.20, +18.25%) | 44.70 (+6.50, +17.02%) | 38.60 (+5.00, +14.88%) | 26.30 (+4.30, +19.55%) | 27.30 (+1.10, +4.20%) | 29.20 (+3.70, +14.51%) | 33.20 (+1.80, +5.73%) | 33.60 (+4.40, +15.07%) |

![Results](img/en-ru-bleu.png)
---

## en-fa

| Translator/Dataset | flores-test | flores-dev |
| --- | --- | --- |
| bergamot | 17.40 | 17.20 |
| google | 28.40 (+11.00, +63.22%) | 27.20 (+10.00, +58.14%) |
| microsoft | 21.10 (+3.70, +21.26%) | 19.90 (+2.70, +15.70%) |

![Results](img/en-fa-bleu.png)
---

## nl-en

| Translator/Dataset | flores-dev | flores-test |
| --- | --- | --- |
| bergamot | 29.10 | 28.30 |
| google | 42.00 (+12.90, +44.33%) | 39.70 (+11.40, +40.28%) |
| microsoft | 36.50 (+7.40, +25.43%) | 35.80 (+7.50, +26.50%) |
| bergamot | 29.70 | 29.60 |
| google | 33.00 (+3.30, +11.11%) | 33.10 (+3.50, +11.82%) |
| microsoft | 32.40 (+2.70, +9.09%) | 32.80 (+3.20, +10.81%) |

![Results](img/fa-en-bleu.png)
![Results](img/nl-en-bleu.png)
---

## uk-en
@@ -120,46 +131,46 @@ Both absolute and relative differences in BLEU scores between Bergamot and other
![Results](img/uk-en-bleu.png)
---

## en-fa
## fa-en

| Translator/Dataset | flores-dev | flores-test |
| --- | --- | --- |
| bergamot | 17.20 | 17.40 |
| google | 27.20 (+10.00, +58.14%) | 28.40 (+11.00, +63.22%) |
| microsoft | 19.90 (+2.70, +15.70%) | 21.10 (+3.70, +21.26%) |

![Results](img/en-fa-bleu.png)
---

## is-en

| Translator/Dataset | flores-dev | flores-test | wmt21 |
| --- | --- | --- | --- |
| bergamot | 23.60 | 23.40 | 23.20 |
| google | 39.40 (+15.80, +66.95%) | 38.60 (+15.20, +64.96%) | 38.70 (+15.50, +66.81%) |
| microsoft | 37.30 (+13.70, +58.05%) | 36.70 (+13.30, +56.84%) | 40.50 (+17.30, +74.57%) |
| bergamot | 29.10 | 28.30 |
| google | 42.00 (+12.90, +44.33%) | 39.70 (+11.40, +40.28%) |
| microsoft | 36.50 (+7.40, +25.43%) | 35.80 (+7.50, +26.50%) |

![Results](img/is-en-bleu.png)
![Results](img/fa-en-bleu.png)
---

## nl-en
## ca-en

| Translator/Dataset | flores-dev | flores-test |
| --- | --- | --- |
| bergamot | 29.70 | 29.60 |
| google | 33.00 (+3.30, +11.11%) | 33.10 (+3.50, +11.82%) |
| microsoft | 32.40 (+2.70, +9.09%) | 32.80 (+3.20, +10.81%) |
| bergamot | 38.70 | 37.30 |
| google | 49.60 (+10.90, +28.17%) | 48.30 (+11.00, +29.49%) |
| microsoft | 46.80 (+8.10, +20.93%) | 46.20 (+8.90, +23.86%) |

![Results](img/nl-en-bleu.png)
![Results](img/ca-en-bleu.png)
---

## en-uk

| Translator/Dataset | flores-test | wmt22 | flores-dev |
| Translator/Dataset | flores-dev | flores-test | wmt22 |
| --- | --- | --- | --- |
| bergamot | 28.20 | 22.80 | 27.90 |
| google | 33.10 (+4.90, +17.38%) | 32.00 (+9.20, +40.35%) | 32.80 (+4.90, +17.56%) |
| microsoft | 33.50 (+5.30, +18.79%) | 30.40 (+7.60, +33.33%) | 32.20 (+4.30, +15.41%) |
| bergamot | 27.90 | 28.20 | 22.80 |
| google | 32.80 (+4.90, +17.56%) | 33.10 (+4.90, +17.38%) | 32.00 (+9.20, +40.35%) |
| microsoft | 32.20 (+4.30, +15.41%) | 33.50 (+5.30, +18.79%) | 30.40 (+7.60, +33.33%) |

![Results](img/en-uk-bleu.png)
---

## is-en

| Translator/Dataset | flores-dev | flores-test | wmt21 |
| --- | --- | --- | --- |
| bergamot | 23.60 | 23.40 | 23.20 |
| google | 39.40 (+15.80, +66.95%) | 38.60 (+15.20, +64.96%) | 38.70 (+15.50, +66.81%) |
| microsoft | 37.30 (+13.70, +58.05%) | 36.70 (+13.30, +56.84%) | 40.50 (+17.30, +74.57%) |

![Results](img/is-en-bleu.png)
---
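
The parenthesized figures in the tables above are the absolute and relative BLEU differences between Bergamot and each other system. A minimal Python sketch of that arithmetic, using the ca-en scores added in this commit, could look like the following. The helper name `bleu_delta` and the assumption that the `avg` table is a plain mean over datasets are illustrative and are not taken from the repository's evaluation script.

```python
from statistics import mean


def bleu_delta(baseline: float, other: float) -> str:
    """Format `other` with its absolute and relative difference to `baseline`."""
    diff = other - baseline
    rel = diff / baseline * 100
    return f"{other:.2f} ({diff:+.2f}, {rel:+.2f}%)"


# ca-en BLEU scores from this commit, keyed by system, then dataset.
ca_en = {
    "bergamot": {"flores-dev": 38.7, "flores-test": 37.3},
    "google": {"flores-dev": 49.6, "flores-test": 48.3},
    "microsoft": {"flores-dev": 46.8, "flores-test": 46.2},
}

# Assumption: the "avg" row is the unweighted mean across datasets.
averages = {system: mean(scores.values()) for system, scores in ca_en.items()}

for system in ("google", "microsoft"):
    print(system, bleu_delta(averages["bergamot"], averages[system]))
```

Under these assumptions the sketch reproduces the ca-en column of the `avg` table, for example `48.95 (+10.95, +28.82%)` for google against a Bergamot average of 38.00.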
1 change: 1 addition & 0 deletions evaluation/dev/ca-en/flores-dev.bergamot.en.bleu
@@ -0,0 +1 @@
38.7
1 change: 1 addition & 0 deletions evaluation/dev/ca-en/flores-dev.bergamot.en.comet
@@ -0,0 +1 @@
0.6699
61 changes: 61 additions & 0 deletions evaluation/dev/ca-en/flores-dev.ca-en.cometcompare
@@ -0,0 +1,61 @@
==========================
x_name: flores-dev.bergamot.en
y_name: flores-dev.microsoft.en

Bootstrap Resampling Results:
x-mean: 0.6700
y-mean: 0.7980
ties (%): 0.0000
x_wins (%): 0.0000
y_wins (%): 1.0000

Paired T-Test Results:
statistic: -18.8769
p_value: 0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
flores-dev.microsoft.en outperforms flores-dev.bergamot.en.
==========================
x_name: flores-dev.bergamot.en
y_name: flores-dev.google.en

Bootstrap Resampling Results:
x-mean: 0.6700
y-mean: 0.8228
ties (%): 0.0000
x_wins (%): 0.0000
y_wins (%): 1.0000

Paired T-Test Results:
statistic: -21.5915
p_value: 0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
flores-dev.google.en outperforms flores-dev.bergamot.en.
==========================
x_name: flores-dev.microsoft.en
y_name: flores-dev.google.en

Bootstrap Resampling Results:
x-mean: 0.7980
y-mean: 0.8228
ties (%): 0.0000
x_wins (%): 0.0000
y_wins (%): 1.0000

Paired T-Test Results:
statistic: -6.7390
p_value: 0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
flores-dev.google.en outperforms flores-dev.microsoft.en.

Summary
If system_x is better than system_y then:
Null hypothesis rejected according to t-test with p_value=0.05.
Scores differ significantly across samples.
system_x \ system_y      flores-dev.bergamot.en    flores-dev.microsoft.en    flores-dev.google.en
-----------------------  ------------------------  -------------------------  ----------------------
flores-dev.bergamot.en                             False                      False
flores-dev.microsoft.en  True                                                 False
flores-dev.google.en     True                      True
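
The comparison output above reports two checks per system pair: bootstrap-resampled win rates and a paired t-test over per-segment scores. As a rough illustration of what those two numbers measure, here is a standard-library Python sketch. It is not the code of the tool that produced this output; the function names, the number of resamples, and the six segment scores are hypothetical.

```python
import math
import random
from statistics import mean, stdev


def paired_t_statistic(x, y):
    """t statistic for paired samples: mean difference over its standard error."""
    diffs = [a - b for a, b in zip(x, y)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


def bootstrap_win_rates(x, y, num_samples=300, seed=0):
    """Resample segment indices with replacement and count which mean is higher."""
    rng = random.Random(seed)
    n = len(x)
    x_wins = y_wins = ties = 0
    for _ in range(num_samples):
        idx = [rng.randrange(n) for _ in range(n)]
        mean_x = mean([x[i] for i in idx])
        mean_y = mean([y[i] for i in idx])
        if mean_x > mean_y:
            x_wins += 1
        elif mean_y > mean_x:
            y_wins += 1
        else:
            ties += 1
    # Returned as fractions of the resamples, not percentages.
    return x_wins / num_samples, y_wins / num_samples, ties / num_samples


# Hypothetical per-segment scores for two systems on the same six segments.
system_x = [0.61, 0.70, 0.55, 0.68, 0.72, 0.64]
system_y = [0.80, 0.83, 0.75, 0.79, 0.85, 0.81]

t_stat = paired_t_statistic(system_x, system_y)
x_rate, y_rate, tie_rate = bootstrap_win_rates(system_x, system_y)
print(f"t statistic: {t_stat:.4f}")
print(f"x_wins: {x_rate:.4f}  y_wins: {y_rate:.4f}  ties: {tie_rate:.4f}")
```

A negative t statistic means system y scores higher on average, which matches the sign convention in the output above. Because every hypothetical segment favors system y, each resample does too, so the y win rate comes out as 1.0.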
1 change: 1 addition & 0 deletions evaluation/dev/ca-en/flores-dev.google.en.bleu
@@ -0,0 +1 @@
49.6
1 change: 1 addition & 0 deletions evaluation/dev/ca-en/flores-dev.google.en.comet
@@ -0,0 +1 @@
0.8218
1 change: 1 addition & 0 deletions evaluation/dev/ca-en/flores-dev.microsoft.en.bleu
@@ -0,0 +1 @@
46.8
1 change: 1 addition & 0 deletions evaluation/dev/ca-en/flores-dev.microsoft.en.comet
@@ -0,0 +1 @@
0.7979
1 change: 1 addition & 0 deletions evaluation/dev/ca-en/flores-test.bergamot.en.bleu
@@ -0,0 +1 @@
37.3
1 change: 1 addition & 0 deletions evaluation/dev/ca-en/flores-test.bergamot.en.comet
@@ -0,0 +1 @@
0.6381
61 changes: 61 additions & 0 deletions evaluation/dev/ca-en/flores-test.ca-en.cometcompare
@@ -0,0 +1,61 @@
==========================
x_name: flores-test.bergamot.en
y_name: flores-test.microsoft.en

Bootstrap Resampling Results:
x-mean: 0.6383
y-mean: 0.7878
ties (%): 0.0000
x_wins (%): 0.0000
y_wins (%): 1.0000

Paired T-Test Results:
statistic: -17.4826
p_value: 0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
flores-test.microsoft.en outperforms flores-test.bergamot.en.
==========================
x_name: flores-test.bergamot.en
y_name: flores-test.google.en

Bootstrap Resampling Results:
x-mean: 0.6383
y-mean: 0.8105
ties (%): 0.0000
x_wins (%): 0.0000
y_wins (%): 1.0000

Paired T-Test Results:
statistic: -18.9692
p_value: 0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
flores-test.google.en outperforms flores-test.bergamot.en.
==========================
x_name: flores-test.microsoft.en
y_name: flores-test.google.en

Bootstrap Resampling Results:
x-mean: 0.7878
y-mean: 0.8105
ties (%): 0.0000
x_wins (%): 0.0000
y_wins (%): 1.0000

Paired T-Test Results:
statistic: -6.1132
p_value: 0.0000
Null hypothesis rejected according to t-test.
Scores differ significantly across samples.
flores-test.google.en outperforms flores-test.microsoft.en.

Summary
If system_x is better than system_y then:
Null hypothesis rejected according to t-test with p_value=0.05.
Scores differ significantly across samples.
system_x \ system_y       flores-test.bergamot.en    flores-test.microsoft.en    flores-test.google.en
------------------------  -------------------------  --------------------------  -----------------------
flores-test.bergamot.en                              False                       False
flores-test.microsoft.en  True                                                   False
flores-test.google.en     True                       True
1 change: 1 addition & 0 deletions evaluation/dev/ca-en/flores-test.google.en.bleu
@@ -0,0 +1 @@
48.3
1 change: 1 addition & 0 deletions evaluation/dev/ca-en/flores-test.google.en.comet
@@ -0,0 +1 @@
0.8103
1 change: 1 addition & 0 deletions evaluation/dev/ca-en/flores-test.microsoft.en.bleu
@@ -0,0 +1 @@
46.2
1 change: 1 addition & 0 deletions evaluation/dev/ca-en/flores-test.microsoft.en.comet
@@ -0,0 +1 @@
0.7877