
Commit 69a30f6

Merge pull request #2 from Utkarsh352/Utkarsh352-patch-2

Update README.md

2 parents: ae90c26 + ffbdbcd

File tree: 1 file changed (+16, -15 lines)

README.md: 16 additions & 15 deletions
@@ -1,4 +1,5 @@
 # BERT
+
 **\*\*\*\*\* New March 11th, 2020: Smaller BERT Models \*\*\*\*\***
 
 This is a release of 24 smaller BERT models (English only, uncased, trained with WordPiece masking) referenced in [Well-Read Students Learn Better: On the Importance of Pre-training Compact Models](https://arxiv.org/abs/1908.08962).
@@ -78,15 +79,15 @@ the pre-processing code.
 In the original pre-processing code, we randomly select WordPiece tokens to
 mask. For example:
 
-`Input Text: the man jumped up , put his basket on phil ##am ##mon ' s head`
-`Original Masked Input: [MASK] man [MASK] up , put his [MASK] on phil
+`Input Text: the man jumped up, put his basket on Phil ##am ##mon ' s head`
+`Original Masked Input: [MASK] man [MASK] up, put his [MASK] on Phil
 [MASK] ##mon ' s head`
 
 The new technique is called Whole Word Masking. In this case, we always mask
-*all* of the the tokens corresponding to a word at once. The overall masking
+*all* of the tokens corresponding to a word at once. The overall masking
 rate remains the same.
 
-`Whole Word Masked Input: the man [MASK] up , put his basket on [MASK] [MASK]
+`Whole Word Masked Input: the man [MASK] up, put his basket on [MASK] [MASK]
 [MASK] ' s head`
 
 The training is identical -- we still predict each masked WordPiece token
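
The hunk above only reformats the masking example, but the whole-word-masking idea itself is easy to show in code. The sketch below is not the repository's pre-training code (`whole_word_mask` is a hypothetical helper written for illustration): every `##` continuation piece is grouped with the piece that starts its word, and when a word is chosen for masking, all of its pieces are replaced at once.

```python
import random


def whole_word_mask(tokens, mask_prob=0.15, seed=0):
    """Mask *all* WordPiece tokens of each selected word (illustration only).

    `tokens` is an already WordPiece-tokenized list; a piece starting with
    "##" continues the word begun by the previous piece.
    """
    rng = random.Random(seed)

    # Group token indices into whole words.
    words = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and words:
            words[-1].append(i)
        else:
            words.append([i])

    # Mask whole words until roughly mask_prob of the tokens are covered.
    budget = max(1, round(len(tokens) * mask_prob))
    masked, covered = list(tokens), 0
    for word in rng.sample(words, len(words)):  # shuffled copy of the words
        if covered >= budget:
            break
        for i in word:
            masked[i] = "[MASK]"
        covered += len(word)
    return masked


print(" ".join(whole_word_mask(
    "the man jumped up , put his basket on phil ##am ##mon ' s head".split())))
```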
@@ -127,10 +128,10 @@ Mongolian \*\*\*\*\***
 
 We uploaded a new multilingual model which does *not* perform any normalization
 on the input (no lower casing, accent stripping, or Unicode normalization), and
-additionally inclues Thai and Mongolian.
+additionally includes Thai and Mongolian.
 
 **It is recommended to use this version for developing multilingual models,
-especially on languages with non-Latin alphabets.**
+especially in languages with non-Latin alphabets.**
 
 This does not require any code changes, and can be downloaded here:
 
@@ -236,7 +237,7 @@ and contextual representations can further be *unidirectional* or
 [GloVe](https://nlp.stanford.edu/projects/glove/) generate a single "word
 embedding" representation for each word in the vocabulary, so `bank` would have
 the same representation in `bank deposit` and `river bank`. Contextual models
-instead generate a representation of each word that is based on the other words
+instead, generate a representation of each word that is based on the other words
 in the sentence.
 
 BERT was built upon recent work in pre-training contextual representations —
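
As a side note to the static-versus-contextual distinction this hunk touches, here is a toy, self-contained illustration (not GloVe and not BERT; the vectors and the mixing rule are made up) of why a static lookup table gives `bank` one vector everywhere while even a crude context-sensitive encoder does not:

```python
import numpy as np

# Toy numbers, not a real model: the static table assigns each word one fixed
# vector; the "contextual" function blends in the neighbouring words, so the
# vector for `bank` differs between `bank deposit` and `river bank`.
rng = np.random.default_rng(0)
table = {w: rng.normal(size=4) for w in ["bank", "deposit", "river"]}


def contextual(words, i, alpha=0.5):
    """Blend word i's static vector with the mean of its neighbours."""
    neighbours = np.mean([table[w] for j, w in enumerate(words) if j != i], axis=0)
    return (1 - alpha) * table[words[i]] + alpha * neighbours


print(np.allclose(table["bank"], table["bank"]))          # True: static lookup is the same in both phrases
print(np.allclose(contextual(["bank", "deposit"], 0),
                  contextual(["river", "bank"], 1)))      # False: context changes the representation
```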
@@ -270,14 +271,14 @@ and `B`, is `B` the actual next sentence that comes after `A`, or just a random
 sentence from the corpus?
 
 ```
-Sentence A: the man went to the store .
-Sentence B: he bought a gallon of milk .
+Sentence A: the man went to the store.
+Sentence B: he bought a gallon of milk.
 Label: IsNextSentence
 ```
 
 ```
-Sentence A: the man went to the store .
-Sentence B: penguins are flightless .
+Sentence A: the man went to the store.
+Sentence B: penguins are flightless.
 Label: NotNextSentence
 ```
 
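
The examples reformatted above belong to the next sentence prediction task. Below is a minimal sketch of how such pairs can be assembled; it is not the repository's data pipeline (`make_nsp_pairs` is a hypothetical name), just an illustration of the 50/50 sampling between the true next sentence and a random one.

```python
import random


def make_nsp_pairs(documents, seed=0):
    """Build (sentence_a, sentence_b, label) examples: half the time B is the
    real next sentence, half the time it is a random sentence drawn from a
    different document. Illustration only."""
    rng = random.Random(seed)
    examples = []
    for doc_idx, doc in enumerate(documents):
        for i in range(len(doc) - 1):
            sentence_a = doc[i]
            if rng.random() < 0.5:
                sentence_b, label = doc[i + 1], "IsNextSentence"
            else:
                other = rng.choice([d for j, d in enumerate(documents) if j != doc_idx])
                sentence_b, label = rng.choice(other), "NotNextSentence"
            examples.append((sentence_a, sentence_b, label))
    return examples


docs = [
    ["the man went to the store.", "he bought a gallon of milk."],
    ["penguins are flightless.", "they live in the southern hemisphere."],
]
for a, b, label in make_nsp_pairs(docs):
    print(f"Sentence A: {a}\nSentence B: {b}\nLabel: {label}\n")
```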
@@ -405,7 +406,7 @@ Please see the
 for how to use Cloud TPUs. Alternatively, you can use the Google Colab notebook
 "[BERT FineTuning with Cloud TPUs](https://colab.research.google.com/github/tensorflow/tpu/blob/master/tools/colab/bert_finetuning_with_cloud_tpus.ipynb)".
 
-On Cloud TPUs, the pretrained model and the output directory will need to be on
+On Cloud TPUs, the pre-trained model and the output directory will need to be on
 Google Cloud Storage. For example, if you have a bucket named `some_bucket`, you
 might use the following flags instead:
 
@@ -477,7 +478,7 @@ that it's running on something other than a Cloud TPU, which includes a GPU.
 
 Once you have trained your classifier you can use it in inference mode by using
 the --do_predict=true command. You need to have a file named test.tsv in the
-input folder. Output will be created in file called test_results.tsv in the
+input folder. The output will be created in file called test_results.tsv in the
 output folder. Each line will contain output for each sample, columns are the
 class probabilities.
 
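
Since this hunk describes `test_results.tsv` as one row of tab-separated class probabilities per test example, a small post-processing sketch may help. The label names and their ordering below are assumptions made for illustration, not something the repository defines.

```python
import csv

# Assumed for illustration: a binary task whose classes are ordered as below.
labels = ["negative", "positive"]

with open("test_results.tsv") as f:
    for i, row in enumerate(csv.reader(f, delimiter="\t")):
        probs = [float(p) for p in row]                      # one probability per class
        best = max(range(len(probs)), key=probs.__getitem__)  # index of the top class
        print(f"example {i}: {labels[best]} (p={probs[best]:.3f})")
```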
@@ -499,7 +500,7 @@ python run_classifier.py \
 
 ### SQuAD 1.1
 
-The Stanford Question Answering Dataset (SQuAD) is a popular question answering
+The Stanford Question Answering Dataset (SQuAD) is a popular question-answering
 benchmark dataset. BERT (at the time of the release) obtains state-of-the-art
 results on SQuAD with almost no task-specific network architecture modifications
 or data augmentation. However, it does require semi-complex data pre-processing
@@ -638,7 +639,7 @@ python $SQUAD_DIR/evaluate-v2.0.py $SQUAD_DIR/dev-v2.0.json
 
 Assume the script outputs "best_f1_thresh" THRESH. (Typical values are between
 -1.0 and -5.0). You can now re-run the model to generate predictions with the
-derived threshold or alternatively you can extract the appropriate answers from
+derived threshold or alternatively, you can extract the appropriate answers from
 ./squad/nbest_predictions.json.
 
 ```shell
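
For the second route mentioned in this hunk (extracting answers with the derived threshold rather than re-running the model), the following is a hedged sketch of how the reported `best_f1_thresh` might be applied offline. The file layout it assumes, a per-question score in `null_odds.json` and a `text` field in `./squad/nbest_predictions.json`, should be verified against the script's actual output before use.

```python
import json

# Assumed output layout (verify against the actual files): null_odds.json maps
# question id -> null-vs-best-answer score difference, and
# nbest_predictions.json maps question id -> a ranked list of candidates,
# each with a "text" field.
THRESH = -2.5  # hypothetical value printed as "best_f1_thresh"

with open("./squad/null_odds.json") as f:
    null_odds = json.load(f)
with open("./squad/nbest_predictions.json") as f:
    nbest = json.load(f)

final = {}
for qid, score_diff in null_odds.items():
    if score_diff > THRESH:
        final[qid] = ""  # treat the question as unanswerable
    else:
        # first non-empty candidate, falling back to "no answer"
        final[qid] = next((c["text"] for c in nbest[qid] if c["text"]), "")

with open("./squad/final_predictions.json", "w") as f:
    json.dump(final, f, indent=2)
```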
