Releases: keras-team/keras-hub
r0.4.1.dev0
Summary
- Dev release to test out the upcoming 0.4.1.
What's Changed
- Update python version in readme to 3.8 by @haifeng-jin in #618
- Modify our pip install line so we upgrade tf by @mattdangerw in #616
- Use Adam optimizer for quick start by @mattdangerw in #620
- Clean up class name and `self` in calls to `super()` by @mbrukman in #628
- Update word_piece_tokenizer.py by @ADITYADAS1999 in #617
- Add DeBERTaV3 Conversion Script by @abheesht17 in #633
- Add AlbertTokenizer and AlbertPreprocessor by @abheesht17 in #627
- Create `Backbone` base class by @jbischof in #621
- Add TPU testing by @chenmoneygithub in #591
- Add Base Preprocessor Class by @abheesht17 in #638
- Add keras_nlp.samplers by @chenmoneygithub in #563
- Add ALBERT Backbone by @abheesht17 in #622
- Add a small script to count parameters in our presets by @mattdangerw in #610
- Clean up examples/ directory by @ADITYADAS1999 in #637
- Fix Small BERT Typo by @abheesht17 in #651
- Rename examples/bert -> examples/bert_pretraining by @mattdangerw in #647
- Add FNet Preprocessor by @abheesht17 in #646
- Add FNet Backbone by @abheesht17 in #643
- Small DeBERTa Docstring Fixes by @abheesht17 in #666
- Add Fenced Docstring Testing by @abheesht17 in #640
- Corrected the epsilon value by @soma2000-lang in #665
- Consolidate docstring formatting weirdness in Backbone and Preprocessor base classes by @mattdangerw in #654
- Fix `value_dim` in `TransformerDecoder`'s cross-attn layer by @abheesht17 in #667
- Add ALBERT Presets by @abheesht17 in #655
- Add Base Task Class by @abheesht17 in #671
- Implement TopP, TopK and Beam samplers by @chenmoneygithub in #652
- Add FNet Presets by @abheesht17 in #659
- Bump the year to 2023 by @mattdangerw in #679
- Add BART Backbone by @abheesht17 in #661
- Handle trainable and name in the backbone base class by @mattdangerw in #680
- Ignore Task Docstring for Testing by @abheesht17 in #683
- Light-weight benchmarking script by @NusretOzates in #664
- Conditionally import tf_text everywhere by @mattdangerw in #684
- Expose `token_embedding` as a Backbone Property by @abheesht17 in #676
- Move `from_preset` to base tokenizer classes by @shivance in #673
- add f_net_classifier and f_net_classifier_test by @ADITYADAS1999 in #670
- import rouge_scorer directly from rouge_score package by @sampathweb in #691
- Fix typo in requirements file juypter -> jupyter by @mattdangerw in #693
- Temporary fix to get nightly green again by @mattdangerw in #696
- GPT2 Text Generation APIs by @chenmoneygithub in #592
- Run keras saving tests on nightly and fix RobertaClassifier test by @mattdangerw in #692
- Speed up pip install keras-nlp; simplify deps by @mattdangerw in #697
- Add `AlbertClassifier` by @shivance in #668
- Make tokenizer, backbone, preprocessor properties settable on base class by @mattdangerw in #700
- Update to latest black by @mattdangerw in #708
- RobertaMaskedLM task and preprocessor by @mattdangerw in #653
- Default compilation for BERT/RoBERTa classifiers by @jbischof in #695
- Add start/end token padding to `GPT2Preprocessor` by @chenmoneygithub in #704
- Don't install tf stable when building our nightly image by @mattdangerw in #711
- Add OPT Backbone and Tokenizer by @mattdangerw in #699
- Small OPT Doc-string Edits by @abheesht17 in #716
- Default compilation other classifiers by @Plutone11011 in #714
- Add BartTokenizer and BART Presets by @abheesht17 in #685
- Add an add_prefix_space Arg in BytePairTokenizer by @shivance in #715
- Opt presets by @mattdangerw in #707
- fix import of tensorflow_text in tf_utils by @sampathweb in #723
- Check for masked token in roberta tokenizer by @mattdangerw in #742
- Improve test coverage for special tokens in model tokenizers by @mattdangerw in #743
- Fix the sampler truncation strategy by @chenmoneygithub in #713
- Add ALBERT Conversion Script by @abheesht17 in #736
- Add FNet Conversion Script by @abheesht17 in #737
- Add BART Conversion Script by @abheesht17 in #739
- Pass Correct LayerNorm Epsilon value to TransformerEncoder in Backbones by @TheAthleticCoder in #731
- Improving the layer Description. by @Neeshamraghav012 in #734
- Adding ragged support to SinePositionEncoding by @apupneja in #751
- Fix trailing space by @mattdangerw in #755
- Adding an AlbertMaskedLM task model and preprocessor by @shivance in #725
- New docstring example for TokenAndPosition Embedding layer. by @Neeshamraghav012 in #760
- Add a note for TPU issues for deberta_v3 by @mattdangerw in #758
- Add missing exports to models API by @mattdangerw in #763
- Autogenerate preset table by @Cyber-Machine in #690
- Version bump to 0.5.0 by @mattdangerw in #767
New Contributors
- @haifeng-jin made their first contribution in #618
- @mbrukman made their first contribution in #628
- @soma2000-lang made their first contribution in #665
- @NusretOzates made their first contribution in #664
- @shivance made their first contribution in #673
- @Plutone11011 made their first contribution in #714
- @TheAthleticCoder made their first contribution in #731
- @Neeshamraghav012 made their first contribution in #734
- @apupneja made their first contribution in #751
- @Cyber-Machine made their first contribution in #690
Full Changelog: v0.4.0...v0.4.1.dev0
v0.4.0
The 0.4 release adds support for pretrained models to the library via keras_nlp.models. You can read an
introduction to the new API in our Getting Started Guide.
If you encounter any problems or have questions, please open an issue!
Breaking Changes
- Renamed `keras_nlp.layers.MLMHead` -> `keras_nlp.layers.MaskedLMHead`.
- Renamed `keras_nlp.layers.MLMMaskGenerator` -> `keras_nlp.layers.MaskedLMMaskGenerator`.
- Renamed `keras_nlp.layers.UnicodeCharacterTokenizer` -> `keras_nlp.layers.UnicodeCodepointTokenizer`.
- Switched the default of `lowercase` in `keras_nlp.tokenizers.WordPieceTokenizer` from `True` to `False`.
- Renamed the token id output of `MaskedLMMaskGenerator` from `"tokens"` to `"token_ids"`.
Summary
- Added the `keras_nlp.models` API.
  - Added support for BERT, DistilBERT, RoBERTa, and XLM-RoBERTa models and pretrained checkpoints.
  - See our Getting Started Guide for more details.
- Added new metrics: `keras_nlp.metrics.Bleu` and `keras_nlp.metrics.EditDistance`.
- Added new vocabulary training utilities: `keras_nlp.tokenizers.compute_word_piece_vocabulary` and `keras_nlp.tokenizers.compute_sentence_piece_proto`.
- Added new preprocessing layers: `keras_nlp.layers.RandomSwap` and `keras_nlp.layers.RandomDeletion`.
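The new `keras_nlp.metrics.EditDistance` metric is based on the standard Levenshtein distance between token sequences. The following plain-Python sketch illustrates what it computes; it is not the library implementation, which operates on batched (possibly ragged) tensors and can optionally normalize by the reference length:

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two token sequences via dynamic programming."""
    m, n = len(reference), len(hypothesis)
    # prev[j] holds the distance between reference[:i-1] and hypothesis[:j].
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[n]

print(edit_distance(list("kitten"), list("sitting")))  # 3
```

The same recurrence applies whether the "tokens" are characters, word pieces, or whole words.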
What's Changed
- Add Edit Distance Metric by @abheesht17 in #231
- Minor fix to simplify and test handling of max_length prompts by @jbischof in #258
- Remove split regex args for WordPieceTokenizer by @mattdangerw in #255
- Add instructions on installing the latest changes by @mattdangerw in #261
- Add warning when k > vocab_size in top_k_search by @jbischof in #260
- Fix keras library imports and usage by @jbischof in #262
- Add BLEU Score by @abheesht17 in #222
- Configure GKE-based accelerator testing by @chenmoneygithub in #265
- Added WordPieceTokenizer training function by @jessechancy in #256
- Add requirements.txt for cloud build by @chenmoneygithub in #267
- Global Seed Bug Fix by @jessechancy in #269
- Update accelerator testing to use the new GCP project by @chenmoneygithub in #272
- Fixed typo: "recieved" by @ehrencrona in #273
- Reuse dense pooled output for fine tuning by @mattdangerw in #251
- Simplify BERT modeling, use keras embeddings by @mattdangerw in #253
- Rename `UnicodeCharacterTokenizer` -> `UnicodeCodepointTokenizer` by @mattdangerw in #254
- Add README for accelerator testing config folder by @chenmoneygithub in #276
- Random Deletion Layer by @aflah02 in #214
- Made trainer more efficient. Loading full files instead of using TextLineDataset. by @jessechancy in #280
- Use KerasNLP for BERT preprocessing for GLUE by @mattdangerw in #252
- Minor fixes to the Random Deletion Layer by @aflah02 in #286
- Fixes for WordPieceTrainer by @aflah02 in #293
- Update default to strip_accents=False by @jessechancy in #289
- Move Bert to models folder by @jbischof in #288
- Make Decoding Functions Graph-compatible (with XLA Support!) by @abheesht17 in #271
- SentencePieceTrainer by @aflah02 in #281
- Rename `models.Bert()` to `models.BertCustom()` by @jbischof in #310
- Add a test for variable sequence length inputs by @mattdangerw in #313
- Support checkpoint loading for `BertBase` by @jbischof in #299
- RoBERTa pretrained model forward pass by @jessechancy in #304
- Register objects as serializable by @mattdangerw in #292
- Style merging for Bert and Roberta by @jbischof in #315
- Streamline and speed up tests by @jbischof in #324
- Add Support for CJK Char Splitting for WordPiece Tokenizer by @abheesht17 in #318
- Clean up model input names for consistency by @mattdangerw in #327
- Return a single tensor from roberta by @mattdangerw in #328
- BERT, RoBERTa: Add `model.compile` UTs by @abheesht17 in #330
- Continue rename of bert model inputs by @mattdangerw in #329
- Text Generation Utilities: Add Support for Ragged Inputs by @abheesht17 in #300
- `bert_base_zh`, `bert_base_multi_cased`: Add BERT Base Variants by @abheesht17 in #319
- WordPiece vocabularies trainer on Wikipedia dataset by @jessechancy in #316
- Use the exported ragged ops for RandomDeletion by @mattdangerw in #332
- Random Swap Layer by @aflah02 in #224
- Fixes for Random Deletion Layer by @aflah02 in #339
- Move cloudbuild to a hidden directory by @mattdangerw in #345
- Fix the build by @mattdangerw in #349
- Migrating from Datasets to TFDS for GLUE Example by @aflah02 in #340
- Move network_tests into keras_nlp/ by @mattdangerw in #344
- Stop hardcoding 2.9 by @mattdangerw in #351
- Add BERT Large by @abheesht17 in #331
- Add normalize_first arg to Transformer Layers by @abheesht17 in #350
- Add Small BERT Variants by @abheesht17 in #338
- Beam Search: Add Ragged and XLA Support by @abheesht17 in #341
- Fix download paths for bert weights by @mattdangerw in #356
- Add a BertPreprocessor class by @mattdangerw in #343
- Text Generation Functions: Add Benchmark Script by @abheesht17 in #342
- Improve readability for encoder/decoder blocks by @mattdangerw in #353
- Add GPT-2 Model and its Variants by @abheesht17 in #354
- Clean up BERT, RoBERTa doc-strings by @abheesht17 in #359
- Create unique string id for each BERT backbone by @jbischof in #361
- Use model.fit() for BERT Example by @abheesht17 in #360
- Minor Fixes in BertPreprocessor Layer by @abheesht17 in #373
- Clone user passed initializers called multiple times by @mattdangerw in #371
- Update BERT model file structure by @mattdangerw in #376
- Move gpt model code into a directory by @mattdangerw in #379
- Move roberta model code into a directory by @mattdangerw in #380
- Reorg test directories by @mattdangerw in #384
- Add XLM-RoBERTa by @abheesht17 in #372
- Add DistilBERT by @abheesht17 in #382
- Stop running CI on Windows by @mattdangerw in #386
- Fix Bert serialization by @mattdangerw in #385
- Improve MacOS support and pin tensorflow version during testing by @mattdangerw in #383
- Unify BERT model API in one class by @jbischof in #387
- Add `from_preset` constructor to `BertPreprocessor` by @jbischof in #390
- More robustly test BERT preprocessing by @mattdangerw in #394
- Move `name` and `trainable` to `kwargs` by @jbischof in #399
- Add `backbone` as `property` for task models by @jbischof in #398
- Set default name of `Bert` instance to `"backbone"` by @jbischof in #397
- Fix gpt2 serialization by @mattdangerw in https://github.com/kera...
v0.4.0.dev0
The KerasNLP 0.4 release adds support for pretrained models to the library via keras_nlp.models. If you encounter any problems or have questions, please open an issue or start a thread in the discussions tab!
Breaking Changes
- Renamed `keras_nlp.layers.MLMHead` -> `keras_nlp.layers.MaskedLMHead`.
- Renamed `keras_nlp.layers.MLMMaskGenerator` -> `keras_nlp.layers.MaskedLMMaskGenerator`.
- Renamed `keras_nlp.layers.UnicodeCharacterTokenizer` -> `keras_nlp.layers.UnicodeCodepointTokenizer`.
- Switched the default of `lowercase` in `keras_nlp.tokenizers.WordPieceTokenizer` from `True` to `False`.
- Renamed the token id output of `MaskedLMMaskGenerator` from `"tokens"` to `"token_ids"`.
Summary
- Added the `keras_nlp.models` API.
  - Adds support for BERT, DistilBERT, RoBERTa, and XLM-RoBERTa models and pretrained checkpoints.
- Added new metrics: `keras_nlp.metrics.Bleu` and `keras_nlp.metrics.EditDistance`.
- Added new vocabulary training utilities: `keras_nlp.tokenizers.compute_word_piece_vocabulary` and `keras_nlp.tokenizers.compute_sentence_piece_proto`.
What's Changed
- Add Edit Distance Metric by @abheesht17 in #231
- Minor fix to simplify and test handling of max_length prompts by @jbischof in #258
- Remove split regex args for WordPieceTokenizer by @mattdangerw in #255
- Add instructions on installing the latest changes by @mattdangerw in #261
- Add warning when k > vocab_size in top_k_search by @jbischof in #260
- Fix keras library imports and usage by @jbischof in #262
- Add BLEU Score by @abheesht17 in #222
- Configure GKE-based accelerator testing by @chenmoneygithub in #265
- Added WordPieceTokenizer training function by @jessechancy in #256
- Add requirements.txt for cloud build by @chenmoneygithub in #267
- Global Seed Bug Fix by @jessechancy in #269
- Update accelerator testing to use the new GCP project by @chenmoneygithub in #272
- Fixed typo: "recieved" by @ehrencrona in #273
- Reuse dense pooled output for fine tuning by @mattdangerw in #251
- Simplify BERT modeling, use keras embeddings by @mattdangerw in #253
- Rename `UnicodeCharacterTokenizer` -> `UnicodeCodepointTokenizer` by @mattdangerw in #254
- Add README for accelerator testing config folder by @chenmoneygithub in #276
- Random Deletion Layer by @aflah02 in #214
- Made trainer more efficient. Loading full files instead of using TextLineDataset. by @jessechancy in #280
- Use KerasNLP for BERT preprocessing for GLUE by @mattdangerw in #252
- Minor fixes to the Random Deletion Layer by @aflah02 in #286
- Fixes for WordPieceTrainer by @aflah02 in #293
- Update default to strip_accents=False by @jessechancy in #289
- Move Bert to models folder by @jbischof in #288
- Make Decoding Functions Graph-compatible (with XLA Support!) by @abheesht17 in #271
- SentencePieceTrainer by @aflah02 in #281
- Rename `models.Bert()` to `models.BertCustom()` by @jbischof in #310
- Add a test for variable sequence length inputs by @mattdangerw in #313
- Support checkpoint loading for `BertBase` by @jbischof in #299
- RoBERTa pretrained model forward pass by @jessechancy in #304
- Register objects as serializable by @mattdangerw in #292
- Style merging for Bert and Roberta by @jbischof in #315
- Streamline and speed up tests by @jbischof in #324
- Add Support for CJK Char Splitting for WordPiece Tokenizer by @abheesht17 in #318
- Clean up model input names for consistency by @mattdangerw in #327
- Return a single tensor from roberta by @mattdangerw in #328
- BERT, RoBERTa: Add `model.compile` UTs by @abheesht17 in #330
- Continue rename of bert model inputs by @mattdangerw in #329
- Text Generation Utilities: Add Support for Ragged Inputs by @abheesht17 in #300
- `bert_base_zh`, `bert_base_multi_cased`: Add BERT Base Variants by @abheesht17 in #319
- WordPiece vocabularies trainer on Wikipedia dataset by @jessechancy in #316
- Use the exported ragged ops for RandomDeletion by @mattdangerw in #332
- Random Swap Layer by @aflah02 in #224
- Fixes for Random Deletion Layer by @aflah02 in #339
- Move cloudbuild to a hidden directory by @mattdangerw in #345
- Fix the build by @mattdangerw in #349
- Migrating from Datasets to TFDS for GLUE Example by @aflah02 in #340
- Move network_tests into keras_nlp/ by @mattdangerw in #344
- Stop hardcoding 2.9 by @mattdangerw in #351
- Add BERT Large by @abheesht17 in #331
- Add normalize_first arg to Transformer Layers by @abheesht17 in #350
- Add Small BERT Variants by @abheesht17 in #338
- Beam Search: Add Ragged and XLA Support by @abheesht17 in #341
- Fix download paths for bert weights by @mattdangerw in #356
- Add a BertPreprocessor class by @mattdangerw in #343
- Text Generation Functions: Add Benchmark Script by @abheesht17 in #342
- Improve readability for encoder/decoder blocks by @mattdangerw in #353
- Add GPT-2 Model and its Variants by @abheesht17 in #354
- Clean up BERT, RoBERTa doc-strings by @abheesht17 in #359
- Create unique string id for each BERT backbone by @jbischof in #361
- Use model.fit() for BERT Example by @abheesht17 in #360
- Minor Fixes in BertPreprocessor Layer by @abheesht17 in #373
- Clone user passed initializers called multiple times by @mattdangerw in #371
- Update BERT model file structure by @mattdangerw in #376
- Move gpt model code into a directory by @mattdangerw in #379
- Move roberta model code into a directory by @mattdangerw in #380
- Reorg test directories by @mattdangerw in #384
- Add XLM-RoBERTa by @abheesht17 in #372
- Add DistilBERT by @abheesht17 in #382
- Stop running CI on Windows by @mattdangerw in #386
- Fix Bert serialization by @mattdangerw in #385
- Improve MacOS support and pin tensorflow version during testing by @mattdangerw in #383
- Unify BERT model API in one class by @jbischof in #387
- Add `from_preset` constructor to `BertPreprocessor` by @jbischof in #390
- More robustly test BERT preprocessing by @mattdangerw in #394
- Move `name` and `trainable` to `kwargs` by @jbischof in #399
- Add `backbone` as `property` for task models by @jbischof in #398
- Set default name of `Bert` instance to `"backbone"` by @jbischof in #397
- Fix gpt2 serialization by @mattdangerw in #391
- Fix distilbert serialization by @mattdangerw in #392
- Fix roberta and xlm-roberta serialization by @mattdangerw in https:...
v0.3.1
Summary
- Add `keras_nlp.tokenizers.BytePairTokenizer` with `tf.data`-friendly support for the tokenization used by GPT-2, RoBERTa, and other models.
- Remove the hard dependency on `tensorflow` and `tensorflow-text` when pip installing on MacOS, to accommodate M1 chips. See this section of our contributor guide for more information on MacOS development.
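Byte-pair encoding tokenizes a word by repeatedly applying learned merge rules, highest-priority first, until no rule applies. This is a toy plain-Python sketch of that merge loop (the `merges` list here is made up for illustration; the real `BytePairTokenizer` consumes a GPT-2-style vocabulary and merges file and runs inside `tf.data` pipelines):

```python
def bpe_tokenize(word, merges):
    """Greedily apply BPE merge rules (ordered by priority) to one word."""
    symbols = list(word)
    ranks = {pair: i for i, pair in enumerate(merges)}
    while len(symbols) > 1:
        # Find the adjacent pair with the best (lowest-rank) merge rule.
        pairs = [(ranks.get((a, b), float("inf")), i)
                 for i, (a, b) in enumerate(zip(symbols, symbols[1:]))]
        rank, i = min(pairs)
        if rank == float("inf"):
            break  # no applicable merge rule remains
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols

merges = [("l", "o"), ("lo", "w"), ("e", "r")]  # hypothetical merge table
print(bpe_tokenize("lower", merges))  # ['low', 'er']
```

Because merges are learned from corpus statistics, frequent words collapse to single tokens while rare words fall back to smaller pieces.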
What's Changed
- Cherry picks 0.3 by @mattdangerw in #454
- Bump version for 0.3.1 pre release by @mattdangerw in #456
- Remove dev prefix for 0.3.1 release by @mattdangerw in #457
Full Changelog: v0.3.0...v0.3.1
v0.3.0
Summary
- Added `keras_nlp.tokenizers.SentencePieceTokenizer`.
- Added two token packing layers, `keras_nlp.layers.StartEndPacker` and `keras_nlp.layers.MultiSegmentPacker`.
- Added two metrics, `keras_nlp.metrics.RougeL` and `keras_nlp.metrics.RougeN`, based on the `rouge-score` package.
- Added five utilities for generating sequences: `keras_nlp.utils.greedy_search`, `keras_nlp.utils.random_search`, `keras_nlp.utils.top_k_search`, `keras_nlp.utils.top_p_search`, and `keras_nlp.utils.beam_search`.
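The generation utilities all share the same core step: pick the next token id from a logits distribution. As a rough illustration of the top-k strategy, here is a plain-Python sketch for a single unbatched distribution (the library utilities operate on batched tensors and run a full decoding loop):

```python
import math
import random

def top_k_sample(logits, k, rng=random.Random(0)):
    """Sample a token id from the k highest-scoring logits."""
    # Indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax restricted to those k candidates.
    exps = [math.exp(logits[i]) for i in top]
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(top, weights=probs, k=1)[0]

# With k=2, only the two most likely ids (1 and 3) can ever be drawn.
print(top_k_sample([0.1, 5.0, 0.2, 4.0], k=2))
```

Greedy search is the k=1 special case, and top-p replaces the fixed k with the smallest candidate set whose cumulative probability exceeds p.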
What's Changed
- Greedy text generation util by @chenmoneygithub in #154
- Remove incorrect embedding size limit by @mattdangerw in #195
- Fix inits for bert heads by @mattdangerw in #192
- Add keras.io links to README by @mattdangerw in #196
- Minor Corrections In ROADMAP.md by @saiteja13427 in #200
- Fix Loose Dependency Imports by @abheesht17 in #199
- Reorganize examples by @mattdangerw in #179
- Remove bert config arguments from README by @mattdangerw in #205
- Add checkpoints to BERT training by @chenmoneygithub in #184
- Run keras tuner from a temp directory by @mattdangerw in #202
- Token and position embedding minor fixes by @mattdangerw in #203
- Correct typo in WordPieceTokenizer by @abheesht17 in #208
- Add TPU support to BERT example by @chenmoneygithub in #207
- Remove type annotations for complex types by @mattdangerw in #194
- Issue 182: Modified TransformerDecoder with optional parameter by @jessechancy in #217
- Add StartEndPacker layer by @abheesht17 in #221
- Add a layer for packing inputs for BERT-likes by @mattdangerw in #88
- Ignore UserWarning to fix nightly testing breakage by @chenmoneygithub in #227
- Add ROUGE Metric by @abheesht17 in #122
- Allow long lines for links in docstrings by @mattdangerw in #229
- Random Sampling Util for Text Generation by @jessechancy in #228
- added top k search util by @jessechancy in #232
- top p search and testing by @jessechancy in #233
- Add a SentencePiece tokenizer by @mattdangerw in #218
- Add cloud training support for BERT example by @chenmoneygithub in #226
- Bump version to 0.3.0 for upcoming release by @mattdangerw in #239
- Add support for StartEndPacker packing 2D tensor by @jessechancy in #240
- Fixed Bug with Unicode Tokenizer Vocab Size by @aflah02 in #243
- Fixed Import for top_p_search util by @aflah02 in #245
- MultiSegmentPacker support for 2D dense tensor by @jessechancy in #244
- Minor fixes for multi-segment packer by @mattdangerw in #246
- Add beam search decoding util by @jessechancy in #237
New Contributors
- @saiteja13427 made their first contribution in #200
- @jessechancy made their first contribution in #217
Full Changelog: v0.2.0...v0.3.0
v0.2.0
Summary
- Documentation live on keras.io.
- Added two tokenizers: `ByteTokenizer` and `UnicodeCharacterTokenizer`.
- Added a `Perplexity` metric.
- Added three layers: `TokenAndPositionEmbedding`, `MLMMaskGenerator`, and `MLMHead`.
- Contributing guides and roadmap.
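Perplexity is the exponential of the mean negative log-likelihood the model assigns to the target tokens. A minimal sketch of that definition (the library metric works on logits tensors and supports masking of padded positions):

```python
import math

def perplexity(token_probs):
    """Perplexity of a sequence, given the model probability of each target token."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))

# A model that is uniformly unsure over 4 choices has perplexity 4.
print(perplexity([0.25, 0.25, 0.25, 0.25]))  # 4.0
```

Lower is better: a perfect model (probability 1.0 on every target token) has perplexity 1.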
What's Changed
- Add Byte Tokenizer by @abheesht17 in #80
- Fixing rank 1 outputs for WordPieceTokenizer by @aflah02 in #92
- Add tokenizer accessors to the base class by @mattdangerw in #89
- Fix word piece attributes by @mattdangerw in #97
- Small fix: change assertEquals to assertEqual by @chenmoneygithub in #103
- Added a Learning Rate Schedule for the BERT Example by @Stealth-py in #96
- Add Perplexity Metric by @abheesht17 in #68
- Use the black profile for isort by @mattdangerw in #117
- Update README with release information by @mattdangerw in #118
- Add a class to generate LM masks by @chenmoneygithub in #61
- Add docstring testing by @mattdangerw in #116
- Fix broken docstring in MLMMaskGenerator by @chenmoneygithub in #121
- Adding a UnicodeCharacterTokenizer by @aflah02 in #100
- Added TokenAndPositionEmbedding Class by @adhadse in #91
- Fix bert example so it is runnable by @mattdangerw in #123
- Fix the issue that MLMMaskGenerator does not work in graph mode by @chenmoneygithub in #131
- Actually use layer norm epsilon in encoder/decoder by @mattdangerw in #133
- Whitelisted formatting and lint check targets by @adhadse in #126
- Updated CONTRIBUTING.md for setup of venv and standard pip install by @adhadse in #127
- Fix mask propagation of transformer layers by @chenmoneygithub in #139
- Fix masking for TokenAndPositionEmbedding by @mattdangerw in #140
- Fixed no oov token error in vocab for WordPieceTokenizer by @adhadse in #136
- Add a MLMHead layer by @mattdangerw in #132
- Bump version for 0.2.0 dev release by @mattdangerw in #142
- Added WSL setup text to CONTRIBUTING.md by @adhadse in #144
- Add attribution for the BERT modeling code by @mattdangerw in #151
- Remove preprocessing subdir by @mattdangerw in #150
- Word piece arg change by @mattdangerw in #148
- Rename max_length to sequence_length by @mattdangerw in #149
- Don't accept a string dtype for unicode tokenizer by @mattdangerw in #147
- Adding Utility to Detokenize as list of Strings to Tokenizer Base Class by @aflah02 in #124
- Fixed Import Error by @aflah02 in #161
- Added KerasTuner Hyper-Parameter Search for the BERT fine-tuning script. by @Stealth-py in #143
- Docstring updates for upcoming doc publish by @mattdangerw in #146
- version bump for 0.2.0.dev2 pre-release by @mattdangerw in #165
- Added a vocabulary_size argument to UnicodeCharacterTokenizer by @aflah02 in #163
- Simplified utility to preview a tfrecord by @mattdangerw in #168
- Update BERT example's README with data downloading instructions by @chenmoneygithub in #169
- Add a call to repeat during pretraining by @mattdangerw in #172
- Add an integration test matching our quick start by @mattdangerw in #162
- Modify README of bert example by @chenmoneygithub in #174
- Fix the finetuning script's loss and metric config by @chenmoneygithub in #176
- Minor improvements to the position embedding docs by @mattdangerw in #180
- Update docs for upcoming 0.2.0 release by @mattdangerw in #158
- Restore accidentally deleted line from README by @mattdangerw in #185
- Bump version for 0.2.0 release by @mattdangerw in #186
- Pre release fix by @mattdangerw in #187
New Contributors
- @Stealth-py made their first contribution in #96
- @adhadse made their first contribution in #91
Full Changelog: v0.1.1...v0.2.0
v0.2.0.dev2
What's Changed
- Added WSL setup text to CONTRIBUTING.md by @adhadse in #144
- Add attribution for the BERT modeling code by @mattdangerw in #151
- Remove preprocessing subdir by @mattdangerw in #150
- Word piece arg change by @mattdangerw in #148
- Rename max_length to sequence_length by @mattdangerw in #149
- Don't accept a string dtype for unicode tokenizer by @mattdangerw in #147
- Adding Utility to Detokenize as list of Strings to Tokenizer Base Class by @aflah02 in #124
- Fixed Import Error by @aflah02 in #161
- Added KerasTuner Hyper-Parameter Search for the BERT fine-tuning script. by @Stealth-py in #143
- Docstring updates for upcoming doc publish by @mattdangerw in #146
- version bump for 0.2.0.dev2 pre-release by @mattdangerw in #165
Full Changelog: v0.2.0-dev.1...v0.2.0.dev2
v0.2.0-dev.1
What's Changed
- Add Byte Tokenizer by @abheesht17 in #80
- Fixing rank 1 outputs for WordPieceTokenizer by @aflah02 in #92
- Add tokenizer accessors to the base class by @mattdangerw in #89
- Fix word piece attributes by @mattdangerw in #97
- Small fix: change assertEquals to assertEqual by @chenmoneygithub in #103
- Added a Learning Rate Schedule for the BERT Example by @Stealth-py in #96
- Add Perplexity Metric by @abheesht17 in #68
- Use the black profile for isort by @mattdangerw in #117
- Update README with release information by @mattdangerw in #118
- Add a class to generate LM masks by @chenmoneygithub in #61
- Add docstring testing by @mattdangerw in #116
- Fix broken docstring in MLMMaskGenerator by @chenmoneygithub in #121
- Adding a UnicodeCharacterTokenizer by @aflah02 in #100
- Added TokenAndPositionEmbedding Class by @adhadse in #91
- Fix bert example so it is runnable by @mattdangerw in #123
- Fix the issue that MLMMaskGenerator does not work in graph mode by @chenmoneygithub in #131
- Actually use layer norm epsilon in encoder/decoder by @mattdangerw in #133
- Whitelisted formatting and lint check targets by @adhadse in #126
- Updated CONTRIBUTING.md for setup of venv and standard pip install by @adhadse in #127
- Fix mask propagation of transformer layers by @chenmoneygithub in #139
- Fix masking for TokenAndPositionEmbedding by @mattdangerw in #140
- Fixed no oov token error in vocab for WordPieceTokenizer by @adhadse in #136
- Add a MLMHead layer by @mattdangerw in #132
- Bump version for 0.2.0 dev release by @mattdangerw in #142
New Contributors
- @Stealth-py made their first contribution in #96
- @adhadse made their first contribution in #91
Full Changelog: v0.1.1...v0.2.0-dev.1
v0.1.1
What's Changed
- Add tokenizer helper to convert tokens to ids by @mattdangerw in #75
- Add a sinusoidal embedding layer by @amantayal44 in #59
- Add a learned positional embedding layer by @hertschuh in #47
- Fix typo in position embedding docstring by @mattdangerw in #86
- Bump version number to 0.1.1 by @mattdangerw in #90
New Contributors
- @amantayal44 made their first contribution in #59
- @hertschuh made their first contribution in #47
Full Changelog: v0.1.0...v0.1.1
v0.1.0
Initial release of keras-nlp, with a word piece tokenizer and transformer encoder/decoder blocks.
This is a v0 release, with no API compatibility guarantees.