2.4.0-b0
Pre-release
Release 2.4.0-b0
Please note that this is a pre-release and meant to run with TF v2.3.x. We wanted to give access to some of the features we were adding to 2.4.x, but did not want to wait for the TF release.
Major Features and Improvements
- Released our first TF Hub module for Chinese segmentation! Please visit the hub module page here for more info including instructions on how to use the model.
- Added `Splitter`/`SplitterWithOffsets` abstract base classes. These are meant to replace the current `Tokenizer`/`TokenizerWithOffsets` base classes. The `Tokenizer` base classes will continue to work and will implement these new `Splitter` base classes. The reasoning behind the change is to prevent confusion when future splitting operations that also use this interface split text into units other than words (sentences, subwords, etc.).
- With this cleanup of terminology, we've also updated the documentation and internal variable names for token offsets to use "end" instead of "limit". This is purely a documentation change and doesn't affect any current APIs, but we feel it more clearly expresses that `offset_end` is a positional value rather than a length.
- Added new `HubModuleSplitter` that helps handle ragged tensor input and outputs for hub modules which implement the `Splitter` class.
- Added new `SplitMergeFromLogitsTokenizer`, a narrowly focused tokenizer that splits text based on logits from a model. This is used with the newly released Chinese segmentation model.
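
To make the split/merge idea and the "end" offset convention concrete, here is a minimal pure-Python sketch. It is not the library implementation. The function name `split_merge_from_logits`, the one `(split, merge)` logit pair per character, merging on ties, and character-based offsets are all assumptions made for illustration, and whitespace handling is left out.

```python
from typing import List, Tuple


def split_merge_from_logits(
    text: str, logits: List[Tuple[float, float]]
) -> Tuple[List[str], List[int], List[int]]:
    """Illustrative only: splits `text` using one (split, merge) logit pair per character.

    Returns the tokens plus their start and end offsets. Each end offset is the
    position just past the token's last character, not the token's length.
    """
    if len(text) != len(logits):
        raise ValueError("expected one (split, merge) logit pair per character")

    starts: List[int] = []
    ends: List[int] = []
    for pos, (split_logit, merge_logit) in enumerate(logits):
        # The first character always opens a token; later characters open one
        # only when the split action outscores the merge action (assumption:
        # ties merge).
        if not starts or split_logit > merge_logit:
            starts.append(pos)
            ends.append(pos + 1)
        else:
            ends[-1] = pos + 1  # merge: extend the current token by one character

    tokens = [text[s:e] for s, e in zip(starts, ends)]
    return tokens, starts, ends


# Hypothetical logits for a five-character string: split, merge, merge, split, merge.
example_logits = [(2.0, -1.0), (-1.0, 3.0), (0.0, 1.5), (4.0, 0.5), (-2.0, 2.0)]
tokens, starts, ends = split_merge_from_logits("新华社北京", example_logits)
print(tokens)  # ['新华社', '北京']
print(starts)  # [0, 3]
print(ends)    # [3, 5]
```

The second token ends at 5 even though it is only two characters long, which is the distinction the "end" terminology is meant to express: `text[start:end]` recovers the token directly.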
Bug Fixes and Other Changes
- Test cleanup - use assertAllEqual(expected, actual), instead of (actual, expected), for better error messages.
- Add dep on tensorflow_hub in pip_package/setup.py
- Add filegroup BUILD target for test_data segmentation Hub module.
- Extend documentation for class HubModuleSplitter.
- Read SP model file in bytes mode in tests.
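
As background for the last item: a SentencePiece model file is binary data, so it has to be read in bytes mode. The sketch below uses only the Python standard library, with stand-in bytes and a hypothetical file name rather than a real model, and is not the actual test code from this release.

```python
import os
import tempfile

# Stand-in bytes for a serialized model (assumption: not a real SentencePiece model).
payload = b"\x0a\x03\xff\xfe\x80"

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "fake_sp_model.model")  # hypothetical file name
    with open(model_path, "wb") as f:
        f.write(payload)

    # Bytes mode returns the file contents unchanged.
    with open(model_path, "rb") as f:
        model_bytes = f.read()

    # Text mode tries to decode the contents and fails on bytes that are not valid UTF-8.
    try:
        with open(model_path, "r", encoding="utf-8") as f:
            f.read()
        text_mode_failed = False
    except UnicodeDecodeError:
        text_mode_failed = True
```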