
2.4.0-b0

Pre-release
@broken broken released this 23 Oct 00:49

Release 2.4.0-b0

Please note that this is a pre-release and is meant to run with TF v2.3.x. We wanted to give early access to some of the features we are adding to 2.4.x without waiting for the corresponding TF release.

Major Features and Improvements

  • Released our first TF Hub module for Chinese segmentation! Please visit the hub module page here for more info including instructions on how to use the model.
  • Added Splitter / SplitterWithOffsets abstract base classes. These are meant to replace the current Tokenizer / TokenizerWithOffsets base classes. The Tokenizer base classes will continue to work and will implement these new Splitter base classes. The reasoning behind the change is to prevent confusion when future splitting operations that also use this interface do not tokenize into words (for example, when they split into sentences or subwords).
  • With this cleanup of terminology, we've also updated the documentation and internal variable names for token offsets to use "end" instead of "limit". This is purely a documentation change and doesn't affect any current APIs, but we feel it more clearly expresses that offset_end is a positional value rather than a length.
  • Added new HubModuleSplitter that helps handle ragged tensor inputs and outputs for hub modules that implement the Splitter class.
  • Added new SplitMergeFromLogitsTokenizer, a narrowly focused tokenizer that splits text based on logits from a model. This is used with the newly released Chinese segmentation model.
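
To make the terminology above concrete, here is a minimal plain-Python sketch. It is not the library's code: the class bodies, the ToyWhitespaceTokenizer class, and the split_merge_from_logits function are hypothetical stand-ins written for illustration, they work on Python strings rather than tensors, and they report character offsets. The sketch assumes that each character comes with a (split, merge) pair of logits and that a tie counts as a split; both are assumptions, not details stated in these notes.

```python
import re
from abc import ABC, abstractmethod


class Splitter(ABC):
    """Stand-in base class: splits text into pieces (words, sentences, subwords)."""

    @abstractmethod
    def split(self, text):
        ...


class SplitterWithOffsets(Splitter):
    """Stand-in base class: also reports where each piece sits in the input."""

    @abstractmethod
    def split_with_offsets(self, text):
        """Returns (pieces, starts, ends); each end is a position, not a length."""

    def split(self, text):
        return self.split_with_offsets(text)[0]


class TokenizerWithOffsets(SplitterWithOffsets):
    """Stand-in tokenizer base: existing tokenize* methods keep working,
    and the splitter interface is implemented in terms of them."""

    @abstractmethod
    def tokenize_with_offsets(self, text):
        ...

    def tokenize(self, text):
        return self.tokenize_with_offsets(text)[0]

    def split_with_offsets(self, text):
        return self.tokenize_with_offsets(text)


class ToyWhitespaceTokenizer(TokenizerWithOffsets):
    """Hypothetical concrete tokenizer used only for this example."""

    def tokenize_with_offsets(self, text):
        matches = list(re.finditer(r"\S+", text))
        return ([m.group() for m in matches],
                [m.start() for m in matches],
                [m.end() for m in matches])


def split_merge_from_logits(text, logits):
    """Toy split/merge decoding: logits[i] is an assumed (split, merge) pair
    for character i. A split starts a new piece; a merge extends the last one."""
    pieces, starts, ends = [], [], []
    for i, (ch, (split_logit, merge_logit)) in enumerate(zip(text, logits)):
        if not pieces or split_logit >= merge_logit:
            pieces.append(ch)
            starts.append(i)
            ends.append(i + 1)
        else:
            pieces[-1] += ch
            ends[-1] = i + 1
    return pieces, starts, ends


# Offsets are [start, end) positions, so text[start:end] recovers each piece.
words, word_starts, word_ends = ToyWhitespaceTokenizer().split_with_offsets(
    "hello big world")
# words == ["hello", "big", "world"], word_starts == [0, 6, 10], word_ends == [5, 9, 15]

# Made-up logits for a five-character string; no model is involved here.
segments, seg_starts, seg_ends = split_merge_from_logits(
    "新华社北京", [(2, -1), (-1, 2), (-1, 2), (3, 0), (0, 3)])
# segments == ["新华社", "北京"], seg_starts == [0, 3], seg_ends == [3, 5]
```

The sketch also shows why "end" reads more naturally than "limit": the third returned list holds positions in the input, so a piece's length is end minus start rather than the end value itself.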

Bug Fixes and Other Changes

  • Test cleanup - use assertAllEqual(expected, actual), instead of (actual, expected), for better error messages.
  • Add dependency on tensorflow_hub in pip_package/setup.py.
  • Add filegroup BUILD target for test_data segmentation Hub module.
  • Extend documentation for class HubModuleSplitter.
  • Read SP model file in bytes mode in tests.
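
As an illustration of the last item, the following sketch shows why a binary model file is usually read in bytes mode. It uses only the Python standard library; the file name toy_sp.model and the byte string are made-up stand-ins, not a real SentencePiece model, and the notes do not say which tests or file APIs were actually changed.

```python
import os
import tempfile

# Stand-in for a serialized model: arbitrary bytes that are not valid UTF-8.
fake_model_bytes = b"\x0a\x0e\x0a\x05\xe2\x96\x81\xff\xfe\x15\x00\x00\x00\x00"

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "toy_sp.model")  # hypothetical file name
    with open(model_path, "wb") as f:
        f.write(fake_model_bytes)

    # Bytes mode returns the file contents unchanged.
    with open(model_path, "rb") as f:
        model_proto = f.read()

    # Text mode tries to decode the contents and fails on non-UTF-8 bytes.
    try:
        with open(model_path, "r", encoding="utf-8") as f:
            f.read()
        text_mode_ok = True
    except UnicodeDecodeError:
        text_mode_ok = False
```

In TensorFlow code the same idea is typically expressed by opening the file with mode "rb" before passing the serialized model to a tokenizer.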

Thanks to our Contributors