Tokenizers are now tf.Modules and can be saved from within Keras layers.
Bug Fixes and Other Changes
Allow wordpiece_tokenizer to output int32 tokens natively.
Track the Sentencepiece model resource via a TrackableResource.
oss-segmenter:
Fix end-offset error in split_merge_tokenizer_kernel.
TensorFlow text python ops wordshape:
More comprehensive emoji handling
Other:
Unref lookup_table in wordpiece_kernel fixing a possible memory leak.
Add missing LICENSE file for third_party/tensorflow_text/core/kernels.
Add normalize kernels test.
Fix Sentencepiece tests.
Add some metric logs to tokenizers.
Fix documentation formatting for SplitMergeTokenizer
Bug fix: make sure tokenize() method does not ignore itself.
Improve logging efficiency.
Update tf.text's regression test model for model server. Without the asserts, errors are silently swallowed by TensorFlow. Also add a tf.unicode_script test to ensure that ICU works correctly from within model server.
Add the ability to specify a user-defined destination directory to make testing easier.
Fix typo in documentation of BertTokenizer
Clarify docstring of UnicodeScriptTokenizer about splitting on space
Add executable flag to the run_build.sh script.
Clarify docstring of WordpieceTokenizer on unknown_token:
Update protobuf library and point HEAD to build on tf 2.3.0-rc0
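
The WordPiece entries above mention integer token output and the unknown_token. As an illustration only, here is a minimal pure-Python sketch of greedy longest-match-first WordPiece lookup that returns integer vocabulary IDs and falls back to the unknown token. It is not the tensorflow_text kernel; the function name `wordpiece_ids` and the toy vocabulary are hypothetical, and the `"##"` suffix marker and `"[UNK]"` token are assumed conventions.

```python
from typing import Dict, List


def wordpiece_ids(
    word: str,
    vocab: Dict[str, int],
    unknown_token: str = "[UNK]",   # assumed convention, not taken from the notes
    suffix_indicator: str = "##",   # assumed marker for non-initial pieces
) -> List[int]:
    """Split one word into vocabulary IDs by greedy longest-match-first."""
    ids: List[int] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink from the right.
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = suffix_indicator + piece
            if piece in vocab:
                match = vocab[piece]
                break
            end -= 1
        if match is None:
            # No piece fits at this position: the whole word becomes unknown.
            return [vocab[unknown_token]]
        ids.append(match)
        start = end
    return ids


# Hypothetical toy vocabulary, for illustration only.
vocab = {"[UNK]": 0, "token": 1, "##izer": 2, "##s": 3, "word": 4, "##piece": 5}

known = wordpiece_ids("tokenizers", vocab)   # pieces: token, ##izer, ##s
unknown = wordpiece_ids("xyz", vocab)        # nothing matches, so unknown ID
```

In this sketch a word with any unmatched remainder maps entirely to the unknown token's ID, which is one common WordPiece convention; the library's exact behavior should be checked against the WordpieceTokenizer docstring.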