0.2.0

@achoum released this on 01 Nov 16:54

Features

  • Add the advanced option predict_single_probability_for_binary_classification
    to generate prediction tensors of shape [batch_size, 2] for binary
    classification models.
  • Add support for weighted training.
  • Add support for permutation variable importance in the GBT learner with the
    compute_permutation_variable_importance parameter.
  • Add support for tf.int8 and tf.int16 values.
  • Add support for distributed gradient boosted trees learning. Currently, the
    TF ParameterServerStrategy distribution strategy is only available in
    monolithic TF-DF builds; the Yggdrasil Decision Forests gRPC distribution
    strategy can be used instead.
  • Add support for training from datasets stored on disk in CSV and RecordIO
    format (instead of creating a TensorFlow dataset). This option is currently
    more efficient for distributed training (until ParameterServerStrategy
    supports per-worker datasets).
  • Add a max_vocab_count argument to the model constructor. The existing
    max_vocab_count argument in FeatureUsage objects takes precedence.
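To illustrate the [batch_size, 2] prediction layout, here is a minimal plain-Python sketch (not TF-DF code) that expands single positive-class probabilities into two-column rows. The helper name expand_binary_probabilities is hypothetical, and the column order [P(class 0), P(class 1)] is an assumption made for illustration.

```python
from typing import List


def expand_binary_probabilities(positive_probs: List[float]) -> List[List[float]]:
    """Expand [batch_size] positive-class probabilities into [batch_size, 2] rows.

    Hypothetical helper; the column order [P(class 0), P(class 1)] is assumed.
    """
    rows = []
    for p in positive_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        # The two columns of a binary prediction sum to 1.
        rows.append([1.0 - p, p])
    return rows


single = [0.75, 0.25, 0.5]  # one probability per example
print(expand_binary_probabilities(single))
```

The single-column form carries the same information; the two-column form matches the shape used for multi-class outputs, which can simplify downstream code.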
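Permutation variable importance measures how much a model's quality drops when the values of one feature are randomly shuffled. The sketch below is a simplified, single-shuffle illustration of that technique on a toy model, using accuracy as the metric; it is not the GBT learner's implementation, and the names permutation_importance and toy_model are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def accuracy(model: Callable[[Row], int], rows: Sequence[Row], labels: Sequence[int]) -> float:
    """Fraction of rows whose prediction matches the label."""
    correct = sum(1 for r, y in zip(rows, labels) if model(r) == y)
    return correct / len(labels)


def permutation_importance(model: Callable[[Row], int], rows: Sequence[Row],
                           labels: Sequence[int], feature: int, seed: int = 0) -> float:
    """Drop in accuracy after shuffling one feature column (single shuffle)."""
    baseline = accuracy(model, rows, labels)
    column = [r[feature] for r in rows]
    random.Random(seed).shuffle(column)
    permuted = []
    for r, v in zip(rows, column):
        new_row = list(r)  # copy so the original rows are untouched
        new_row[feature] = v
        permuted.append(new_row)
    return baseline - accuracy(model, permuted, labels)


# Toy "model": predicts 1 when feature 0 exceeds 0.5 and ignores feature 1.
def toy_model(row: Row) -> int:
    return 1 if row[0] > 0.5 else 0


rng = random.Random(42)
rows = [[rng.random(), rng.random()] for _ in range(200)]
labels = [toy_model(r) for r in rows]

for f in range(2):
    print(f"feature {f}: importance = {permutation_importance(toy_model, rows, labels, f):.3f}")
```

Shuffling the ignored feature leaves every prediction unchanged, so its importance is zero; shuffling the used feature breaks the link to the labels. Practical implementations usually average over several shuffles.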
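The precedence rule between the constructor-level max_vocab_count and the per-feature value in FeatureUsage can be sketched as follows. This is a plain-Python illustration only: resolve_max_vocab_count and build_vocabulary are hypothetical helpers, and keeping the most frequent values is an assumption about how the vocabulary is truncated.

```python
from collections import Counter
from typing import Iterable, List, Optional


def resolve_max_vocab_count(model_default: int, feature_value: Optional[int]) -> int:
    """The per-feature setting wins; otherwise fall back to the constructor value."""
    return feature_value if feature_value is not None else model_default


def build_vocabulary(values: Iterable[str], max_vocab_count: int) -> List[str]:
    """Keep the max_vocab_count most frequent values (ties keep first-seen order)."""
    return [v for v, _ in Counter(values).most_common(max_vocab_count)]


values = ["a", "b", "a", "c", "a", "b", "d"]
# No per-feature value: the constructor value (2) applies.
print(build_vocabulary(values, resolve_max_vocab_count(2, None)))
# Per-feature value (3) takes precedence over the constructor value (2).
print(build_vocabulary(values, resolve_max_vocab_count(2, 3)))
```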

Fixes

  • Fix missing filtering of unique values in the categorical-set training
    feature accumulator. This bug caused a small drop in accuracy (e.g., ~0.5%
    on the SST2 dataset) compared to the C++ API.
  • Fix broken support for max_vocab_count in a FeatureUsage with type
    CATEGORICAL_SET.
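Why the missing unique-value filtering mattered can be seen with a small sketch: a categorical-set value behaves like a set, so an item repeated inside one example should count only once for that example. This is a hypothetical plain-Python illustration (count_item_frequencies is not a TF-DF function), not the actual accumulator code.

```python
from collections import Counter
from typing import Iterable, List


def count_item_frequencies(examples: Iterable[List[str]], deduplicate: bool) -> Counter:
    """Count, per item, how often it is recorded across examples.

    With deduplicate=True, each item is counted at most once per example,
    which matches set semantics.
    """
    counts: Counter = Counter()
    for items in examples:
        counts.update(set(items) if deduplicate else items)
    return counts


examples = [["good", "good", "movie"], ["bad", "movie"], ["good"]]
print(count_item_frequencies(examples, deduplicate=False))  # "good" over-counted
print(count_item_frequencies(examples, deduplicate=True))   # one count per example
```

Without filtering, repeated tokens inflate the statistics the learner relies on, which is consistent with the small accuracy gap described in the fix.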