@ablaom ablaom commented Sep 2, 2025

Context: JuliaAI/MLJModels.jl#591

For the release notes:

  • (new models) Add the following models from MLJTransforms.jl and make them immediately available to the MLJ user (she does not need to use @load to load them): OrdinalEncoder, FrequencyEncoder, TargetEncoder, ContrastEncoder, CardinalityReducer, MissingnessEncoder.
  • (mildly breaking) Have MLJTransforms.jl, instead of MLJModels.jl, provide the following built-in models, whose behaviour is unchanged: ContinuousEncoder, FillImputer, InteractionTransformer, OneHotEncoder, Standardizer, UnivariateBoxCoxTransformer, UnivariateDiscretizer, UnivariateFillImputer, UnivariateTimeTypeToContinuous.

Guide to a possible source of breakage: It was never necessary to use @load to load one of the models in the last list (assuming you have first run using MLJ). However, users frequently do not realise this, and one sees things like @load OneHotEncoder pkg=MLJModels, which this release will break. If such a call is preceded by using MLJ or using MLJTransforms, you can remove the loading command altogether (OneHotEncoder() already works); in any case, you can instead use @load OneHotEncoder pkg=MLJTransforms.
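
The migration described in the breakage guide might look like the following minimal sketch. It assumes MLJ is installed at a version that includes this release; the variable name `encoder` is illustrative only and does not come from the PR.

```julia
using MLJ

# Old pattern, which this release breaks (left commented out):
# @load OneHotEncoder pkg=MLJModels

# Option 1: drop the loading command entirely. After `using MLJ`,
# the built-in transformer is already in scope.
encoder = OneHotEncoder()

# Option 2: keep an explicit load, but point it at the new provider
# (left commented out here, since Option 1 already suffices):
# @load OneHotEncoder pkg=MLJTransforms
```

Option 1 is probably the simpler fix wherever `using MLJ` or `using MLJTransforms` already appears earlier in the script.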

This PR includes some documentation cleanup and closes #1162 and #1173.

Waiting on:

  • local check of integration tests

@ablaom ablaom marked this pull request as draft September 2, 2025 03:09
@ablaom ablaom marked this pull request as ready for review September 2, 2025 07:19
@ablaom ablaom added the breaking label Sep 2, 2025
[MLJModels.jl](https://github.com/JuliaAI/MLJModels.jl) (built-in) | - | ConstantClassifier, ConstantRegressor, ContinuousEncoder, DeterministicConstantClassifier, DeterministicConstantRegressor, FillImputer, InteractionTransformer, OneHotEncoder, Standardizer, UnivariateBoxCoxTransformer, UnivariateDiscretizer, UnivariateFillImputer, UnivariateTimeTypeToContinuous, Standardizer, BinaryThreshholdPredictor | medium |
[MLJModels.jl](https://github.com/JuliaAI/MLJModels.jl) (built-in) | - | ConstantClassifier, ConstantRegressor, DeterministicConstantClassifier, DeterministicConstantRegressor, BinaryThreshholdPredictor | mature |
[MLJText.jl](https://github.com/JuliaAI/MLJText.jl) | - | TfidfTransformer, BM25Transformer, CountTransformer | low |
[MLJTransforms.jl](https://github.com/JuliaAI/MLJTransforms.jl) (built-in) | - | ContinuousEncoder, FillImputer, InteractionTransformer, OneHotEncoder, Standardizer, UnivariateBoxCoxTransformer, UnivariateDiscretizer, UnivariateFillImputer, UnivariateTimeTypeToContinuous, Standardizer, OrdinalEncoder, FrequencyEncoder, TargetEncoder, ContrastEncoder, CardinalityReducer, MissingnessEncoder | medium |
Collaborator

Suggested change
[MLJTransforms.jl](https://github.com/JuliaAI/MLJTransforms.jl) (built-in) | - | ContinuousEncoder, FillImputer, InteractionTransformer, OneHotEncoder, Standardizer, UnivariateBoxCoxTransformer, UnivariateDiscretizer, UnivariateFillImputer, UnivariateTimeTypeToContinuous, Standardizer, OrdinalEncoder, FrequencyEncoder, TargetEncoder, ContrastEncoder, CardinalityReducer, MissingnessEncoder | medium |
[MLJTransforms.jl](https://github.com/JuliaAI/MLJTransforms.jl) (built-in) | - | ContinuousEncoder, FillImputer, InteractionTransformer, OneHotEncoder, Standardizer, UnivariateBoxCoxTransformer, UnivariateDiscretizer, UnivariateFillImputer, UnivariateTimeTypeToContinuous, UnivariateStandardizer, OrdinalEncoder, FrequencyEncoder, TargetEncoder, ContrastEncoder, CardinalityReducer, MissingnessEncoder | medium |

Member Author

The omission is intentional. UnivariateStandardizer is effectively deprecated, as you can use Standardizer for univariate input as well. We cannot remove the model for now, as Standardizer uses it.

Collaborator

Okay, but there is another Standardizer entry, so just remove UnivariateStandardizer.

Collaborator

Suggested change
[MLJTransforms.jl](https://github.com/JuliaAI/MLJTransforms.jl) (built-in) | - | ContinuousEncoder, FillImputer, InteractionTransformer, OneHotEncoder, Standardizer, UnivariateBoxCoxTransformer, UnivariateDiscretizer, UnivariateFillImputer, UnivariateTimeTypeToContinuous, Standardizer, OrdinalEncoder, FrequencyEncoder, TargetEncoder, ContrastEncoder, CardinalityReducer, MissingnessEncoder | medium |
[MLJTransforms.jl](https://github.com/JuliaAI/MLJTransforms.jl) (built-in) | - | ContinuousEncoder, FillImputer, InteractionTransformer, OneHotEncoder, Standardizer, UnivariateBoxCoxTransformer, UnivariateDiscretizer, UnivariateFillImputer, UnivariateTimeTypeToContinuous, OrdinalEncoder, FrequencyEncoder, TargetEncoder, ContrastEncoder, CardinalityReducer, MissingnessEncoder | medium |

Collaborator

@ablaom just in case you haven't seen this.

Member Author

Oh, right. There's a duplicate. Removed.

@EssamWisam
Collaborator

I think that's all. Thank you so much.

@EssamWisam EssamWisam self-assigned this Sep 7, 2025
UnivariateBoxCoxTransformer_MLJTransforms = ["encoders"]
UnivariateDiscretizer_MLJTransforms = ["encoders"]
UnivariateFillImputer_MLJTransforms = ["missing_value_imputation"]
UnivariateStandardizer_MLJTransforms = ["encoders"]
Collaborator

If UnivariateStandardizer is deprecated then do we still need it here?

Member Author

Because, while it is deprecated, it will still appear when the doc-generating tools scrape the package code. I don't want to have to special-case it. The docstring states explicitly that you can just use Standardizer. I don't see any harm.

Collaborator

Okay, I was just pointing it out in case you missed it. Thank you for clarifying.

ConstantClassifier_MLJModels = ["classification"]
ConstantRegressor_MLJModels = ["regression"]
ContinuousEncoder_MLJTransforms = ["encoders"]
ContrastEncoder_MLJTransforms = ["encoders"]
Collaborator

@EssamWisam EssamWisam Sep 7, 2025

I think it exists and includes many models inside? Or I could be misunderstanding.

Member Author

Oops. I removed the wrong one.

Fixed just now.

@ablaom
Member Author

ablaom commented Sep 9, 2025

@EssamWisam If you are happy now, please formally approve the PR.

@EssamWisam EssamWisam merged commit f79afd0 into dev Sep 9, 2025
3 checks passed
@ablaom ablaom mentioned this pull request Sep 10, 2025
Development

Successfully merging this pull request may close these issues.

Document RecursiveFeatureElimination