Other changes:
- We disabled fast math to avoid invalid results (e.g., when dividing by zero).
Bug fixes:
- Fixed methods _cross_categorical, _cross_sandwich, multiply, tocsr of CategoricalMatrix and function _sandwich_cat_cat_limited_rows_cols to operate on read-only buffers as well.
New feature:
- :func:`tabmat.from_formula` now also supports any dataframe supported by narwhals.
Other changes:
- Require Python>=3.10.
Other changes:
- Restore building wheels for Intel-based macOS systems.
Other changes:
- :func:`tabmat.from_df` now avoids unnecessary copies of dense arrays, but still ensures that the results are contiguous (C or F order).
- We now use narwhals' v2 API for data frame handling.
Bug fixes:
- Fixed :meth:`CategoricalMatrix.transpose_matvec` to operate on read-only buffers as well.
- Fixed incorrect calculation of the shape of a :class:`CategoricalMatrix` when initialized with zero categories and ``drop_first=True``.
Bug fixes:
- Fixed a bug which caused issues when constructing tabmat matrices from existing ModelSpecs when they contained categorical columns with all levels dropped.
- We can now treat dedicated pandas string series, which are the default for strings since pandas 2.3, as categoricals.
Bug fix:
- A more robust :meth:`DenseMatrix._get_col_stds` results in more accurate :meth:`StandardizedMatrix.sandwich` results.
Other changes:
- Build wheel for pypi on python 3.13.
- Build and test with python 3.13 in CI.
New feature:
- Added a new function, :func:`tabmat.from_df`, to convert any dataframe supported by narwhals into a :class:`tabmat.SplitMatrix`.
Other changes:
- Allow :class:`CategoricalMatrix` to be initialized directly with indices and categories.
- Added checks for dimension and ``dtype`` mismatch in :meth:`MatrixBase.sandwich`.
Bug fix:
- Fixed a bug in :meth:`tabmat.CategoricalMatrix.standardize` that sometimes returned ``nan`` values for the standard deviation due to numerical instability if using ``np.float32`` precision.
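The kind of instability described in this entry can be illustrated with a small standard-library sketch. It is not tabmat's implementation: the function names are hypothetical, and it assumes the problem is the usual cancellation in the one-pass formula ``E[x^2] - E[x]^2``, which can yield a slightly negative variance and therefore ``nan`` after the square root. The effect appears at much smaller magnitudes in ``np.float32`` than in the double precision used here.

```python
import math


def naive_variance(values):
    """One-pass formula E[x^2] - E[x]^2 (hypothetical helper).

    Subtracting two large, nearly equal numbers loses precision; the
    result can be inaccurate or even slightly negative.
    """
    n = len(values)
    mean = sum(values) / n
    mean_of_squares = sum(v * v for v in values) / n
    return mean_of_squares - mean * mean


def two_pass_std(values):
    """Two-pass standard deviation: center first, then square.

    The squared deviations are never negative, so the square root
    cannot receive a negative argument.
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return math.sqrt(variance)


def guarded_naive_std(values):
    """Naive formula with the variance clamped at zero before the sqrt."""
    return math.sqrt(max(naive_variance(values), 0.0))


# Large offset, small spread: the regime where cancellation hurts.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
print("naive variance:   ", naive_variance(data))
print("two-pass variance:", two_pass_std(data) ** 2)
```

Centering before squaring costs a second pass over the data but keeps the intermediate values small, which is why it is the usual first remedy.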
Other changes:
- Removed reference to the ``.A`` attribute and replaced it with ``.toarray()``.
- Added compatibility between formulaic and pandas 3.0.
- Support pypi release for numpy 2.0
Breaking changes:
- To unify the API, :class:`DenseMatrix` does not inherit from :class:`np.ndarray` anymore. To convert a :class:`DenseMatrix` to a :class:`np.ndarray`, use :meth:`DenseMatrix.unpack`.
- Similarly, :class:`SparseMatrix` does not inherit from :class:`sps.csc_matrix` anymore. To convert a :class:`SparseMatrix` to a :class:`sps.csc_matrix`, use :meth:`SparseMatrix.unpack`.
New features:
- Added column name and term name metadata to :class:`MatrixBase` objects. These are automatically populated when initializing a :class:`MatrixBase` from a :class:`pandas.DataFrame`. In addition, they can be accessed and modified via the :attr:`MatrixBase.column_names` and :attr:`MatrixBase.term_names` properties.
- Added a formula interface for creating tabmat matrices from pandas data frames. See :func:`tabmat.from_formula` for details.
- Added support for missing values in :class:`CategoricalMatrix` by either creating a separate category for them or treating them as all-zero rows.
- Added support for handling missing categorical values in pandas data frames.
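The two strategies for missing categorical values mentioned above (a separate category, or an all-zero row) can be sketched in plain Python. This is a conceptual illustration only; the function name and the ``missing`` argument are hypothetical and are not tabmat's API.

```python
def one_hot_with_missing(values, categories, missing="zero"):
    """One-hot encode `values`, where None marks a missing entry.

    missing="zero":     a missing value becomes an all-zero row.
    missing="category": a missing value gets its own extra last column.
    """
    if missing not in ("zero", "category"):
        raise ValueError("missing must be 'zero' or 'category'")

    width = len(categories) + (1 if missing == "category" else 0)
    column_of = {cat: i for i, cat in enumerate(categories)}

    rows = []
    for value in values:
        row = [0] * width
        if value is None:
            if missing == "category":
                row[-1] = 1  # dedicated "missing" column
        else:
            row[column_of[value]] = 1
        rows.append(row)
    return rows


print(one_hot_with_missing(["a", None, "b"], ["a", "b"], missing="zero"))
print(one_hot_with_missing(["a", None, "b"], ["a", "b"], missing="category"))
```

The all-zero option leaves the number of columns unchanged, while the separate-category option lets a model estimate an effect for missingness itself.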
Bug fix:
- Added cython compiler directive ``legacy_implicit_noexcept = True`` to fix performance regression with cython 3.
Other changes:
- Refactored the pre-commit hooks to use ruff.
- Refactored :meth:`CategoricalMatrix.transpose_matvec` to be deterministic when using OpenMP.
- Adjusted transformation to sparse format in :func:`tabmat.from_pandas` to future changes in pandas.
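One common reason a parallel reduction is not deterministic is that floating-point addition is not associative, so the result depends on how the work is split and combined. The sketch below illustrates that general effect in plain Python. Whether this was the exact cause addressed in :meth:`CategoricalMatrix.transpose_matvec` is an assumption; the changelog does not say. The helper names are hypothetical.

```python
def add_left_to_right(values):
    """Sequential accumulation, as a single thread would do it."""
    total = 0.0
    for v in values:
        total += v
    return total


def add_in_two_chunks(values, split):
    """Emulate two workers that each sum one chunk before combining."""
    return add_left_to_right(values[:split]) + add_left_to_right(values[split:])


values = [0.1, 0.2, 0.3]
sequential = add_left_to_right(values)      # (0.1 + 0.2) + 0.3
chunked = add_in_two_chunks(values, 1)      # 0.1 + (0.2 + 0.3)

print(sequential == chunked)  # False: same inputs, different grouping
```

Fixing the assignment of work to threads, or the order in which partial results are combined, makes such a computation reproducible from run to run.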
Other changes:
- Pypi release is now done using trusted publisher.
- Fix build and upload of ``x86_64`` wheels on Linux.
Other changes:
- Fixed macos arm64 wheels with proper linkage.
Other changes:
- Improve the performance of ``from_pandas`` in the case of low-cardinality categorical variables.
- Require Python>=3.9 in line with NEP 29
- Build and test with Python 3.12 in CI.
- Fixed macos arm64 wheels with proper linkage.
Bug fixes:
- We fixed a bug in the dense sandwich product, which would previously segfault for very large matrices.
- Fixed the column order when initializing a ``SplitMatrix`` from a list containing other ``SplitMatrix`` objects.
- Fixed ``getcol`` not respecting the ``drop_first`` attribute of a ``CategoricalMatrix``.
Other changes:
- Support building on architectures that are unsupported by xsimd.
Other changes:
- The C++ types have been refactored. Loop indices are now using the ``Py_ssize_t`` type. Integers now have a templated type as well.
- The documentation for ``matvec`` and ``matvec_transpose`` has been updated to reflect actual behavior.
- Checks for dimension mismatch in ``matvec`` and ``matvec_transpose`` arguments have been added.
- Remove upper pin on xsimd.
Bug fix:
- We fixed a bug in the cross sandwich product, which would previously segfault for very large matrices.
Bug fix:
- We fixed a bug in the dense sandwich product, which would previously segfault for very large F-contiguous matrices.
Bug fix:
- We fixed a bug in the dense matrix-vector and sandwich products, which would previously segfault for very large matrices.
Bug fix:
- Fixed the loading of jemalloc in Apple Silicon wheels.
Other changes:
- Build and upload wheels for Apple Silicon.
Other changes:
- Next attempt to build wheel for PyPI without ``march=native``.
Other changes:
- Add Python 3.10 support to CI (remove Python 3.6).
- Build wheel for PyPI without ``march=native``.
New feature
- :class:`tabmat.CategoricalMatrix` now accepts a ``drop_first`` argument. This allows the user to drop the first column of a ``CategoricalMatrix`` to avoid multicollinearity problems in unregularized models.
- :class:`tabmat.StandardizedMatrix` and :class:`tabmat.MatrixBase` now support the multiply method.
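To see why dropping the first column helps against multicollinearity, consider the following plain-Python sketch of one-hot encoding. It is an illustration of the idea, not tabmat code, and the function name is hypothetical. With all columns kept, every row sums to 1, so the columns add up to an intercept column; dropping the first level removes that exact linear dependence.

```python
def one_hot_drop_first(values, categories, drop_first=False):
    """One-hot encode `values`; optionally omit the first category's column."""
    column_of = {cat: i for i, cat in enumerate(categories)}
    offset = 1 if drop_first else 0
    width = len(categories) - offset

    rows = []
    for value in values:
        row = [0] * width
        col = column_of[value] - offset
        if col >= 0:  # the dropped category maps to an all-zero row
            row[col] = 1
        rows.append(row)
    return rows


values = ["a", "b", "c", "a"]
full = one_hot_drop_first(values, ["a", "b", "c"])
reduced = one_hot_drop_first(values, ["a", "b", "c"], drop_first=True)

print([sum(row) for row in full])  # [1, 1, 1, 1]: collinear with an intercept
print(reduced)                     # [[0, 0], [1, 0], [0, 1], [0, 0]]
```

The dropped level becomes the reference category: its rows are all zeros, and the remaining coefficients are interpreted relative to it.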
Bug fix
- Always use 64bit integers for indexing in :meth:`tabmat.ext.sparse.sparse_sandwich` to avoid segmentation faults on very wide problems.
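A rough sketch of why 32-bit indices can fail on very wide problems: a sandwich product of a matrix with ``k`` columns produces a ``k x k`` result, and a flat offset into that result can exceed the largest signed 32-bit integer. The example uses ``ctypes`` only to emulate fixed-width wraparound; the helper names are hypothetical, and the assumption that a flat offset was the overflowing quantity is ours, not the changelog's.

```python
import ctypes

INT32_MAX = 2**31 - 1


def flat_index_int32(row, col, n_cols):
    """row * n_cols + col, truncated to a signed 32-bit integer.

    Emulates the wraparound typically seen when C code does this
    arithmetic with 32-bit ints (formally undefined behavior in C).
    """
    return ctypes.c_int32(row * n_cols + col).value


def flat_index_int64(row, col, n_cols):
    """The same computation carried out with 64-bit integers."""
    return ctypes.c_int64(row * n_cols + col).value


k = 50_000  # number of columns; the sandwich result has k * k entries
print(k * k > INT32_MAX)              # True
print(flat_index_int32(k - 1, 0, k))  # negative offset after wraparound
print(flat_index_int64(k - 1, 0, k))  # 2499950000
```

A negative or otherwise wrong offset used as a memory index is exactly the kind of error that surfaces as a segmentation fault rather than a Python exception.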
Bug fix
- Disable the use of static TLS in the Linux wheels to avoid issues with too small TLS on some distributions.
Bug fix
- We fixed a bug in :meth:`tabmat.SplitMatrix.matvec`, where incorrect matrix vector products were computed when a ``SplitMatrix`` did not contain any dense components.
Other changes
- We are now specifying the run time dependencies in ``setup.py``, so that missing dependencies are automatically installed from PyPI when installing ``tabmat`` via pip.
Other changes
- tabmat is now available on PyPI and will be automatically updated when a new release is published.
Bug fix
- We now support ``xsimd>=8`` and support alternative jemalloc installations.
Bug fix
- Allow linking to an alternatively suffixed jemalloc installation to work around #113.
Bug fix
- The license was mistakenly left as proprietary. Corrected to BSD-3-Clause.
Other changes
- ReadTheDocs integration.
- CONTRIBUTING.md
- Correct pyproject.toml to work with PEP-517
Breaking changes:
- The package has been renamed to ``tabmat``. CELEBRATE!
- The :func:`one_over_var_inf_to_val` function has been made private.
- The :func:`csc_to_split` function has been re-named to :func:`tabmat.from_csc` to match the :func:`tabmat.from_pandas` function.
- The :meth:`tabmat.MatrixBase.get_col_means` and :meth:`tabmat.MatrixBase.get_col_stds` methods have been made private.
- The :meth:`cross_sandwich` method has also been made private.
Bug fix
- :func:`StandardizedMatrix.transpose_matvec` was giving the wrong answer when the ``out`` parameter was provided. This is now fixed.
- :func:`SplitMatrix.__repr__` now calls the ``__repr__`` method of component matrices instead of ``__str__``.
Other changes
- Optimized the :meth:`tabmat.SparseMatrix.matvec` and :meth:`tabmat.SparseMatrix.transpose_matvec` for when ``rows`` and ``cols`` are None.
- Implemented :func:`CategoricalMatrix.__rmul__`
- Reorganizing the documentation and updating the text to match the current API.
- Enable indexing the rows of a ``CategoricalMatrix``. Previously :func:`CategoricalMatrix.__getitem__` only supported column indexing.
- Allow creating a ``SplitMatrix`` from a list of any ``MatrixBase`` objects including another ``SplitMatrix``.
- Reduced memory usage in :meth:`tabmat.SplitMatrix.matvec`.
Bug fix
- In :func:`SplitMatrix.sandwich`, when a col subset was specified, incorrect output was produced if the components of the indices array were not sorted. :func:`SplitMatrix.__init__` now checks for sorted indices and maintains sorted index lists when combining matrices.
Other changes
- :func:`SplitMatrix.__init__` now filters out any empty matrices.
- :func:`StandardizedMatrix.sandwich` passes ``rows=None`` and ``cols=None`` onwards to the underlying matrix instead of replacing them with full arrays of indices. This should improve performance slightly.
- :func:`SplitMatrix.__repr__` now includes the type of the underlying matrix objects in the string output.
Bug fix
Sparse matrices now accept 64-bit indices on Windows.
Bug fix:
Split matrices now also work on Windows.
Breaking changes:
We renamed several public functions to make them private. These include functions in :mod:`tabmat.benchmark` that are unlikely to be used outside of this package as well as
Other changes:
- We removed the dependency on ``sparse_dot_mkl``. We now use :func:`scipy.sparse.csr_matvec` instead of :func:`sparse_dot_mkl.dot_product_mkl` on all platforms, because the latter suffered from poor performance, especially on narrow problems. This also means that we removed the function :func:`tabmat.sparse_matrix._dot_product_maybe_mkl`.
- We updated the pre-commit hooks and made sure the code is in line with the new hooks.
Other changes:
We are now also making releases for Windows.
Other changes:
Still trying.
Other changes:
We are trying to make releases for Windows.
Bug fixes:
- Added a check that matrices are two-dimensional in ``SplitMatrix.__init__``.
- Replace ``np.int`` with ``np.int64`` where appropriate due to NumPy deprecation of ``np.int``.
Other changes:
- Added Python 3.9 support.
- Use ``scipy.sparse`` dot product when MKL isn't available.
Bug fixes:
- Handling for nulls when setting up a ``CategoricalMatrix``
- Fixes to make several functions work with both row and col restrictions and ``out``
Other changes:
- Added various tests and documentation improvements
Breaking change:
- Rename ``dot`` to ``matvec``. Our ``dot`` function supports matrix-vector multiplication for every subclass, but only supports matrix-matrix multiplication for some. We therefore rename it to ``matvec`` in line with other libraries.
Bug fix:
- Fix a bug in matvec for categorical components when the number of categories exceeds the number of rows.
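For orientation, a matrix-vector product with a categorical (one-hot) matrix reduces to a lookup, and nothing in the computation requires the number of categories to be smaller than the number of rows. The sketch below is a conceptual illustration in plain Python, assuming the matrix is stored as one category index per row; it is not the library's implementation, and the function name is hypothetical.

```python
def categorical_matvec(indices, vector):
    """Compute X @ vector for an implicit one-hot matrix X.

    Row i of X has a single 1 in column indices[i], so the i-th
    entry of the product is simply vector[indices[i]].
    """
    n_categories = len(vector)
    for idx in indices:
        if not 0 <= idx < n_categories:
            raise IndexError(f"category index {idx} out of range")
    return [vector[idx] for idx in indices]


# Three rows but four categories: more categories than rows is valid.
indices = [2, 0, 2]
vector = [10.0, 20.0, 30.0, 40.0]
print(categorical_matvec(indices, vector))  # [30.0, 10.0, 30.0]
```

The output always has one entry per row, while the input vector has one entry per category; the two lengths are independent of each other.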
See git history.