[builder] enable Arrow Dictionary feature flag #1064

Merged
bkmartinjr merged 9 commits into main from bkmartinjr/enable-enums
Mar 31, 2024

Conversation

bkmartinjr
Contributor

@bkmartinjr bkmartinjr commented Mar 22, 2024

This PR enables the use of Arrow dictionary types (aka TileDB enum, Pandas Categorical, ...) when building the Census. It affects various string columns in the obs dataframe that contain repetitive labels, such as cell_type. The primary impact is more efficient memory use for the end user (reader) of the Census obs dataframe.

Fixes #604

Other changes:

  • The ZStd compression level for the affected columns was reduced (the high setting had no significant value after this change).
  • Package dependencies were updated to their latest versions.
  • Ported to the latest, non-deprecated indexer in tiledbsoma.


codecov bot commented Mar 25, 2024

Codecov Report

Attention: Patch coverage is 33.33333%, with 2 lines in your changes missing coverage. Please review.

Project coverage is 81.35%. Comparing base (a5dbdef) to head (d1dde47).

| Files | Patch % | Lines |
|---|---|---|
| ...llxgene_census_builder/build_soma/validate_soma.py | 0.00% | 2 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1064      +/-   ##
==========================================
+ Coverage   81.33%   81.35%   +0.01%     
==========================================
  Files          73       73              
  Lines        5566     5566              
==========================================
+ Hits         4527     4528       +1     
+ Misses       1039     1038       -1     

| Flag | Coverage Δ | |
|---|---|---|
| unittests | 81.35% <33.33%> (+0.01%) | ⬆️ |


@bkmartinjr bkmartinjr marked this pull request as ready for review March 29, 2024 03:35
Contributor

@prathapsridharan prathapsridharan left a comment

LGTM

@bkmartinjr bkmartinjr merged commit 675bf9c into main Mar 31, 2024
13 checks passed
@bkmartinjr bkmartinjr deleted the bkmartinjr/enable-enums branch March 31, 2024 15:59
Development

Successfully merging this pull request may close these issues.

add enumerated/categorical support
3 participants