
stdatamodels treatment of units #240

Open
braingram opened this issue Dec 6, 2023 · 3 comments

@braingram
Collaborator

FITS BinTable extensions support using TUNIT keywords to define units for columns within the table. The use of these keywords can be abstracted using interfaces like FITS_rec which provide access to these units via the columns attribute (a ColDefs instance).

Units are used in jwst code; a few examples follow (this is by no means an exhaustive list):

  1. Extract1D uses the FITS_rec interface to assign units to columns in a table.
  2. The MIRI pathloss schema assigns units to the TUNIT headers directly.
  3. The MIRI pathloss reference file contains TUNIT headers for the PATHLOSS BinTable.
  4. The mastargacq.schema also contains 'unit' entries in the datatype, which appear to do nothing.
  5. The niswfss_apcorr schema defines the column unit separately from the table dtype (and doesn't use TUNIT).

There are some considerations when examining how the pipeline uses units:

Compatibility with FITS and ASDF formats

As datamodels should be saveable in both FITS and ASDF formats, using TUNIT to save a unit has some issues:

  • any TUNIT fits_keyword definition in the schema is ignored when writing an ASDF file
  • all FITS_rec instances will be converted to structured arrays prior to writing to an ASDF file (losing any units defined in the columns)

The last example above (niswfss_apcorr) should work for both ASDF and FITS files (in the context of the jwst pipeline). However, opening the table directly in astropy (or another FITS-aware program) will not associate units with the table columns, because the units are not stored in the standard TUNIT keyword(s) (in this case SIZEUNIT is used instead).

Attribute interface and model state

Depending on the state of the attribute that contains the table, stdatamodels doesn't appear to provide a consistent interface to the table or units. Some initial testing shows:

  • when a new model is created and the attribute is accessed, DataModel initializes the table using the datatype defined in the schema. This returns an np.ndarray instance with a structured datatype
  • when the attribute is assigned to (even just assigning the attribute to itself for a new model), the data is _cast to the datatype defined in the schema. Because the table has a structured datatype, after the cast it is converted to a FITS_rec
  • when read from a FITS file, the FITS_rec containing the table is _cast and is still a FITS_rec after the cast (although the process currently strips the units because the data must be converted to native (little-endian) byte order)
  • when read from an ASDF file, there's a similar _cast, but in this case (since the input to the _cast was not a FITS_rec) the result is an np.ndarray with a structured datatype
@braingram
Collaborator Author

@jemorrison

@braingram
Collaborator Author

braingram commented Dec 7, 2023

I opened a test PR with a modified SpecModel schema to add dynamic units to the spec_table.
#243

This works by:

  • defining a spec_table_units attribute (note that this must be defined AFTER the table so astropy does not clobber the TUNIT keywords written when saving the attribute)
  • mapping the spec_table_units attribute contents to the spec_table column TUNIT keywords
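The mapping step can be sketched as two hypothetical functions, assuming spec_table_units is a {column name: unit string} mapping (the real attribute layout in the test PR may differ). TUNITn is indexed by 1-based column position, so the column order must match the table's:

```python
from astropy.io import fits


def units_to_tunit(header, column_names, units):
    """Hypothetical: copy a {column name: unit} mapping into TUNITn keywords.

    TUNITn is 1-indexed by column position, matching TTYPEn.
    """
    for index, name in enumerate(column_names, start=1):
        unit = units.get(name)
        if unit:
            header[f"TUNIT{index}"] = unit


def tunit_to_units(header, column_names):
    """Hypothetical inverse: rebuild the mapping from whatever TUNITn keywords exist."""
    units = {}
    for index, name in enumerate(column_names, start=1):
        key = f"TUNIT{index}"
        if key in header:
            units[name] = header[key]
    return units


names = ["WAVELENGTH", "FLUX"]
header = fits.Header()
units_to_tunit(header, names, {"WAVELENGTH": "um", "FLUX": "Jy"})
print(header["TUNIT1"], header["TUNIT2"])  # um Jy
print(tunit_to_units(header, names))       # {'WAVELENGTH': 'um', 'FLUX': 'Jy'}
```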

A test was added to:

  • illustrate how using the spec_table_units attribute allows an external program to modify the TUNIT keywords while the state is retained when the file is loaded as a datamodel (although the test does not show this, if an external program adds a TUNIT keyword that did not previously exist, it will be loaded into spec_table_units when the datamodel is opened). On read, stdatamodels will prefer TUNIT over the contents of spec_table_units in the tree.
  • illustrate how this strategy will require that the unit definition within the pipeline occur via the spec_table_units attribute (instead of spec_table.columns['WAVELENGTH'].unit, see the failing test). This is required to keep the tree in sync with the fits headers. On write stdatamodels will prefer spec_table_units over any unit in spec_table.columns.
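The two precedence rules described above can be summarized with hypothetical merge functions (names and dict layout are assumptions, not stdatamodels code). Falling back to a column unit when the tree has no entry for that column is an additional assumption made only for this sketch:

```python
def units_on_read(tunit_units, tree_units):
    """Hypothetical read rule: TUNIT keywords from the FITS header win over the tree."""
    merged = dict(tree_units)
    merged.update(tunit_units)  # header values overwrite tree values
    return merged


def units_on_write(tree_units, column_units):
    """Hypothetical write rule: spec_table_units wins over spec_table.columns units.

    Assumption for this sketch: a column unit is only used when the tree
    has no entry for that column.
    """
    merged = {name: unit for name, unit in column_units.items() if unit is not None}
    merged.update(tree_units)  # tree values overwrite column units
    return merged


# An external program changed TUNIT for WAVELENGTH; the header value wins on read.
print(units_on_read({"WAVELENGTH": "nm"}, {"WAVELENGTH": "um", "FLUX": "Jy"}))
# Setting only the column unit does not override spec_table_units on write.
print(units_on_write({"WAVELENGTH": "um"}, {"WAVELENGTH": "nm", "FLUX": "Jy"}))
```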

This strategy is only necessary for dynamic units (or units where we expect the user might change the unit outside the pipeline). For static units, defining them in the schema is much simpler.

Aside from the changes in #243, the test PR contains only test and schema changes. No code changes appear to be necessary to make this strategy work; however, it might be worth investigating how to avoid having to define the unit attribute after the table, which would make the schemas a bit more flexible.

@braingram
Collaborator Author

XREF: spacetelescope/jwst#2869


5 participants