Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JP-3304: Add error to NIRSpec flat_table #183

Merged
merged 14 commits into from
Sep 11, 2023

Conversation

tapastro
Copy link
Collaborator

@tapastro tapastro commented Jul 10, 2023

Resolves JP-3304

This PR addresses a missing schema table column needed for the NIRSpec team to define the flat variance arrays. I also removed some strange flat_table shape definitions, which were initializing arrays of size 130000, for which in an example I viewed I saw all but the first 1446 entries were zero.

Note that if this is merged, no existing reference files will pass validation - I don't think this can be avoided, unless a schema guru can figure it out for me. This probably should stay out of a release until NIRSpec has delivered updated flats.

Checklist

  • added entry in CHANGES.rst (either in Bug Fixes or Changes to API)
  • updated relevant tests
  • updated relevant documentation
  • updated relevant milestone(s)
  • added relevant label(s)

@tapastro tapastro added enhancement New feature or request datamodels labels Jul 10, 2023
@tapastro tapastro requested a review from hbushouse July 10, 2023 22:05
@tapastro tapastro requested a review from a team as a code owner July 10, 2023 22:05
@codecov
Copy link

codecov bot commented Jul 10, 2023

Codecov Report

Patch coverage is 100.00% of modified lines.

Files Changed Coverage
src/stdatamodels/jwst/datamodels/nirspec_flat.py 100.00%
src/stdatamodels/model_base.py 100.00%

📢 Thoughts on this report? Let us know!.

@hbushouse
Copy link
Collaborator

Old NIRSpec flat ref files may no longer pass validation once this is merged, but will they at least still be loadable into a data model? Or will it crash due to the missing table column?

@hbushouse
Copy link
Collaborator

P.S. The "shape" definitions were there because years ago the models could not support table columns with variable length/size. So they were all hardwired to a (ridiculously) large value to cover all possibilities. Yes, this led to a lot of zero padding in the ref files themselves. The models are smarter and more flexible now.

Copy link
Collaborator

@hbushouse hbushouse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look fine, but I agree that we should hold off on merging this until the new flats are actually available. And I don't want to merge it into the final B9.3 release, in case it breaks something.

@braingram
Copy link
Collaborator

Following up on offline conversation with @tapastro I'm not sure if there is a way to do this that preserves backwards compatibility of the schema. Attempts to use 'oneOf' and other schema combiners failed because these combiners do not appear to be fully supported during the fits/schema loading/saving in fits_support.

I'm not seeing an easy way to add support for this (I hope I'm overlooking something).

Versioning the datamodel and schema seems like one option but I didn't immediately see another example of this in this package. Is there an example/precedent for how to handle this?

It should be possible to open the old files by setting cast_fits_arrays to False when the model is created but that would produce a model with FITS_rec entries for what are usually numpy recarrays.

@braingram
Copy link
Collaborator

It occurred to me that there might be hooks in DataModel or similar that could be used to automatically migrate old files to the new format on open. That didn't quite pan out but intercepting the fits on __init__ seems to work (tested locally with a file saved without this PR then loaded with this PR).

What about the following?

Before the call to super here:

def __init__(self, init=None, **kwargs):
super(NirspecFlatModel, self).__init__(init=init, **kwargs)

the fits table for the old files could be updated, filling it with the missing error column. Something like the following

        if 'FAST_VARIATION' in init:
            # check that table has the required columns
            # for older files they might be missing an 'err' column
            table_data = init['FAST_VARIATION'].data
            if ('error', '>f4') not in table_data.dtype.descr:
                err = numpy.empty(table_data.shape[0], dtype=[('error', '>f4')])
                err[:] = numpy.nan
                table_data = merge_arrays((table_data, err), flatten=True)
                init['FAST_VARIATION'].data = table_data

@hbushouse
Copy link
Collaborator

It occurred to me that there might be hooks in DataModel or similar that could be used to automatically migrate old files to the new format on open. That didn't quite pan out but intercepting the fits on __init__ seems to work (tested locally with a file saved without this PR then loaded with this PR).

What about the following?

Before the call to super here:

def __init__(self, init=None, **kwargs):
super(NirspecFlatModel, self).__init__(init=init, **kwargs)

the fits table for the old files could be updated, filling it with the missing error column. Something like the following

        if 'FAST_VARIATION' in init:
            # check that table has the required columns
            # for older files they might be missing an 'err' column
            table_data = init['FAST_VARIATION'].data
            if ('error', '>f4') not in table_data.dtype.descr:
                err = numpy.empty(table_data.shape[0], dtype=[('error', '>f4')])
                err[:] = numpy.nan
                table_data = merge_arrays((table_data, err), flatten=True)
                init['FAST_VARIATION'].data = table_data

@braingram This looks reasonable to me. Can you make it work? I guess that update could just be added to this PR.

@braingram
Copy link
Collaborator

@braingram This looks reasonable to me. Can you make it work? I guess that update could just be added to this PR.

I'll give it a whirl.

@braingram
Copy link
Collaborator

braingram commented Jul 31, 2023

I pushed a commit to the source branch for this PR (@tapastro let me know if you'd rather I open PRs against your branch in the future).

The commit migrates old files containing NirspecFlat and NirspecQuadFlat models by adding error columns that contain all nan. @tapastro @hbushouse does this seem like a sensible fill value and shape?

For example, using the changes in this PR, this file from CRDS can be opened:
https://jwst-crds.stsci.edu/browse/jwst_nirspec_fflat_0066.fits
and inspecting the flat table of the first (0th) quadrant results in:

In [4]: m.quadrants[0].flat_table
Out[4]:
FITS_rec([('ANY', 2419, [0.7       , 0.70023566, 0.7004714 , ..., 0.        , 0.        , 0.        ], [3.9957927e+20, 3.9931036e+20, 3.9904166e+20, ..., 0.0000000e+00, 0.0000000e+00, 0.0000000e+00], nan)],
         dtype=(numpy.record, [('slit_name', 'S15'), ('nelem', '>i4'), ('wavelength', '>f4', (130000,)), ('data', '>f4', (130000,)), ('error', '>f4')]))

Note the 'nan' in the error column and the shape of the error column ('error', '>f4') compared with the shape of for example wavelength ('wavelength', '>f4', (130000,)).

Also note that there appears to be a bug in the NirspecQuadFlatModel constructor when provided with an instance of NirspecFlatModel: #186

@braingram braingram requested a review from hbushouse July 31, 2023 18:51
@tapastro
Copy link
Collaborator Author

tapastro commented Aug 1, 2023

Hi Brett,

I would push to remove the fixed-length sizes (as in this PR code) - they're way overboard in size and unnecessary. Looking at the output, it looks as though the error column is not of the same shape as the other columns - am I misreading that? It looks like it has a single entry rather than an array.

@braingram
Copy link
Collaborator

Hi Brett,

I would push to remove the fixed-length sizes (as in this PR code) - they're way overboard in size and unnecessary. Looking at the output, it looks as though the error column is not of the same shape as the other columns - am I misreading that? It looks like it has a single entry rather than an array.

The error column is not the same shape as the other columns. Should it be?

For the CRDS file mentioned above, the other columns ('wavelength', etc) are the large fixed-length sizes. Should the error column be constructed with the same size?

It looks like there is a 'nelem' column that specifies the 'valid' values (at least looking at 'wavelength', all the values after the first 'nelem' are 0s). Presumably removing the fixed-length sizes would allow newer files to generate columns with only 'nelem' elements?

Thanks for fielding the extra questions. Updating the code to change the error column size shouldn't be an issue once we've decided on the size (and preferred value).

@braingram
Copy link
Collaborator

As @hbushouse mentioned in JP-3304 the changes in this PR could be revisited using the new alllow_extra_columns feature added in: #189

@tapastro would you rather this PR be updated to used the allow_extra_columns feature or are the existing changes preferred. If sticking with the existing changes, should the error column shape match the shape of one of the other columns?

@hbushouse
Copy link
Collaborator

@tapastro would you rather this PR be updated to used the allow_extra_columns feature or are the existing changes preferred. If sticking with the existing changes, should the error column shape match the shape of one of the other columns?

The "shape" specifications are no longer relevant and could be removed for all column definitions.

@braingram
Copy link
Collaborator

@tapastro would you rather this PR be updated to used the allow_extra_columns feature or are the existing changes preferred. If sticking with the existing changes, should the error column shape match the shape of one of the other columns?

The "shape" specifications are no longer relevant and could be removed for all column definitions.

Thanks @hbushouse. To make sure I understand.

The 'shape' property from this schema should be removed (the large numbers added at some earlier time). For example this one should be removed:

For the changes I added to this PR (in commit: 68c80fc) the NirspecFlatModel creates an error column if one does not exist in a file (when opened). The dtype used for this column is ('error', '>f4') (note the lack of shape) as seen here:
68c80fc#diff-b4e00b27991aa0396918bb645b94550466ee4d963ab6be337490c4527edeb1adR33

Should the added error column have a dtype with a shape? If so, should the shape be determined from another column (if so which one)?

Copy link
Collaborator

@braingram braingram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the CI errors it looks like the changes in my PR against your branch failed to check that init was a HDUList before attempting to migrate the table. I think the changes suggested here should fix the TypeError: 'NoneType' object is not iterable errors (as tested locally).

src/stdatamodels/jwst/datamodels/nirspec_flat.py Outdated Show resolved Hide resolved
src/stdatamodels/jwst/datamodels/nirspec_flat.py Outdated Show resolved Hide resolved
@tapastro
Copy link
Collaborator Author

tapastro commented Sep 8, 2023

@braingram Thanks for checking on the test fix! I'm wrapped up in trying to wrangle the array merge - the desired behavior is for the error column to match the shape of the wavelength and flux columns. As I have it in the PR currently, I get an error implying that a modification of the HDUList's table is not allowed due to a mismatch in ColDefs, while if I return the flatten argument to the recfunctions.merge_arrays(), I get a different error. I'll reproduce them here:

Error 1:
Traceback (most recent call last): File "/Users/tpauly/work/jp-3304/test_method.py", line 28, in <module> hdu2 = migrate_fast_variation_table(hdu1) File "/Users/tpauly/work/jp-3304/test_method.py", line 22, in migrate_fast_variation_table ext.data = table_data File "/Users/tpauly/miniconda3/envs/stdatamodels/lib/python3.10/site-packages/astropy/utils/decorators.py", line 852, in __set__ ret = self.fset(obj, val) File "/Users/tpauly/miniconda3/envs/stdatamodels/lib/python3.10/site-packages/astropy/io/fits/hdu/table.py", line 470, in data data = data.view(self._data_type) File "/Users/tpauly/miniconda3/envs/stdatamodels/lib/python3.10/site-packages/astropy/io/fits/fitsrec.py", line 263, in __array_finalize__ self._coldefs = ColDefs(self) File "/Users/tpauly/miniconda3/envs/stdatamodels/lib/python3.10/site-packages/astropy/io/fits/column.py", line 1516, in __init__ self._init_from_array(input) File "/Users/tpauly/miniconda3/envs/stdatamodels/lib/python3.10/site-packages/astropy/io/fits/column.py", line 1552, in _init_from_array format = self._col_format_cls.from_recformat(ftype) File "/Users/tpauly/miniconda3/envs/stdatamodels/lib/python3.10/site-packages/astropy/io/fits/column.py", line 324, in from_recformat return cls(_convert_format(recformat, reverse=True)) File "/Users/tpauly/miniconda3/envs/stdatamodels/lib/python3.10/site-packages/astropy/io/fits/column.py", line 2569, in _convert_format return _convert_record2fits(format) File "/Users/tpauly/miniconda3/envs/stdatamodels/lib/python3.10/site-packages/astropy/io/fits/column.py", line 2532, in _convert_record2fits raise ValueError(f"Illegal format {format}.") ValueError: Illegal format (numpy.record, [('slit_name', 'S15'), ('nelem', '>i4'), ('wavelength', '>f4', (130000,)), ('data', '>f4', (130000,))]).

Error 2:
Traceback (most recent call last): File "/Users/tpauly/work/jp-3304/test_method.py", line 28, in <module> hdu2 = migrate_fast_variation_table(hdu1) File "/Users/tpauly/work/jp-3304/test_method.py", line 21, in migrate_fast_variation_table table_data = merge_arrays((table_data, err_column), flatten=True) File "<__array_function__ internals>", line 200, in merge_arrays File "/Users/tpauly/miniconda3/envs/stdatamodels/lib/python3.10/site-packages/numpy/lib/recfunctions.py", line 489, in merge_arrays output = np.fromiter(tuple(_izip_records(seqdata, flatten=flatten)), ValueError: could not assign tuple of length 2 to structure with 5 fields.

Copy link
Collaborator

@braingram braingram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

jwst failures look like the expected ones described here: spacetelescope/jwst#7874

@hbushouse
Copy link
Collaborator

CI failure is unrelated.

@hbushouse
Copy link
Collaborator

Full regtest run, pointing to this PR branch, started at https://plwishmaster.stsci.edu:8081/job/RT/job/JWST-Developers-Pull-Requests/914

@braingram
Copy link
Collaborator

I see 6 failures in test_nirspec_lamp_fs_spec2 that all include:

     Headers contain differences:
       Keyword FXD_SLIT has different values:
          a> S200A1
          b> NONE

are these related to updated truth files for spacetelescope/jwst#7879 or a result of this PR?

@hbushouse
Copy link
Collaborator

Yes, the regtest failures are unrelated and due to having already uploaded some modified inputs for a change in jwst. Looks good.

@hbushouse hbushouse merged commit 6d155a0 into spacetelescope:master Sep 11, 2023
16 of 17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants