JP-3304: Add error to NIRSpec flat_table #183

tapastro · 2023-07-10T22:05:34Z

This PR addresses a missing schema table column needed for the NIRSpec team to define the flat variance arrays. I also removed some strange flat_table shape definitions, which were initializing arrays of size 130000, for which in an example I viewed I saw all but the first 1446 entries were zero.

Note that if this is merged, no existing reference files will pass validation - I don't think this can be avoided, unless a schema guru can figure it out for me. This probably should stay out of a release until NIRSpec has delivered updated flats.

Checklist

added entry in CHANGES.rst (either in Bug Fixes or Changes to API)
updated relevant tests
updated relevant documentation
updated relevant milestone(s)
added relevant label(s)

codecov · 2023-07-10T22:10:36Z

Codecov Report

Patch coverage is 100.00% of modified lines.

Files Changed	Coverage
src/stdatamodels/jwst/datamodels/nirspec_flat.py	`100.00%`
src/stdatamodels/model_base.py	`100.00%`

📢 Thoughts on this report? Let us know!.

hbushouse · 2023-07-11T11:57:24Z

Old NIRSpec flat ref files may no longer pass validation once this is merged, but will they at least still be loadable into a data model? Or will it crash due to the missing table column?

hbushouse · 2023-07-11T11:59:33Z

P.S. The "shape" definitions were there because years ago the models could not support table columns with variable length/size. So they were all hardwired to a (ridiculously) large value to cover all possibilities. Yes, this led to a lot of zero padding in the ref files themselves. The models are smarter and more flexible now.

hbushouse

Changes look fine, but I agree that we should hold off on merging this until the new flats are actually available. And I don't want to merge it into the final B9.3 release, in case it breaks something.

braingram · 2023-07-18T19:12:56Z

Following up on offline conversation with @tapastro I'm not sure if there is a way to do this that preserves backwards compatibility of the schema. Attempts to use 'oneOf' and other schema combiners failed because these combiners do not appear to be fully supported during the fits/schema loading/saving in fits_support.

I'm not seeing an easy way to add support for this (I hope I'm overlooking something).

Versioning the datamodel and schema seems like one option but I didn't immediately see another example of this in this package. Is there an example/precedent for how to handle this?

It should be possible to open the old files by setting cast_fits_arrays to False when the model is created but that would produce a model with FITS_rec entries for what are usually numpy recarrays.

braingram · 2023-07-18T19:59:35Z

It occurred to me that there might be hooks in DataModel or similar that could be used to automatically migrate old files to the new format on open. That didn't quite pan out but intercepting the fits on __init__ seems to work (tested locally with a file saved without this PR then loaded with this PR).

What about the following?

Before the call to super here:

stdatamodels/src/stdatamodels/jwst/datamodels/nirspec_flat.py

Lines 35 to 36 in fbd97cc

    
           def __init__(self, init=None, **kwargs): 
        
               super(NirspecFlatModel, self).__init__(init=init, **kwargs)

the fits table for the old files could be updated, filling it with the missing error column. Something like the following

        if 'FAST_VARIATION' in init:
            # check that table has the required columns
            # for older files they might be missing an 'err' column
            table_data = init['FAST_VARIATION'].data
            if ('error', '>f4') not in table_data.dtype.descr:
                err = numpy.empty(table_data.shape[0], dtype=[('error', '>f4')])
                err[:] = numpy.nan
                table_data = merge_arrays((table_data, err), flatten=True)
                init['FAST_VARIATION'].data = table_data

hbushouse · 2023-07-31T12:44:06Z

It occurred to me that there might be hooks in DataModel or similar that could be used to automatically migrate old files to the new format on open. That didn't quite pan out but intercepting the fits on __init__ seems to work (tested locally with a file saved without this PR then loaded with this PR).

What about the following?

Before the call to super here:

stdatamodels/src/stdatamodels/jwst/datamodels/nirspec_flat.py

Lines 35 to 36 in fbd97cc

def __init__(self, init=None, **kwargs):

super(NirspecFlatModel, self).__init__(init=init, **kwargs)

the fits table for the old files could be updated, filling it with the missing error column. Something like the following
        if 'FAST_VARIATION' in init:
            # check that table has the required columns
            # for older files they might be missing an 'err' column
            table_data = init['FAST_VARIATION'].data
            if ('error', '>f4') not in table_data.dtype.descr:
                err = numpy.empty(table_data.shape[0], dtype=[('error', '>f4')])
                err[:] = numpy.nan
                table_data = merge_arrays((table_data, err), flatten=True)
                init['FAST_VARIATION'].data = table_data

@braingram This looks reasonable to me. Can you make it work? I guess that update could just be added to this PR.

braingram · 2023-07-31T12:46:22Z

@braingram This looks reasonable to me. Can you make it work? I guess that update could just be added to this PR.

I'll give it a whirl.

braingram · 2023-07-31T18:31:34Z

I pushed a commit to the source branch for this PR (@tapastro let me know if you'd rather I open PRs against your branch in the future).

The commit migrates old files containing NirspecFlat and NirspecQuadFlat models by adding error columns that contain all nan. @tapastro @hbushouse does this seem like a sensible fill value and shape?

For example, using the changes in this PR, this file from CRDS can be opened:
https://jwst-crds.stsci.edu/browse/jwst_nirspec_fflat_0066.fits
and inspecting the flat table of the first (0th) quadrant results in:

In [4]: m.quadrants[0].flat_table
Out[4]:
FITS_rec([('ANY', 2419, [0.7       , 0.70023566, 0.7004714 , ..., 0.        , 0.        , 0.        ], [3.9957927e+20, 3.9931036e+20, 3.9904166e+20, ..., 0.0000000e+00, 0.0000000e+00, 0.0000000e+00], nan)],
         dtype=(numpy.record, [('slit_name', 'S15'), ('nelem', '>i4'), ('wavelength', '>f4', (130000,)), ('data', '>f4', (130000,)), ('error', '>f4')]))

Note the 'nan' in the error column and the shape of the error column ('error', '>f4') compared with the shape of for example wavelength ('wavelength', '>f4', (130000,)).

Also note that there appears to be a bug in the NirspecQuadFlatModel constructor when provided with an instance of NirspecFlatModel: #186

tapastro · 2023-08-01T15:27:29Z

Hi Brett,

I would push to remove the fixed-length sizes (as in this PR code) - they're way overboard in size and unnecessary. Looking at the output, it looks as though the error column is not of the same shape as the other columns - am I misreading that? It looks like it has a single entry rather than an array.

braingram · 2023-08-01T15:39:14Z

Hi Brett,

I would push to remove the fixed-length sizes (as in this PR code) - they're way overboard in size and unnecessary. Looking at the output, it looks as though the error column is not of the same shape as the other columns - am I misreading that? It looks like it has a single entry rather than an array.

The error column is not the same shape as the other columns. Should it be?

For the CRDS file mentioned above, the other columns ('wavelength', etc) are the large fixed-length sizes. Should the error column be constructed with the same size?

It looks like there is a 'nelem' column that specifies the 'valid' values (at least looking at 'wavelength', all the values after the first 'nelem' are 0s). Presumably removing the fixed-length sizes would allow newer files to generate columns with only 'nelem' elements?

Thanks for fielding the extra questions. Updating the code to change the error column size shouldn't be an issue once we've decided on the size (and preferred value).

braingram · 2023-08-23T21:13:43Z

As @hbushouse mentioned in JP-3304 the changes in this PR could be revisited using the new alllow_extra_columns feature added in: #189

@tapastro would you rather this PR be updated to used the allow_extra_columns feature or are the existing changes preferred. If sticking with the existing changes, should the error column shape match the shape of one of the other columns?

hbushouse · 2023-08-24T11:50:16Z

@tapastro would you rather this PR be updated to used the allow_extra_columns feature or are the existing changes preferred. If sticking with the existing changes, should the error column shape match the shape of one of the other columns?

The "shape" specifications are no longer relevant and could be removed for all column definitions.

braingram · 2023-08-24T13:04:17Z

@tapastro would you rather this PR be updated to used the allow_extra_columns feature or are the existing changes preferred. If sticking with the existing changes, should the error column shape match the shape of one of the other columns?

The "shape" specifications are no longer relevant and could be removed for all column definitions.

Thanks @hbushouse. To make sure I understand.

The 'shape' property from this schema should be removed (the large numbers added at some earlier time). For example this one should be removed:

stdatamodels/src/stdatamodels/jwst/datamodels/schemas/nirspec_flat.schema.yaml

Line 52 in 844988d

shape: [130000]

For the changes I added to this PR (in commit: 68c80fc) the NirspecFlatModel creates an error column if one does not exist in a file (when opened). The dtype used for this column is ('error', '>f4') (note the lack of shape) as seen here:
68c80fc#diff-b4e00b27991aa0396918bb645b94550466ee4d963ab6be337490c4527edeb1adR33

Should the added error column have a dtype with a shape? If so, should the shape be determined from another column (if so which one)?

braingram

From the CI errors it looks like the changes in my PR against your branch failed to check that init was a HDUList before attempting to migrate the table. I think the changes suggested here should fix the TypeError: 'NoneType' object is not iterable errors (as tested locally).

src/stdatamodels/jwst/datamodels/nirspec_flat.py

tapastro · 2023-09-08T16:49:23Z

@braingram Thanks for checking on the test fix! I'm wrapped up in trying to wrangle the array merge - the desired behavior is for the error column to match the shape of the wavelength and flux columns. As I have it in the PR currently, I get an error implying that a modification of the HDUList's table is not allowed due to a mismatch in ColDefs, while if I return the flatten argument to the recfunctions.merge_arrays(), I get a different error. I'll reproduce them here:

Error 1:
Traceback (most recent call last): File "/Users/tpauly/work/jp-3304/test_method.py", line 28, in <module> hdu2 = migrate_fast_variation_table(hdu1) File "/Users/tpauly/work/jp-3304/test_method.py", line 22, in migrate_fast_variation_table ext.data = table_data File "/Users/tpauly/miniconda3/envs/stdatamodels/lib/python3.10/site-packages/astropy/utils/decorators.py", line 852, in __set__ ret = self.fset(obj, val) File "/Users/tpauly/miniconda3/envs/stdatamodels/lib/python3.10/site-packages/astropy/io/fits/hdu/table.py", line 470, in data data = data.view(self._data_type) File "/Users/tpauly/miniconda3/envs/stdatamodels/lib/python3.10/site-packages/astropy/io/fits/fitsrec.py", line 263, in __array_finalize__ self._coldefs = ColDefs(self) File "/Users/tpauly/miniconda3/envs/stdatamodels/lib/python3.10/site-packages/astropy/io/fits/column.py", line 1516, in __init__ self._init_from_array(input) File "/Users/tpauly/miniconda3/envs/stdatamodels/lib/python3.10/site-packages/astropy/io/fits/column.py", line 1552, in _init_from_array format = self._col_format_cls.from_recformat(ftype) File "/Users/tpauly/miniconda3/envs/stdatamodels/lib/python3.10/site-packages/astropy/io/fits/column.py", line 324, in from_recformat return cls(_convert_format(recformat, reverse=True)) File "/Users/tpauly/miniconda3/envs/stdatamodels/lib/python3.10/site-packages/astropy/io/fits/column.py", line 2569, in _convert_format return _convert_record2fits(format) File "/Users/tpauly/miniconda3/envs/stdatamodels/lib/python3.10/site-packages/astropy/io/fits/column.py", line 2532, in _convert_record2fits raise ValueError(f"Illegal format {format}.") ValueError: Illegal format (numpy.record, [('slit_name', 'S15'), ('nelem', '>i4'), ('wavelength', '>f4', (130000,)), ('data', '>f4', (130000,))]).

Error 2:
Traceback (most recent call last): File "/Users/tpauly/work/jp-3304/test_method.py", line 28, in <module> hdu2 = migrate_fast_variation_table(hdu1) File "/Users/tpauly/work/jp-3304/test_method.py", line 21, in migrate_fast_variation_table table_data = merge_arrays((table_data, err_column), flatten=True) File "<__array_function__ internals>", line 200, in merge_arrays File "/Users/tpauly/miniconda3/envs/stdatamodels/lib/python3.10/site-packages/numpy/lib/recfunctions.py", line 489, in merge_arrays output = np.fromiter(tuple(_izip_records(seqdata, flatten=flatten)), ValueError: could not assign tuple of length 2 to structure with 5 fields.

src/stdatamodels/jwst/datamodels/nirspec_flat.py

expand test_nirspec_flat_table_migration to columns with shapes

braingram

LGTM!

jwst failures look like the expected ones described here: spacetelescope/jwst#7874

hbushouse · 2023-09-11T18:50:48Z

CI failure is unrelated.

hbushouse · 2023-09-11T18:54:04Z

Full regtest run, pointing to this PR branch, started at https://plwishmaster.stsci.edu:8081/job/RT/job/JWST-Developers-Pull-Requests/914

braingram · 2023-09-11T21:21:26Z

I see 6 failures in test_nirspec_lamp_fs_spec2 that all include:

     Headers contain differences:
       Keyword FXD_SLIT has different values:
          a> S200A1
          b> NONE

are these related to updated truth files for spacetelescope/jwst#7879 or a result of this PR?

hbushouse · 2023-09-11T21:58:33Z

Yes, the regtest failures are unrelated and due to having already uploaded some modified inputs for a change in jwst. Looks good.

tapastro added enhancement New feature or request datamodels labels Jul 10, 2023

tapastro requested a review from hbushouse July 10, 2023 22:05

tapastro requested a review from a team as a code owner July 10, 2023 22:05

hbushouse approved these changes Jul 11, 2023

View reviewed changes

stscijgbot-jp mentioned this pull request Jul 13, 2023

NIRSpec F-flat reference files are missing an error extension spacetelescope/jwst#7720

Closed

hbushouse added jwst datamodels nirspec labels Jul 31, 2023

braingram force-pushed the jp-3304-nrs-fflat-var branch from 4c51bf6 to 68c80fc Compare July 31, 2023 18:26

braingram requested a review from hbushouse July 31, 2023 18:51

tapastro and others added 6 commits September 8, 2023 10:37

add error to nirspec flat flat_table

3db5799

changelog

395ab38

add change to quad_flat as well

220312c

add migration and test for nirspec flat models

46014d4

make migration generic and work with DataModel.__init__

4c2866f

add migration and test for nirspec flat models

248d07d

tapastro force-pushed the jp-3304-nrs-fflat-var branch from 6847301 to 248d07d Compare September 8, 2023 14:39

tapastro added 3 commits September 8, 2023 10:43

bad rebase choices

24e47df

remove unused import

45e1f37

illegal format - new table from coldefs req

592c04a

braingram reviewed Sep 8, 2023

View reviewed changes

src/stdatamodels/jwst/datamodels/nirspec_flat.py Outdated Show resolved Hide resolved

src/stdatamodels/jwst/datamodels/nirspec_flat.py Show resolved Hide resolved

src/stdatamodels/jwst/datamodels/nirspec_flat.py Outdated Show resolved Hide resolved

check for HDUList before migration

39660c0

braingram reviewed Sep 8, 2023

View reviewed changes

src/stdatamodels/jwst/datamodels/nirspec_flat.py Outdated Show resolved Hide resolved

tapastro and others added 3 commits September 8, 2023 13:31

fix dtype listed shape

d96e7be

expand test_nirspec_flat_table_migration to columns with shapes

eef6b31

Merge pull request #3 from braingram/bjg_jp_3304_take_2

7de93d0

expand test_nirspec_flat_table_migration to columns with shapes

braingram approved these changes Sep 8, 2023

View reviewed changes

Merge branch 'master' into jp-3304-nrs-fflat-var

0f80863

hbushouse approved these changes Sep 11, 2023

View reviewed changes

hbushouse merged commit 6d155a0 into spacetelescope:master Sep 11, 2023
16 of 17 checks passed

braingram mentioned this pull request Sep 25, 2023

JP-3304: use error table from f-flat reference files spacetelescope/jwst#7972

Closed

7 tasks

braingram mentioned this pull request Sep 26, 2024

JP-2349: Add workaround to load old files with new moving_target_table schema #329

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

JP-3304: Add error to NIRSpec flat_table #183

JP-3304: Add error to NIRSpec flat_table #183

tapastro commented Jul 10, 2023 •

edited by hbushouse

Loading

codecov bot commented Jul 10, 2023 •

edited

Loading

hbushouse commented Jul 11, 2023

hbushouse commented Jul 11, 2023

hbushouse left a comment

braingram commented Jul 18, 2023

braingram commented Jul 18, 2023

hbushouse commented Jul 31, 2023

braingram commented Jul 31, 2023

braingram commented Jul 31, 2023 •

edited

Loading

tapastro commented Aug 1, 2023

braingram commented Aug 1, 2023

braingram commented Aug 23, 2023

hbushouse commented Aug 24, 2023

braingram commented Aug 24, 2023

braingram left a comment

tapastro commented Sep 8, 2023

braingram left a comment •

edited

Loading

hbushouse commented Sep 11, 2023

hbushouse commented Sep 11, 2023

braingram commented Sep 11, 2023

hbushouse commented Sep 11, 2023

JP-3304: Add error to NIRSpec flat_table #183

JP-3304: Add error to NIRSpec flat_table #183

Conversation

tapastro commented Jul 10, 2023 • edited by hbushouse Loading

codecov bot commented Jul 10, 2023 • edited Loading

Codecov Report

hbushouse commented Jul 11, 2023

hbushouse commented Jul 11, 2023

hbushouse left a comment

Choose a reason for hiding this comment

braingram commented Jul 18, 2023

braingram commented Jul 18, 2023

hbushouse commented Jul 31, 2023

braingram commented Jul 31, 2023

braingram commented Jul 31, 2023 • edited Loading

tapastro commented Aug 1, 2023

braingram commented Aug 1, 2023

braingram commented Aug 23, 2023

hbushouse commented Aug 24, 2023

braingram commented Aug 24, 2023

braingram left a comment

Choose a reason for hiding this comment

tapastro commented Sep 8, 2023

braingram left a comment • edited Loading

Choose a reason for hiding this comment

hbushouse commented Sep 11, 2023

hbushouse commented Sep 11, 2023

braingram commented Sep 11, 2023

hbushouse commented Sep 11, 2023

tapastro commented Jul 10, 2023 •

edited by hbushouse

Loading

codecov bot commented Jul 10, 2023 •

edited

Loading

braingram commented Jul 31, 2023 •

edited

Loading

braingram left a comment •

edited

Loading