JP-3304: Add error to NIRSpec flat_table #183
Old NIRSpec flat ref files may no longer pass validation once this is merged, but will they at least still be loadable into a data model? Or will it crash due to the missing table column? |
P.S. The "shape" definitions were there because years ago the models could not support table columns with variable length/size. So they were all hardwired to a (ridiculously) large value to cover all possibilities. Yes, this led to a lot of zero padding in the ref files themselves. The models are smarter and more flexible now. |
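To make the zero padding concrete, here is a minimal sketch of a fixed-length table column whose valid entries are tracked by an 'nelem' column. The column names follow the ones discussed in this thread; the length of 8 and the sample values are assumptions chosen for illustration only (the real files use much larger fixed sizes).

```python
import numpy as np

# Hypothetical fixed column length, kept small for illustration.
FIXED_LEN = 8

# One table row: 'nelem' records how many leading entries are valid.
row_dtype = [("nelem", ">i4"), ("wavelength", ">f4", (FIXED_LEN,))]
table = np.zeros(1, dtype=row_dtype)

# Made-up sample values; everything past them stays zero (the padding).
valid = np.array([0.7, 0.8, 0.9], dtype=">f4")
table["nelem"][0] = valid.size
table["wavelength"][0, : valid.size] = valid

# Recover only the valid entries by slicing with 'nelem'.
nelem = int(table["nelem"][0])
trimmed = table["wavelength"][0, :nelem]
padding = table["wavelength"][0, nelem:]
```

With variable-length columns, only the `trimmed` part would need to be stored.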
Changes look fine, but I agree that we should hold off on merging this until the new flats are actually available. And I don't want to merge it into the final B9.3 release, in case it breaks something.
Following up on offline conversation with @tapastro I'm not sure if there is a way to do this that preserves backwards compatibility of the schema. Attempts to use 'oneOf' and other schema combiners failed because these combiners do not appear to be fully supported during the fits/schema loading/saving in fits_support. I'm not seeing an easy way to add support for this (I hope I'm overlooking something). Versioning the datamodel and schema seems like one option but I didn't immediately see another example of this in this package. Is there an example/precedent for how to handle this? It should be possible to open the old files by setting cast_fits_arrays to |
It occurred to me that there might be hooks in What about the following? Before the call to stdatamodels/src/stdatamodels/jwst/datamodels/nirspec_flat.py Lines 35 to 36 in fbd97cc
the FITS table for the old files could be updated, filling it with the missing error column. Something like the following:
import numpy
from numpy.lib.recfunctions import merge_arrays

if 'FAST_VARIATION' in init:
    # Check that the table has the required columns;
    # older files might be missing an 'error' column.
    table_data = init['FAST_VARIATION'].data
    if ('error', '>f4') not in table_data.dtype.descr:
        # One NaN entry per table row marks the error as unknown.
        err = numpy.empty(table_data.shape[0], dtype=[('error', '>f4')])
        err['error'] = numpy.nan
        # Append the new column to the existing table.
        table_data = merge_arrays((table_data, err), flatten=True)
        init['FAST_VARIATION'].data = table_data
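The same idea can be tried outside the data model with plain NumPy. This is a sketch under stated assumptions: `old` is a made-up stand-in for a table that lacks the error column, with scalar columns only, and nothing here touches a real HDUList.

```python
import numpy as np
from numpy.lib.recfunctions import merge_arrays

# Stand-in for an old-style table without an 'error' column (scalar fields only).
old = np.zeros(2, dtype=[("slit_name", "S15"), ("nelem", ">i4"), ("data", ">f4")])

# One-field array with one NaN per table row.
err = np.empty(old.shape[0], dtype=[("error", ">f4")])
err["error"] = np.nan

# flatten=True puts the new field alongside the existing ones
# instead of nesting the two input dtypes.
merged = merge_arrays((old, err), flatten=True)
```

The merged array keeps the original fields in order and gains a trailing 'error' field filled with NaN.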
@braingram This looks reasonable to me. Can you make it work? I guess that update could just be added to this PR. |
I'll give it a whirl. |
4c51bf6 to 68c80fc
I pushed a commit to the source branch for this PR (@tapastro let me know if you'd rather I open PRs against your branch in the future). The commit migrates old files containing NirspecFlat and NirspecQuadFlat models by adding error columns that contain all nan values.
For example, using the changes in this PR, this file from CRDS can be opened:
In [4]: m.quadrants[0].flat_table
Out[4]:
FITS_rec([('ANY', 2419, [0.7 , 0.70023566, 0.7004714 , ..., 0. , 0. , 0. ], [3.9957927e+20, 3.9931036e+20, 3.9904166e+20, ..., 0.0000000e+00, 0.0000000e+00, 0.0000000e+00], nan)],
dtype=(numpy.record, [('slit_name', 'S15'), ('nelem', '>i4'), ('wavelength', '>f4', (130000,)), ('data', '>f4', (130000,)), ('error', '>f4')]))
Note the 'nan' in the error column and the shape of the error column ('error', '>f4') compared with the shape of, for example, wavelength ('wavelength', '>f4', (130000,)).
Also note that there appears to be a bug in the NirspecQuadFlatModel constructor when provided with an instance of NirspecFlatModel: #186 |
Hi Brett, I would push to remove the fixed-length sizes (as in this PR code) - they're way overboard in size and unnecessary. Looking at the output, it looks as though the error column is not of the same shape as the other columns - am I misreading that? It looks like it has a single entry rather than an array. |
The error column is not the same shape as the other columns. Should it be? For the CRDS file mentioned above, the other columns ('wavelength', etc) are the large fixed-length sizes. Should the error column be constructed with the same size? It looks like there is a 'nelem' column that specifies the 'valid' values (at least looking at 'wavelength', all the values after the first 'nelem' are 0s). Presumably removing the fixed-length sizes would allow newer files to generate columns with only 'nelem' elements? Thanks for fielding the extra questions. Updating the code to change the error column size shouldn't be an issue once we've decided on the size (and preferred value). |
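If the error column should match the per-row shape of 'wavelength', one option is to read that shape from the existing dtype and build the new column from it. The sketch below is an assumption-laden illustration, not the code in this PR: it uses a small made-up table, and it allocates a new structured array and copies fields over rather than calling merge_arrays.

```python
import numpy as np

# Hypothetical fixed column length, kept small for illustration.
FIXED_LEN = 8

# Stand-in for an old-style table with array-valued columns and no 'error'.
old = np.zeros(
    1,
    dtype=[
        ("nelem", ">i4"),
        ("wavelength", ">f4", (FIXED_LEN,)),
        ("data", ">f4", (FIXED_LEN,)),
    ],
)

# Reuse the per-row shape of 'wavelength' for the new 'error' column.
wave_shape = old.dtype["wavelength"].shape
new_dtype = [(name, old.dtype[name]) for name in old.dtype.names]
new_dtype.append(("error", ">f4", wave_shape))

# Allocate the wider table, copy the existing columns, fill 'error' with NaN.
new = np.empty(old.shape, dtype=new_dtype)
for name in old.dtype.names:
    new[name] = old[name]
new["error"] = np.nan
```

Because the shape is taken from the existing column, the same code would apply whether the file uses the large fixed-length sizes or shorter columns.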
As @hbushouse mentioned in JP-3304 the changes in this PR could be revisited using the new @tapastro would you rather this PR be updated to use the |
The "shape" specifications are no longer relevant and could be removed for all column definitions. |
Thanks @hbushouse. To make sure I understand. The 'shape' property from this schema should be removed (the large numbers added at some earlier time). For example this one should be removed:
For the changes I added to this PR (in commit: 68c80fc) the Should the added |
6847301 to 248d07d
From the CI errors it looks like the changes in my PR against your branch failed to check that init was an HDUList before attempting to migrate the table. I think the changes suggested here should fix the TypeError: 'NoneType' object is not iterable errors (as tested locally).
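The kind of guard described here can be sketched as follows. The function name `needs_migration` and the `container_type` parameter are hypothetical, and `dict` is used as a stand-in so the sketch runs without astropy; in the real code the type to check would presumably be the HDUList class.

```python
def needs_migration(init, container_type=dict):
    """Return True only if init is a supported container holding the table.

    container_type is a stand-in (assumption); the actual check in the
    data model would be against an HDUList.
    """
    # Anything that is not the expected container (None, a path, ...) is skipped.
    if not isinstance(init, container_type):
        return False
    return "FAST_VARIATION" in init


# Without the isinstance check, a membership test on None raises TypeError.
try:
    "FAST_VARIATION" in None
    unguarded_failed = False
except TypeError:
    unguarded_failed = True
```

Checking the type first means a model created with no input is simply left alone.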
@braingram Thanks for checking on the test fix! I'm wrapped up in trying to wrangle the array merge - the desired behavior is for the error column to match the shape of the wavelength and flux columns. As I have it in the PR currently, I get an error implying that a modification of the HDUList's table is not allowed due to a mismatch in ColDefs, while if I return the Error 1: Error 2: |
expand test_nirspec_flat_table_migration to columns with shapes
LGTM!
jwst failures look like the expected ones described here: spacetelescope/jwst#7874
CI failure is unrelated. |
Full regtest run, pointing to this PR branch, started at https://plwishmaster.stsci.edu:8081/job/RT/job/JWST-Developers-Pull-Requests/914 |
I see 6 failures in
are these related to updated truth files for spacetelescope/jwst#7879 or a result of this PR? |
Yes, the regtest failures are unrelated and due to having already uploaded some modified inputs for a change in jwst. Looks good. |
Resolves JP-3304
This PR adds a schema table column that the NIRSpec team needs in order to define the flat variance arrays. I also removed some strange flat_table shape definitions, which were initializing arrays of size 130000; in one example I viewed, all but the first 1446 entries were zero.
Note that if this is merged, no existing reference files will pass validation - I don't think this can be avoided, unless a schema guru can figure it out for me. This probably should stay out of a release until NIRSpec has delivered updated flats.
Checklist
CHANGES.rst (either in Bug Fixes or Changes to API)