Skip to content

Update LSST schemas for v7#70

Merged
hernandezc1 merged 21 commits intomainfrom
u/ch/lsst/addschemas
Mar 6, 2025
Merged

Update LSST schemas for v7#70
hernandezc1 merged 21 commits intomainfrom
u/ch/lsst/addschemas

Conversation

@hernandezc1
Copy link
Copy Markdown
Contributor

@hernandezc1 hernandezc1 commented Feb 3, 2025

LSST schema versions 7.2, 7.3 and 7.4 are now available. This PR creates subdirectories containing schema information to pittgoogle/schemas/lsst/7 and updates schemas.py.

I believe these changes cover all of the bases outlined in add a new schema to the registry.

@hernandezc1 hernandezc1 self-assigned this Feb 3, 2025
@hernandezc1 hernandezc1 requested a review from troyraen February 3, 2025 17:34
Copy link
Copy Markdown
Contributor

@troyraen troyraen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!! Test this to make sure things work. I think it will need some minor changes to give access to all schema versions. Otherwise, LGTM. Thanks for doing it.

Comment thread pittgoogle/registry_manifests/schemas.yml
Comment thread pittgoogle/schema.py Outdated
Comment thread pittgoogle/schemas/lsst/7/2/sample_data/generate.py Outdated
@troyraen
Copy link
Copy Markdown
Contributor

Oh, please also update CHANGELOG.md.

@hernandezc1 hernandezc1 changed the title Add LSST schema v_2 and v_3 Update LSST schemas for v7 Feb 24, 2025
@troyraen
Copy link
Copy Markdown
Contributor

Consider merging lsst_auto_schema_helper and lsst_schema_helper, as described in my final comment here: mwvgroup/Pitt-Google-Broker#250 (review). (Not a must for this PR.)

Copy link
Copy Markdown
Contributor

@troyraen troyraen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Three requests for you @hernandezc1:

  • Changelog -- see below.
  • If you haven't yet, check the schema map at 'pittgoogle/schemas/maps/lsst.yml' and make sure it still applies to all schemas. Also, check that it actually gets used, e.g., when you load a `pittgoogle.Alert' with different versions -- I think it will be but don't remember exactly how that is configured.
  • Let's move everything that's in the 'sample_data' directories outside of the 'pittgoogle' directory so that it doesn't end up in the package that we publish to PyPI. (At least, I'm pretty sure that will mean it doesn't end up the package... we should double check.) See #71. Having to keep copies of all these schema files is already taking more space than I'd like -- we might have to reconsider how we're handling those if we have to support many more versions.
    • Let's put the sample data at 'tests/data' (so make a new 'data' directory there) since we'll need it for unit tests anyway. For now just use your best judgement for how to organize it. Consider whether we actually need the 'alert.json' and 'generate.py' files or if the info in the README is sufficient for our future needs. These last bits are not crucial for now -- minimum for this PR I think is to move them out of the 'pittgoogle' dir.

Re: my suggestion above to combine the schema helpers. This is likely to require more thought than I realized at the time. Opened #72.

My to-do list:

  • Fix the GitHub Actions failures.

The issue when building docs is coming from a failure to build dependencies. Our use of a poetry.lock file is supposed to avoid this, so not sure what's going on there. The other failures may have the same cause but I can't tell yet. Unfortunately, I don't have any more time to look at this today so I'll need to come back to it this weekend. I know you were hoping to merge this tomorrow @hernandezc1, so my apologies about that. If I don't have a solution by Monday we can talk about whether there's a way for you to still move this forward in the meantime. At a minimum, we'd need to make sure that the dependency issues are isolated to the docs and wouldn't result in the package being unusable when we publish a new version to PyPI.

Comment thread CHANGELOG.md Outdated
Comment on lines +15 to +26
## \[v0.3.12\] - 2025-02-27

### Added

- Support for LSST schema versions 7.2, 7.3, and 7.4
- README.md describing where the new LSST alert schemas were obtained.

### Changed

- `schema.py` now specifies all schema versions that are available for LSST
- Schema mappings for new LSST alert versions incorporated into `schemas.yml`

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put these in the "Unreleased" section that is right above this rather than adding a new section for v0.3.12 just yet. When you get to the point where you are ready to merge this, if you are sure that these will be the only new changes since v0.3.11 (i.e., my pr #67 isn't merged yet) and you intend to actually release v0.3.12 right after you merge, then come back here and add the new section for v0.3.12 as the last thing you do before merging.

More typically, my workflow is to do one tiny PR like #47 right before a release that moves everything in the "Unreleased" section to a new section with the version name.

@troyraen
Copy link
Copy Markdown
Contributor

troyraen commented Mar 3, 2025

@hernandezc1 I fixed the GitHub Actions checks in #73 and added some tests in #74. Rebase this onto main and see if that fixes these checks.

Re: moving the sample_data. I added the dir 'tests/data' and moved some LSST and ZTF alerts in to it. Follow that directory structure and file naming when moving your LSST data into it. When I moved the lsst v7.1 alert and tried to use it for testing I found that the sample alert had the schema attached in the header, so I had to alter it to make it match the real alerts that we expect so that it would work in the tests. If your sample alerts also have the schema attached they will (hopefully) cause some of the new unit tests to fail. Try this to get them into the proper Confluent Wire Format:

import pittgoogle

# Replace both paths that are between angle brackets with appropriate paths.
alert = pittgoogle.Alert.from_path("<path-to-alert-with-schema-attached>")
with open("<path-to-alert-in-confluent-wire-format>", 'wb') as fout:
    fout.write(alert.schema.serialize(alert.dict))

@hernandezc1
Copy link
Copy Markdown
Contributor Author

When I moved the lsst v7.1 alert and tried to use it for testing I found that the sample alert had the schema attached in the header, so I had to alter it to make it match the real alerts that we expect so that it would work in the tests. If your sample alerts also have the schema attached they will (hopefully) cause some of the new unit tests to fail. Try this to get them into the proper Confluent Wire Format:

@troyraen Was the sample alert for v7_1 obtained from the lsst/alert_packet repo (i.e., 'fakeAlert.avro')? I'm a bit confused at the moment.

I'm using the code you provided on the fakeAlert.avro files for each of the new schema versions (I see that the schema is attached in the header). I use:

# define paths
path_alert_v7_3_schema_attached = "/Users/.../pittgoogle-client/tests/data/sample_alerts/lsst/7/3/fakeAlert.avro"
path_v7_3_confluent_wire_format = "/Users/.../pittgoogle-client/tests/data/sample_alerts/lsst/7/3/lsst.v7_3.avro"

# create alert object
alert_v7_3 = pittgoogle.Alert.from_path(path_alert_v7_3_schema_attached)
with open(path_v7_3_confluent_wire_format, 'wb') as fout:
    fout.write(alert_v7_3.schema.serialize(alert_v7_3.dict))

When I try to load an alert using the output file I see that alert's message is not serialized, unlike when I use the lsst.v7_1.avro file you added to the pittgoogle-client repo (tests/data/sample_alerts/lsst/lsst.v7_1.avro).

As you can see below, the msg does not get serialized:

new_alert_v7_3 = pittgoogle.Alert.from_path(path_v7_3_confluent_wire_format)
new_alert_v7_3
Screenshot 2025-03-03 at 1 47 39 PM

Compare it to 'lsst.v7_1.avro':

alert_v7_1 = pittgoogle.Alert.from_path("/Users/.../pittgoogle-client/tests/data/sample_alerts/lsst/lsst.v7_1.avro", "lsst.v7_1.alert")
alert_v7_1
Screenshot 2025-03-03 at 2 06 31 PM

@troyraen
Copy link
Copy Markdown
Contributor

troyraen commented Mar 3, 2025

@hernandezc1 Sorry on both fronts for my incomplete instructions.

Yes, the input file I used was the 'fakeAlert.avro' file. I also see that I left a step out of that code... You need a Schema object with the correct serializer....

alert_v7_3 = pittgoogle.Alert.from_path(path_alert_v7_3_schema_attached)
# Here's one way to get a Schema with the correct serializer:
_schema = pittgoogle.Alert.from_dict(alert_v7_3.dict, schema_name='lsst.v7_3.alert').schema
with open(path_v7_3_confluent_wire_format, 'wb') as fout:
    fout.write(_schema.serialize(alert_v7_3.dict))

@hernandezc1
Copy link
Copy Markdown
Contributor Author

@hernandezc1 Sorry on both fronts for my incomplete instructions.

Yes, the input file I used was the 'fakeAlert.avro' file. I also see that I left a step out of that code... You need a Schema object with the correct serializer....

alert_v7_3 = pittgoogle.Alert.from_path(path_alert_v7_3_schema_attached)
# Here's one way to get a Schema with the correct serializer:
_schema = pittgoogle.Alert.from_dict(alert_v7_3.dict, schema_name='lsst.v7_3.alert').schema
with open(path_v7_3_confluent_wire_format, 'wb') as fout:
    fout.write(_schema.serialize(alert_v7_3.dict))

this was really helpful, thanks!

@hernandezc1
Copy link
Copy Markdown
Contributor Author

@troyraen All changes have been tested. I decided to remove 'alert.json', 'fakeAlert.avro', and 'generate.py' altogether; I don't think they're really useful to have in this repo (what do you think?)

@hernandezc1 hernandezc1 requested a review from troyraen March 3, 2025 20:24
Copy link
Copy Markdown
Contributor

@troyraen troyraen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the alerts was malformed somehow so I pushed a fix to your branch. Be sure to pull before trying to commit or push again.

Two requests about files below. Otherwise looks good. Thanks!

When merging in this repo, use "Squash and merge" option. It keeps the git history cleaner and easier to understand. You can leave the default text for the title and description.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move this file to tests/data/sample_alerts/lsst/.

Comment thread pittgoogle/schemas/.DS_Store Outdated
@troyraen troyraen added the enhancement New feature or request label Mar 4, 2025
@hernandezc1
Copy link
Copy Markdown
Contributor Author

@troyraen thanks for your feedback and helping fix the issue! I pulled your changes and just finished pushing mine. I see that the tests are failing and I'm not sure how to fix them, do you mind taking a look when you get the chance?

@troyraen
Copy link
Copy Markdown
Contributor

troyraen commented Mar 4, 2025

Oh, the tests are probably trying load the readme thinking it's an alert. Fixing it now...

@troyraen
Copy link
Copy Markdown
Contributor

troyraen commented Mar 4, 2025

@hernandezc1 This looks good to go.

Two notes:

  • I updated Codacy settings over the weekend. I think it's better, but still runs into an issue sometimes where it hangs and never reports back. I'll have to look at it more later. For now, when it hangs for >~5 min I just go to the Codacy dashboard (https://app.codacy.com/gh/mwvgroup/pittgoogle-client/), find the pull request, and click "reanalyze".
  • For now I'm just trying to get the unit tests going in a way that works and I understand. Seems like it's going ok. I'll put up a docs page soon and we can talk more about how you can start interacting with them.

@troyraen troyraen mentioned this pull request Mar 5, 2025
@hernandezc1 hernandezc1 merged commit 97752f8 into main Mar 6, 2025
@hernandezc1 hernandezc1 deleted the u/ch/lsst/addschemas branch March 6, 2025 21:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants