refactor STAC extensions and validation outside of populator logic #38

Merged
dchandan merged 6 commits into master from stac-ext-models on Jan 9, 2024


fmigneault
Collaborator

Changes

  • Remove the custom reimplementations of STAC Asset, Item, Properties, etc. with pydantic.
    Use all original pystac objects instead to let them deal with their own validation strategy.
  • Remove the need for populators to each reimplement their own STAC Item schema.
    They can instead combine as many STAC extension objects as they want/need, and dispatch the resolution/validation to respective extensions.
  • Transfer the CMIP6 and THREDDS logic into their respective extensions following pystac approach.
    • Add CMIP6Helper and THREDDSHelper (similar to DatacubeHelper) that are used to perform NetCDF/NCML conversion as necessary.
    • Avoids mixed use of a single class for converting attributes and validating at the same time, making logic harder to reuse in cases where THREDDS DataLoader is not involved.
  • Fix missing/invalid extension JSON schema ref in stac_extensions for CMIP6.

Rebased onto #33
Could use #35 as well to fix test that patches access_urls (copied over ServiceType in the meantime).
Closes #32

@fmigneault fmigneault self-assigned this Nov 18, 2023
@fmigneault fmigneault changed the title Fix bugs in datacube extension. Add test. Add schema for local valida… refactor STAC extensions and validation outside of populator logic Nov 18, 2023
Collaborator

@huard huard left a comment


I like the clear structure of the extensions.
I suspect there are a few bits of the new stuff that are not exercised by the test suite. In particular, I'll be curious to see the summaries in action. That could be another PR, just flagging it so we don't forget it.

@dchandan
Collaborator

The cmip6 extension effort is impressive and does have value in its own way. The new features regarding summaries can be retained. However, this PR makes rather substantial changes to the methodology for creating STAC items and in that regard it moves far from the original framework that Mathieu and I had come up with where our goal was to focus on making the system friendly and approachable to end-users and specifically avoid the traps that other similar projects have fallen into, where usability takes a back seat to the desires of developers to do fancy stuff. I continue to believe in the need to approach stac-populator from a usability perspective and in the need to keep things simple. The complexity of the CMIP6 extension violates those beliefs; sure it makes add_cmip6.py look simple, but a lot of complexity has been added to the logic for the extension, not to mention that there is also a separate THREDDS extension. For this reason I am rejecting this PR.

I have long maintained my position that I would like to use this implementation, which is for UofT CMIP6 data, as an example for how other users can create representations for any type of data without needing to create extensions or using fancy pystac facilities, all of which increase the level of understanding one has to have before creating STAC representations and could constitute a barrier. Francis knows this very well, so submission of this PR without prior discussions on changes of this sort seems like a heavy-handed effort to push through changes that he is personally in favour of. I also think it was implicit, following Mathieu's departure, that UofT is taking the lead on this project going forward (especially with CRIM winding down), so I find Francis's efforts to steer the project in a different direction without sufficient consultation and without deference to the plans and visions of those who started the project, to be rather bold to put it mildly. I don't think this is the way to do things.

@dchandan dchandan closed this Nov 21, 2023
@fmigneault
Collaborator Author

@huard

I suspect there are a few bits of the new stuff that are not exercised by the test suite. In particular, I'll be curious to see the summaries in action. That could be another PR, just flagging it so we don't forget it.

Indeed.
I did not test it further because they were not currently applied explicitly by the populator. If added at some point, that would have to be validated.

@dchandan
I completely disagree with your statement:

PR makes rather substantial changes to the methodology for creating STAC items and in that regard it moves far from the original framework that Mathieu and I had come up with where our goal was to focus on making the system friendly and approachable to end-users

The PR does not change at all how the populator procedure behaves. It only refactors the STAC CMIP6 extension interface to align it with all other existing STAC extensions. This will allow it to eventually be ported to the official https://github.com/stac-utils/pystac repo after further validation. This will greatly simplify its use by external users and make it more approachable.

I don't see how this could be made easier for users... You can now do:

for extension in [ CMIP6Extension, THREDDSExtension, DatacubeExtension, <etc any extension> ]: 
    extension.apply(item)

Instead of going through all the loops for pydantic.STACItem <-> pystac.Item conversions.
It also reduces chances to make errors, since they don't have to reimplement the specification themselves.
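
A minimal, stdlib-only sketch of that dispatch pattern, using stand-in classes rather than pystac itself. `FakeItem`, `SketchExtension`, its two subclasses, `build_item`, and both schema URIs are hypothetical names introduced for illustration; with pystac the equivalent step would typically go through `SomeExtension.ext(item, add_if_missing=True)` followed by `.apply(...)` with that extension's own arguments.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FakeItem:
    """Stand-in for a STAC Item (hypothetical, not pystac.Item)."""

    id: str
    properties: dict[str, Any] = field(default_factory=dict)
    stac_extensions: list[str] = field(default_factory=list)


class SketchExtension:
    """Hypothetical base: each extension owns its prefix and schema URI."""

    prefix = ""
    schema_uri = ""

    @classmethod
    def apply(cls, item: FakeItem, values: dict[str, Any]) -> None:
        # Register the schema once so a validator can find it later.
        if cls.schema_uri not in item.stac_extensions:
            item.stac_extensions.append(cls.schema_uri)
        # Write each field under the extension's own prefix.
        for key, val in values.items():
            item.properties[f"{cls.prefix}:{key}"] = val


class SketchCMIP6(SketchExtension):
    prefix = "cmip6"
    schema_uri = "https://example.com/cmip6/schema.json"  # placeholder URI


class SketchCube(SketchExtension):
    prefix = "cube"
    schema_uri = "https://example.com/datacube/schema.json"  # placeholder URI


def build_item(item_id: str, per_extension: list[tuple[type, dict[str, Any]]]) -> FakeItem:
    """Apply any number of extensions to one item, in order."""
    item = FakeItem(id=item_id)
    for extension, values in per_extension:
        extension.apply(item, values)
    return item
```

The point of the pattern is that the caller never touches `stac_extensions` directly: each extension registers its own schema reference when it is applied.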

I would like to use this implementation, which is for UofT CMIP6 data

I think you had a narrowed view based on your use case alone.
I have tried using the current implementation of CMIP6Extension without all the intertwined logic from STACpopulator and THREDDS-specific details. They were too coupled together to make it applicable in any other case than yours. The current refactor proposal does not break your populator implementation. All provided tests produce the exact same results (actually, it fixes it because the stac_extension was missing the CMIP6 reference...).

how other users can create representations for any type of data without needing to create extensions or using fancy pystac facilities,

There is no need for them to create anything if the extension implementations are actually made available in official pystac...

PR without prior discussions on changes [...]
without sufficient consultation and without deference to the plans and visions of those who started the project

This is the point of this PR... to have a discussion around it.
How do you want me to demonstrate it without some code at hand to illustrate that it can work?

especially with CRIM winding down

You need to understand that DACCS project is ending on our side because we exhausted our budget. That does not mean we are disappearing from the map regarding geospatial projects. We have many other projects actively using STAC and related technologies. I am already forwarding the use of the populator to ESA and other companies. You have to think about the wider scope than DACCS / Marble only.

@fmigneault fmigneault reopened this Nov 21, 2023
@dchandan
Collaborator

I am not opposed to this CMIP6 extension for what it is. Because CMIP6 is widely used, incorporating this extension in pystac would bring genuine benefit to the geospatial community. I am opposed to it being used in place of what I had because I had a very specific goal in mind: to set an example for how to create STAC items for any data without the need for extensions.

The PR does not change at all how the populator procedure behaves. It only refactors the STAC CMIP6 extension interface to align it with all other existing STAC extensions. This will allow it to eventually be ported to the official https://github.com/stac-utils/pystac repo after further validation. This will greatly simplify its use by external users and make it more approachable.

You're right, it doesn't strictly change how the populator behaves. But it does affect the vision of providing a tool that is easy to use and which provides a low barrier for users to adapt and extend.

how other users can create representations for any type of data without needing to create extensions or using fancy pystac facilities,

There is no need for them to create anything if the extension implementations are actually made available in official pystac...

You are completely missing the point here. I am not talking about someone who needs to create another CMIP6 implementation, I am talking about laying out an example or examples for future node operators to use when they want to put some other data on their node and catalog it in their STAC catalogs. I wanted to lay out a method for them to be able to create STAC objects for their data without having to explicitly create extensions.

An extension only becomes useful when it is used for very commonly used data like CMIP6 (so, like I said above, in that regard your extension makes perfect sense), but there are thousands upon thousands of datasets out there, big and small, frequently used and infrequently used, and I am reluctant to suggest to the community, or set an example, that for any of those datasets "go ahead and write an extension; that is the only way to create a STAC representation".

You have gone through all this effort to create a CMIP6 implementation, but its usefulness is limited to those who will need to put CMIP6 data on their own machines. How many other Marble nodes will do that? Why would they even do that when the UofT node already has the data? So, in this regard your effort does not help the Marble community; I can only see it being of use to non-Marble users. The more generic first-principles based approach that I was using would have been useful to other Marble users.

@dchandan
Collaborator

Instead of going through all the loops for pydantic.STACItem <-> pystac.Item conversions. It also reduces chances to make errors, since they don't have to reimplement the specification themselves.

There were no "loops" for converting from pydantic.STACItem to pystac.Item. There was only one step to go from a pydantic.STACItem object to pystac.Item which was also done only once in the course of creating a STAC item. Absolutely no simplification with regards to this is achieved in your implementation, you are still converting pydantic models to pystac structures, you are just doing it through extensions and in the course of that adding considerably more code. Before, one had:

item = STACItem(.....)  # a pydantic object. One line call to create a full pydantic object
# Convert pydantic STAC item to a PySTAC Item
item = pystac.Item(**json.loads(item.model_dump_json(by_alias=True)))

Done. Now item is a pystac.Item and all subsequent manipulation is done on the pystac.Item object. There are no "loops". This is simple and more importantly readable.

You have taken all the code from STAC_item_from_metadata and shuffled it all over the place and put it behind layers of classes and you are claiming that is simpler. How is it preferable? Your implementation is certainly much, much less readable, which is not a great thing for this project because we want people to be able to look at sample implementations and write their own implementations, and we want to lower the barrier for them.

@dchandan
Collaborator

This is the point of this PR... to have a discussion around it. How do you want me to demonstrate it without some code at hand to illustrate that it can work?

You should have opened an issue saying you are thinking of making this big shift, and inquiring if there is enough support to do that.

@dchandan
Collaborator

I think you had a narrowed view based on your use case alone. I have tried using the current implementation of CMIP6Extension without all the intertwined logic from STACpopulator and THREDDS-specific details. They were too coupled together to make it applicable in any other case than yours.

I agree that some refactoring was needed, some logic ended up too coupled. I had a plan to decouple that and this de-coupling could be easily done without the complexity you have introduced with the two new extensions. I also should say here that the THREDDSextension is totally unnecessary. So much code for doing so little. How is it better than before?

The current refactor proposal does not break your populator implementation.

I don't know yet, because the PR doesn't work right now, so I'll have to wait until the issues are fixed to assess this statement.

@dchandan
Collaborator

You need to understand that DACCS project is ending on our side because we exhausted our budget. That does not mean we are disappearing from the map regarding geospatial projects. We have many other projects actively using STAC and related technologies. I am already forwarding the use of the populator to ESA and other companies. You have to think about the wider scope than DACCS / Marble only.

This is exactly the issue that has plagued birdhouse and its various components and is causing a problem here: You have conflicting priorities. I want to act in the best interest of future Marble node hosts, but you have other projects in mind for which the preferred design can conflict with the design that is in the interest of Marble node users/hosts. I fully understand that you are not disappearing from the map of geospatial projects, neither do I want you to, but you also need to understand that the work that was being done here was for the DACCS project and the priorities of the project should be first and foremost. No one is stopping you from going your own way on a different project.

@fmigneault
Collaborator Author

This is exactly the issue that has plagued birdhouse and its various components and is causing a problem here: You have conflicting priorities.

I don't see the plaguing issue that you observe, and am very disappointed you feel this way about a many-year collaborative effort.
Without all this prior work from DKRZ followed by PAVICS and many OGC projects, you would not even have this infrastructure to work with. The whole point of making an open-source research platform is to allow it to evolve and improve it over time, not restart from scratch each time to hit the same problems over and over again. Those are not conflicting priorities, it is about aligning shared objectives.

I'm not sure how you plan on promoting Marble to new users and node hosts, or get future financing once DACCS budget is exhausted on your side, if the server cannot even be interoperable with others that are based on the same birdhouse, let alone other implementations. You should see this as an opportunity to validate that the implementation is robust for many use cases.

I want to act in the best interest of future Marble node hosts, but you have other projects in mind for which the preferred design can conflict with the design that is in the interest of Marble node users/hosts.

Like previously mentioned, I am proposing an alignment with the pystac reference implementation to define STAC objects. That will eventually allow exposure of this utility and its capabilities to the full STAC community, which has much more traction than all of DACCS members combined. If you cannot see this, as well as the work of "other projects" directly developing and reusing those STAC extensions, as an opportunity to obtain new hosts for Marble... I don't know what to tell you.

The "preferred design" seems to be only your own. The community that worked on pystac seems to believe otherwise. The new Marble hosts will have to interact with pystac. They will have to interact with the STAC community. It seems fair IMO to align with the bigger player that leads the STAC initiative than work against them.

@fmigneault
Collaborator Author

fmigneault commented Nov 22, 2023

laying out an example or examples for future node operators to use when they want to put some other data on their node and catalog it in their STAC catalogs. I wanted to lay out a method for them to be able to create STAC objects for their data without having to explicitly create extensions.

If extensions or core STAC properties are not used, then there is no point in publishing that data using STAC.
No other software will have any reference schema regarding how to parse that data and use it.
They might as well push unnormalized text/plain content, and the result would be the same.
The normalization using STAC and its extensions is what gives all its power to this metadata and cataloguing capabilities (we did discuss that before and why it is important).
Guiding the user into avoiding extensions is an antipattern and bad practice overall.

item = pystac.Item(**json.loads(item.model_dump_json(by_alias=True)))

Done. Now item is a pystac.Item and all subsequent manipulation is done on the pystac.Item object.

Wrong. With this, you lose all references to the STAC extensions you supposedly applied just before, and it gives you the false impression that your item data is valid, while it isn't. Case in point: the test results from the current implementation did not generate valid STAC definitions, and I had to patch the missing stac_extensions reference: https://github.com/crim-ca/stac-populator/pull/38/files#diff-bad03ebcc2f6bef2d845f67ba6b5dbf05249d1c59ead566fabaed08862ae6735

Using pystac extensions directly, that was done automatically, and this is how I could notice this problem.
Users do not even need to worry about potentially making data manipulation errors when using official extensions.
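
The failure mode described here, prefixed properties present while the matching schema URI is absent from `stac_extensions`, can be detected with a small check. The sketch below works on a plain item dictionary; the `PREFIX_TO_SCHEMA` mapping, its URIs, and the function name are placeholders chosen for illustration, not values taken from this PR.

```python
from typing import Any

# Hypothetical mapping from property prefix to the schema URI that declares it.
PREFIX_TO_SCHEMA = {
    "cmip6": "https://example.com/cmip6/schema.json",
    "cube": "https://example.com/datacube/schema.json",
}


def missing_extension_refs(item: dict[str, Any]) -> list[str]:
    """Return known prefixes used in properties whose schema URI is not declared."""
    declared = set(item.get("stac_extensions", []))
    # Collect the prefix of every "prefix:field" property key.
    used = {
        key.split(":", 1)[0]
        for key in item.get("properties", {})
        if ":" in key
    }
    return sorted(
        prefix
        for prefix in used
        if prefix in PREFIX_TO_SCHEMA and PREFIX_TO_SCHEMA[prefix] not in declared
    )
```

An item carrying `cmip6:` fields with an empty `stac_extensions` list would be reported, which is the kind of gap a schema validator cannot notice on its own because it only validates against the schemas the item declares.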

All capabilities, methods, and data-handling from pystac and applied extensions would also be lost.
In the current CMIP6 case, there aren't many extra features provided. But some other extensions do have special handling logic when combined. The user that could otherwise take advantage of all pystac utilities now has to reimplement them himself, increasing chances of making errors since they would not know the full STAC spec by heart.

By not reusing pystac, you dropped features without even realizing it.
You also duplicated the core logic, such as the datetime vs start_datetime/end_datetime handling.
Are you planning to ask new users to do that for every new data/property they will have to populate that we did not already provide, instead of reusing available tools?
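
For readers unfamiliar with the rule being referred to: in a STAC Item, `datetime` is required but may be null only when `start_datetime` and `end_datetime` are both provided. A minimal sketch of that rule follows, written with the standard library only; the function name and the choice to treat naive datetimes as UTC are assumptions made for this example, not code from the PR.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def temporal_properties(
    dt: Optional[datetime] = None,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
) -> dict[str, Any]:
    """Build the temporal part of STAC Item properties."""
    # A range must be given as a pair.
    if (start is None) != (end is None):
        raise ValueError("start_datetime and end_datetime must be given together")
    # A null datetime is only acceptable when a range is provided.
    if dt is None and start is None:
        raise ValueError("need either datetime or a start/end range")
    if start is not None and end is not None and start > end:
        raise ValueError("start_datetime must not be after end_datetime")

    def fmt(value: datetime) -> str:
        # Assumption: naive values are UTC. Serialize with a Z suffix.
        if value.tzinfo is None:
            value = value.replace(tzinfo=timezone.utc)
        return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    props: dict[str, Any] = {"datetime": fmt(dt) if dt is not None else None}
    if start is not None and end is not None:
        props["start_datetime"] = fmt(start)
        props["end_datetime"] = fmt(end)
    return props
```

Every populator that builds items by hand needs some version of this logic, which is the duplication being pointed out.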

You also seem to forget that the purpose of pystac is not only to generate data, but to validate and interact with it once you have retrieved it from the API. How are the users supposed to make sure the content is valid on their side and put it to use once they queried it from STAC API? Since everything was combined into a single STAC_item_from_metadata, they would have to piece it out anyway and parse everything again for their own use, just to extract the CMIP6 specific part. We shouldn't force a user that wants to use STAC with CMIP6 annotations to reimplement its logic just to get it without the THREDDS and populator logic mixed in, as was the case previously. The pystac approach allows the user to apply gradually more extensions on top of the core Item as needed, without impacting other components.

You should have opened an issue saying you are thinking of making this big shift, and inquiring if there is enough support to do that.

It is not the first time that using pystac extension approach was proposed.
When mentioned previously, it seemed too hard to visualize how that could be accomplished, because it required a big refactor, as this PR demonstrates. Instead of going back and forth over many hypothetical discussions and iterations that would last weeks (as did the many https://github.com/crim-ca/stac-populator/tree/arch-changes and derived branches), this presents an actually working implementation right away. I'm not saying it has to be merged directly as is, but we at least have tangible code to work with.

I also should say here that the THREDDSextension is totally unnecessary.

I disagree.
I have put forward many times the need to dissociate THREDDS and siphon crawling logic from the STAC generation step for cases that directly obtain the NCML:

All this has been disregarded each time, and the simple hooks that were asked to achieve my goal are still not added to this date, although I provided explicit ways on how to implement them. My understanding is that unless I add them myself, this would never be done.

The current refactor proposal does not break your populator implementation.

I don't know yet, because the PR doesn't work right now, so I'll have to wait until the issues are fixed to assess this statement.

Not sure what you mean? All tests are passing since this morning.

@dchandan
Collaborator

Like previously mentioned, I am proposing an alignment with the pystac reference implementation to define STAC objects. That will eventually allow exposure of this utility and its capabilities to the full STAC community, which has much more traction than all of DACCS members combined. If you cannot see this, as well as the work of "other projects" directly developing and reusing those STAC extensions, as an opportunity to obtain new hosts for Marble... I don't know what to tell you.

The "preferred design" seems to be only your own. The community that worked on pystac seems to believe otherwise. The new Marble hosts will have to interact with pystac. They will have to interact with the STAC community. It seems fair IMO to align with the bigger player that leads the STAC initiative than work against them.

There seems to be a misunderstanding on your side that I am proposing to move away from pystac or STAC standards and what not, which I clearly am not. I am evidently using STAC and I am evidently using pystac STAC objects in the process of creating STAC representations. That is pretty strong alignment I would say. The Marble hosts would not have any issues interacting with other STAC servers as long as the STAC implementation is correct, which it is. (If you think something is not conformant to the STAC standards, then point it out and let's focus on that first rather than on fancy refactoring.)

@dchandan
Collaborator

Your second comment above contains several errors and frequently falls into familiar misunderstandings. I won't go into them all point-by-point as it is simply not worth it, so I will focus on just a few issues:

If extensions or core STAC properties are not used, then there is no point in publishing that data using STAC.

That is a pretty thin argument and entirely flawed. Furthermore, it is irrelevant, because I am using core STAC properties and I am using extensions, more on this below.

No other software will have any reference schema regarding how to parse that data and use it.
They might as well push unnormalized text/plain content, and the result would be the same.
The normalization using STAC and its extensions is what gives all its power to this metadata and cataloguing capabilities (we did discuss that before and why it is important).

I think this is another misunderstanding that keeps cropping up. I am not opposed to using STAC extensions, by which I mean the extensions defined in JSON Schemas. I know they are important, in fact in add_CMIP6.py I specifically add Tom's CMIP6 proposed extension:

item.stac_extensions.append(
            "https://raw.githubusercontent.com/TomAugspurger/cmip6/main/json-schema/schema.json"
        )

We are also using the Datacube extension. I am also not opposed to the pystac implementation of the CMIP6 extension; as I have noted earlier, it can be of use outside of Marble. My opposition in this PR, and outside of the PR, has been limited to how those extensions are represented in python and pystac. I have said above that the complexity of a pystac extension should be entertained only when the benefit outweighs that complexity. You also seem to be under a misunderstanding that STAC extensions have to be implemented in code using pystac extensions; this is obviously not true. Right now there are ~70 STAC extensions in various stages of maturity on the official STAC extension site (and probably several more unofficial ones, including Tom's CMIP6), while only 20 of these have been officially implemented in pystac. How do you think these extensions are used? Hint: you can use them whichever way you like as long as you implement the specifications of those extensions correctly. You don't have to use pystac's built-in extensions facility specifically.

Wrong. With this, you lose all references to the STAC extensions you supposedly applied just before, and it gives you the false impression that your item data is valid, while it isn't. Case in point: the test results from the current implementation did not generate valid STAC definitions, and I had to patch the missing stac_extensions reference: https://github.com/crim-ca/stac-populator/pull/38/files#diff-bad03ebcc2f6bef2d845f67ba6b5dbf05249d1c59ead566fabaed08862ae6735

Wrong. No references to STAC extensions are lost in that step because no extensions were ever defined prior to that step. All extensions are added afterwards, in CMIP6populator.create_stac_item, to the pystac.Item that is created in this step. The resulting CMIP6 STAC representations are valid because I have validated them against schemas for all extensions used (cmip6 and datacube). In fact it was during this validation step that I found the issues not only with the datacube extension that were recently fixed by David, but also in the CMIP6 proposed extension (TomAugspurger/cmip6#1).

You also seem to forget that the purpose of pystac is not only to generate data, but to validate and interact with it once you have retrieved it from the API. How are the users supposed to make sure the content is valid on their side and put it to use once they queried it from STAC API? Since everything was combined into a single STAC_item_from_metadata, they would have to piece it out anyway and parse everything again for their own use, just to extract the CMIP6 specific part. We shouldn't force a user that wants to use STAC with CMIP6 annotations to reimplement its logic just to get it without the THREDDS and populator logic mixed in, as was the case previously. The pystac approach allows the user to apply gradually more extensions on top of the core Item as needed, without impacting other components.

I have actually tested the user-side interaction with pystac_client and I have not observed any problems. In these remarks, I don't see you talk about any concrete issues that you have encountered, it's all just vague hand-wavy argument. You have the previous implementation, so if you have an example of a specific issue that is present then bring it up and let's take it from there.

Not sure what you mean? All tests are passing since this morning.

INFO: [STACpopulator.populator_base  ] Creating STAC representation for tas_Amon_E3SM-1-0_historical_r9i2p2f1_gr_185001-201412.nc
  ERROR: [STACpopulator.stac_utils      ] Failed to add CMIP6 extension to item tas_Amon_E3SM-1-0_historical_r9i2p2f1_gr_185001-201412.nc
Traceback (most recent call last):
  File "/Users/dchandan/DACCS/Codes/stac-populator/STACpopulator/implementations/CMIP6_UofT/add_CMIP6.py", line 108, in <module>
    main()
  File "/Users/dchandan/DACCS/Codes/stac-populator/STACpopulator/implementations/CMIP6_UofT/add_CMIP6.py", line 104, in main
    return runner(ns)
  File "/Users/dchandan/DACCS/Codes/stac-populator/STACpopulator/implementations/CMIP6_UofT/add_CMIP6.py", line 98, in runner
    c.ingest()
  File "/Users/dchandan/DACCS/Codes/stac-populator/STACpopulator/populator_base.py", line 132, in ingest
    stac_item = self.create_stac_item(item_name, item_data)
  File "/Users/dchandan/DACCS/Codes/stac-populator/STACpopulator/implementations/CMIP6_UofT/add_CMIP6.py", line 49, in create_stac_item
    item = cmip_helper.stac_item()
  File "/Users/dchandan/DACCS/Codes/stac-populator/STACpopulator/extensions/cmip6.py", line 172, in stac_item
    item_cmip6 = CMIP6Extension.ext(item, add_if_missing=True)
  File "/Users/dchandan/DACCS/Codes/stac-populator/STACpopulator/extensions/cmip6.py", line 219, in ext
    cls.ensure_has_extension(obj, add_if_missing)
AttributeError: type object 'CMIP6Extension' has no attribute 'ensure_has_extension'. Did you mean: 'validate_has_extension'?
make: *** [test-cmip6] Error 1

@fmigneault
Collaborator Author

I specifically add Tom's CMIP6 proposed extension

And yet, those were both the wrong URI, and not actually applied to the output result unless going through all the steps of CMIP6Populator. Another populator implementation that would want to reuse CMIP6Properties would have to repeat that logic again.

The resulting CMIP6 STAC representations are valid because I have validated them against schemas for all extensions

You validated the fields by providing the URI explicitly. An external tool could not automatically figure out the resolution between your CMIP6Properties and the URI. Using the pystac extensions, URIs can be resolved directly because they are registered under the pystac validator.

Note that I am not against using pydantic to define the core attributes. I directly reapplied your implementation of CMIP6Properties because that is more convenient. However, I generalized its use to be applicable for STAC Items, Assets and Collections. That does not mean that everyone implementing a STAC extension has to provide all of them, but I did since I was working on it anyway, and now it supports the entire schema from Tom's CMIP6 definition, not just part of it.

The main difference of the implementation provided by this PR is that CMIP6Helper and THREDDSExtension are used to perform all the conversion logic from NCML to STAC CMIP6 properties. If I add another catalogue implementation than THREDDS, I can still validate and reuse CMIP6Extension against the metadata I provide by some other means.

All extensions are added afterwards, in CMIP6populator.create_stac_item, to the pystac.Item that is created in this step.

This is exactly what I was referring to when mentioning the back-and-forth between pystac and pydantic. One is generated, then we shift to the other, then go back to pystac. That highlights that there is a fundamental issue in how data was managed, and re-validated over-and-over by setting their attributes on every conversion step. It also still forced the user to go through CMIP6populator.create_stac_item, therefore being stuck with the THREDDS and Populator logic when there is no need.

The process of converting the NCML attributes to STAC CMIP6 is one step (CMIP6Helper), parsing THREDDS attributes to STAC Asset references is another (which can now be reused for other NetCDF that does not include CMIP6 metadata btw), and validating both against reference CMIP6/THREDDS extension schema is the final step. Each of those steps can now be conveniently performed on their own, and the populator specifically combines them for a specific use case.
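
A rough, stdlib-only sketch of how those three steps can stay independent and still be combined by a populator. Everything here is hypothetical: the function names, the assumed `ncml` layout (an `attributes` mapping plus an `access_urls` mapping), the subset of CMIP6 field names, and the simple list-of-problems check that stands in for real schema validation. None of it is the PR's actual code.

```python
from typing import Any

# Illustrative subset of CMIP6 attribute names (assumption).
CMIP6_FIELDS = ("activity_id", "source_id", "experiment_id")


def cmip6_properties(ncml: dict[str, Any]) -> dict[str, Any]:
    """Step 1: map NCML global attributes to cmip6-prefixed properties."""
    attrs = ncml.get("attributes", {})
    return {f"cmip6:{name}": attrs[name] for name in CMIP6_FIELDS if name in attrs}


def thredds_assets(ncml: dict[str, Any]) -> dict[str, dict[str, str]]:
    """Step 2: map THREDDS access URLs to asset entries, with no CMIP6 knowledge."""
    return {service: {"href": url} for service, url in ncml.get("access_urls", {}).items()}


def validate(properties: dict[str, Any], assets: dict[str, dict[str, str]]) -> list[str]:
    """Step 3: report problems; a stand-in for validation against extension schemas."""
    problems = [
        f"missing cmip6:{name}" for name in CMIP6_FIELDS if f"cmip6:{name}" not in properties
    ]
    problems += [f"asset {key} has no href" for key, asset in assets.items() if not asset.get("href")]
    return problems


def populate(ncml: dict[str, Any]) -> dict[str, Any]:
    """Populator: combines the three steps for one specific use case."""
    props = cmip6_properties(ncml)
    assets = thredds_assets(ncml)
    problems = validate(props, assets)
    if problems:
        raise ValueError("; ".join(problems))
    return {"properties": props, "assets": assets}
```

With this separation, `thredds_assets` can be reused for NetCDF files that carry no CMIP6 metadata, and `cmip6_properties` can be fed attributes that did not come from THREDDS.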

complexity of a pystac extension should be entertained only when the benefit outweighs that complexity

You think the following is complicated?

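import json
from typing import Any, Literal, TypeVar, Union

import pystac
from pystac.extensions.base import ExtensionManagementMixin, PropertiesExtension

# CMIP6Properties (the pydantic model) and SCHEMA_URI are defined elsewhere in the module.
T = TypeVar("T", bound=pystac.Item)

class CMIP6Extension(
    PropertiesExtension,
    ExtensionManagementMixin[pystac.Item],
):
    @property
    def name(self) -> Literal["cmip6"]:
        return "cmip6"

    def apply(
        self,
        properties: Union[CMIP6Properties, dict[str, Any]],
    ) -> None:
        # Accept a plain dict and let the pydantic model validate it.
        if isinstance(properties, dict):
            properties = CMIP6Properties(**properties)
        data_json = json.loads(properties.model_dump_json(by_alias=True))
        for prop, val in data_json.items():
            self._set_property(prop, val)

    @classmethod
    def get_schema_uri(cls) -> str:
        return SCHEMA_URI

    @classmethod
    def ext(cls, obj: T, add_if_missing: bool = False) -> "CMIP6Extension":
        if isinstance(obj, pystac.Item):
            # Adds SCHEMA_URI to stac_extensions when add_if_missing is True.
            cls.ensure_has_extension(obj, add_if_missing)
            return ItemCMIP6Extension(obj)
        raise NotImplementedError()

# Subclasses the extension (not pystac.Item) so that ext() returns an object with apply().
class ItemCMIP6Extension(CMIP6Extension):
    def __init__(self, item: pystac.Item):
        self.item = item
        self.properties = item.properties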

That is all that is needed to provide native support of pystac.Item.
Everything else from the PR is just moving things around that already existed to regroup THREDDS-specific and CMIP6-specific logic into their own classes.

Note again that the implementation I provide by this PR is bigger because I add additional support for pystac.Collection, pystac.Asset and item_assets.AssetDefinition, but that is not a requirement to develop a new STAC extension.

@dchandan
Collaborator

I'll get back to this thread tomorrow. Busy with other things atm.

@fmigneault
Collaborator Author

@dchandan
I figured out your error. You do not have latest pystac. Please update it in your environment.
I will update the requirements to make sure minimum versions are applied for future installs.
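
Until the minimum version is pinned, a startup check can turn this kind of failure into a clearer message. The sketch below is generic and takes any class; it relies only on the two method names that appear in the traceback above (`ensure_has_extension`, and `validate_has_extension` as the name Python suggested in the older environment). The function name is hypothetical and this check is not part of the PR.

```python
def extension_guard_name(extension_cls: type) -> str:
    """Return which extension-guard method the given class exposes.

    Prefers the newer name and falls back to the older one, so callers can
    log which API generation they are running against.
    """
    for name in ("ensure_has_extension", "validate_has_extension"):
        if callable(getattr(extension_cls, name, None)):
            return name
    raise RuntimeError(
        f"{extension_cls.__name__} exposes neither guard method; "
        "check the installed pystac version"
    )
```

Calling it once on the extension class at import time would surface an outdated environment before any item is processed.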

@dchandan
Collaborator

I think the way to proceed is to keep the CMIP6 pystac extension you have written for the CMIP6 data. The implementations that I write for other data products, I will continue to use a non-pystac extension approach (except for those where it may make sense to create an extension), and the two approaches will simply exist side-by-side. To do that I'll need to put back some of the code that was removed in this PR, but I will do that when I write those implementations, so nothing needs to be done about that here.

@dchandan dchandan closed this Dec 19, 2023
@dchandan dchandan reopened this Dec 19, 2023
@fmigneault
Collaborator Author

@dchandan
Thank you for approving.
Is there something that could be done to facilitate the use case you have in mind?
For example, if the CMIP6Helper provided a from_pydantic method or similar to make the transition seamless regardless of the selected approach?

@dchandan
Collaborator

dchandan commented Jan 4, 2024

@fmigneault
I've not had the chance to think about that lately, and I think I will not get to it in the next two weeks. I don't want to delay this PR in the meantime, so I think if you are ready with this PR, we can merge this and then I can make proposals for changes in the future based on my needs arising from using a non pystac-extension approach.

@dchandan dchandan merged commit aed0c8e into master Jan 9, 2024
7 checks passed
@dchandan dchandan deleted the stac-ext-models branch January 17, 2024 16:36