Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing a bug in thredds loader that limited crawling ability #44

Merged
merged 4 commits into from
Jan 22, 2024

Conversation

dchandan
Copy link
Collaborator

A subtle bug was limiting the number of files that could be successfully crawled by the crawler before the crawler abruptly stopped (without error). Investigation released that this had to do with the "depth" variable in the THREDDSLoader class, that was originally put in place to limit the depth to which the crawler will crawl. This original logic was largely retained, however, it turns out, there was an error in the logic which only showed up while crawling several deeply-nested data structures like the datasets in the UofT CMIP6 data catalog. Essentially, the depth variable was always linearly decreasing and when it became 0, the crawler stopped working. This PR fixes that issue.

STACpopulator/input.py Outdated Show resolved Hide resolved
STACpopulator/input.py Outdated Show resolved Hide resolved
STACpopulator/populator_base.py Outdated Show resolved Hide resolved
@@ -157,3 +159,5 @@ def ingest(self) -> None:
update=self.update,
session=self._session,
)
counter += 1
LOGGER.info(f"Processed {counter} data items")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be a warning log if the STAC item was not generated/returned? Possibility an unexpected error that was silently ignored?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good point. I've been wanting to add some more error management code. For the moment, I have added an error log message here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add the item name and loc to the log error message so we can debug the one that failed appropriately?

STACpopulator/input.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@huard huard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, sorry I missed this.

Copy link
Collaborator

@fmigneault fmigneault left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just last logging comment. Then, good to merge.

@@ -157,3 +159,5 @@ def ingest(self) -> None:
update=self.update,
session=self._session,
)
counter += 1
LOGGER.info(f"Processed {counter} data items")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add the item name and loc to the log error message so we can debug the one that failed appropriately?

@dchandan
Copy link
Collaborator Author

Good catch, sorry I missed this.

I missed it too!

@dchandan
Copy link
Collaborator Author

Just last logging comment. Then, good to merge.

I've added this to #45.

@dchandan dchandan merged commit 10b0716 into master Jan 22, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants