
Add footprint finder code #127

Open · wants to merge 17 commits into main from footprint-finder

Conversation

snbianco (Collaborator) commented Sep 18, 2024

This is a draft PR that adds the footprint finder code from TESScut. Leaving as draft since I haven't added tests yet, but I'd welcome any feedback on the actual implementation.

I tried to make things as general as possible in terms of functions and variable names, but some things are more TESS-specific and will need to be pulled out into a wrapper later when we generalize. Namely, _extract_sequence_information, _create_sequence_list, and _get_cube_files_from_sequence_obs are more TESS-specific as of now. The same is true of certain parts of cube_cut_from_footprint, mainly variables.

Something I was unsure about is how to best handle multiprocessing. The cube_cut_from_footprint function takes a threads parameter to use as max_workers when using cube_cut on the gathered cube files. However, the cube_cut function also takes its own threads parameter. Should these be two separate parameters to cube_cut_from_footprint, or the same? Should threads in cube_cut just be set to 'auto'?

scfleming (Collaborator) commented Sep 18, 2024

My initial reaction is to say let's set up a single n_threads parameter for multi-threading of any function that can use it. While on paper it might be nice to say "use 8 threads for this one but 16 for that one", that sounds like over-engineering to me at this stage of the process, and it would be much simpler to have a single n_threads parameter used globally.


# Generate cutout from each cube file
cutout_files = []
if threads == 'auto' or threads > 1:

Member

I am not sure threads here will help, and it may in fact hurt. cube_cut already uses threads for accessing the S3 data, so with this change, each cutout file would spawn that many threads. In my testing, there are diminishing returns after 8 threads. Since this could end up creating many times that number of threads, I expect we'd see thread contention here.

If you set threads to 0, versus setting threads to "auto" or "8", what are the results here?

snbianco (Collaborator Author), Sep 19, 2024

Testing on my machine, I do see a performance improvement with a larger number of threads. It's more apparent when a sector isn't provided and more than 1 cutout is being generated. For example, these commands each generate 7 cutouts.

cube_cut_from_footprint('130 30', cutout_size=50, threads=0) --> 1 min, 23.4 sec
cube_cut_from_footprint('130 30', cutout_size=50, threads=8) --> 57.6 sec
cube_cut_from_footprint('130 30', cutout_size=50, threads='auto') --> 46.4 sec

Member

is this before or after you made the change to use the same threads variable to pass into the cube_cut function?

snbianco (Collaborator Author), Sep 19, 2024

Looks like I ran the test before, when threads was set to auto for cube_cut. When using the same threads variable, the call with threads=0 takes a lot longer, which makes sense. I also see that there is less than a second difference between threads=8 and threads=auto.

Maybe it would be best to keep threads for cube_cut constant at some value, like 'auto' or 8? I think that using threads in cube_cut_from_footprint is still worthwhile for the performance improvement when making several cutouts at once, but performance does seem to stagnate after a certain point.

I'm also thinking that the default value for threads in cube_cut_from_footprint should be set to 8 rather than 1, since performance is consistently better.
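
A minimal sketch of what that single-parameter approach could look like, assuming a hypothetical helper name; the inner cube_cut thread count is pinned to 'auto' so the total number of threads stays bounded:

from concurrent.futures import ThreadPoolExecutor

from astrocut import CutoutFactory

def _cutout_all_cubes(cube_files, coordinates, cutout_size, threads=8):
    """Generate a cutout from each cube file, fanning out across threads."""
    factory = CutoutFactory()

    def _one_cutout(cube_file):
        # Keep the inner thread count fixed so each cutout doesn't multiply the pool.
        return factory.cube_cut(cube_file, coordinates, cutout_size, threads='auto')

    if threads == 1:
        return [_one_cutout(f) for f in cube_files]
    max_workers = None if threads == 'auto' else threads  # None lets the executor decide
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(_one_cutout, cube_files))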

Comment on lines 198 to 201
sequences : int, List[int], optional
    Default None. Sequence(s) from which to generate cutouts. Can provide a single
    sequence number as an int or a list of sequence numbers. If not specified,
    cutouts will be generated from all sequences that contain the cutout.

Member

Maybe you were trying to generalize this but I'm not sure what sequences are. Are these sectors?

Collaborator Author

Yes, they refer to sectors! I was trying to make the parameter more general and borrowed "sequence" from the CAOM field descriptions: https://mast.stsci.edu/api/v0/_c_a_o_mfields.html

Member

i guess for a user, it might be a little unclear that for TESS this is sectors, and not cameras or ccds or anything like that. so maybe some documentation or examples would help here.

snbianco (Collaborator Author), Sep 19, 2024

Added some more info and an example to the docstring in the latest commit! I'll also be updating the documentation at some point (probably next sprint) and will definitely include examples there.
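
For illustration, a call along those lines might look like this (the sector numbers are made up, and this assumes cube_cut_from_footprint is exported at the package level):

from astrocut import cube_cut_from_footprint

# Cutouts from TESS sectors 27 and 28 only; omitting `sequences` would use
# every sector that contains the cutout footprint.
cutout_files = cube_cut_from_footprint('130 30', cutout_size=50, sequences=[27, 28])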

snbianco (Collaborator Author) commented

I added unit tests for the module, but there seems to be a problem with accessing the public footprint files on S3 in the runners. From what I can find online, this is a permissions issue. The odd thing is, we have other tests that access S3 resources and work fine.

botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden

snbianco (Collaborator Author) commented

I added a fixture to mock opening the footprint files with fsspec, and tests are passing now.
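
A minimal sketch of what such a fixture could look like, assuming pytest's monkeypatch and the checked-in test data file; the fixture name is hypothetical, not the PR's actual code:

import fsspec
import pytest

@pytest.fixture
def mock_footprint_open(monkeypatch):
    """Serve the local footprint JSON instead of fetching it from S3."""
    def _local_open(urlpath, *args, **kwargs):
        return open('astrocut/tests/data/tess_ffi_footprints.json', 'rb')
    monkeypatch.setattr(fsspec, 'open', _local_open)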

snbianco marked this pull request as ready for review September 26, 2024 19:32
snbianco (Collaborator Author) commented

Marking as ready as tests and documentation have been added.

falkben (Member) left a comment

There's an issue with accessing the footprint files in the bucket -- I have a ticket to work on that next week.

    load_polys: Convert the s_region column to an array of SphericalPolygon objects
    """
    # Open footprint file with fsspec
    # Use caching to help performance, but check that remote UID matches the local

Member

is this comment still valid? I'm not seeing anything about UID in the code?

Comment on lines 65 to 67
s3_cache = os.path.join(os.path.dirname(os.path.abspath(__file__)), 's3_cache')
with fsspec.open('filecache::' + s3_uri, s3={'anon': True},
                 filecache={'cache_storage': s3_cache, 'check_files': True}) as f:

Member

This will put the cache directory alongside the module's source files, which is a bit of a weird place: when working on the project, the cache ends up in the working tree.

So, at a minimum we should add an entry for this in the .gitignore file to avoid committing it to the repo.

It's also difficult for users to clean up, since once they've installed astrocut it would likely sit in a nested directory inside a virtual environment. I've seen programs put stuff like this in the current user's home directory. On UNIX there's a variable XDG_CACHE_HOME that we could use to figure out where to put it, but we'd need to support Windows and Mac as well, and each of those platforms does something else.

How long do we want the cache to live? I'm wondering if a week is too long? I'm also having a hard time finding where that is documented (in fsspec or s3fs) or how to control it.

Or maybe we should just download the cache every time someone makes their first cutout (keep it in memory) and we continue to use it until they exit?

In tesscut, we use a TTL cache for this purpose: https://cachetools.readthedocs.io/en/latest/#cachetools.TTLCache

Collaborator Author

Here is the documentation on local caching in fsspec: https://filesystem-spec.readthedocs.io/en/latest/features.html#caching-files-locally

And here is the API for CachingFileSystem where the cache options are described: https://filesystem-spec.readthedocs.io/en/latest/api.html#fsspec.implementations.cached.CachingFileSystem

It takes less than a second to fetch the file from S3, so an in-memory store like TTLCache is probably the way to go. This is all great to know!
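
A sketch of that in-memory approach, assuming cachetools; the TTL value and helper name are placeholders that would need tuning:

import json
from threading import Lock

import fsspec
from cachetools import TTLCache, cached

FFI_TTLCACHE = TTLCache(maxsize=10, ttl=900)  # placeholder TTL: refetch after 15 minutes

@cached(cache=FFI_TTLCACHE, lock=Lock())
def _fetch_ffis(s3_uri: str) -> dict:
    """Fetch the footprint file from S3 once and reuse it until the TTL expires."""
    with fsspec.open(s3_uri, s3={'anon': True}) as f:
        return json.load(f)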

if verbose:
    print(f'Found {len(cube_files_mapping)} matching cube files.')
base_file_path = "s3://stpubdata/tess/public/mast/" if product == 'SPOC' \
    else "s3://stpubdata/tess/public/mast/tica/"

Member

Are there some situations where someone might still want to make cutouts from a different storage path?

Could we make this an option?

One instance where someone might want a different option is if they have downloaded the cube, to make direct cutouts. But in that case, maybe they are just directly using cube_cut?

Another option might be if they've mounted stpubdata cube data onto their machine or cloud environment (w/ fuse or something else) in which case they'd rather use that path than the s3 path. E.g. TIKE cloud platform?

I do think we can probably default to these paths, though.

Or maybe we just indicate that this function is only for making cutouts from s3 files? I guess it's already in the docstring, but it's not obvious from the module name or the function name.

Collaborator Author

My thought was that users would use cube_cut in the case that they already have the path (whether local or cloud) to a single cube file. I think that providing a single file kind of defeats the purpose of the footprint lookup.

A mounted filesystem or a local path to many cube files is worth considering, but I do wonder how common that use case would be. Could we guarantee that the cube files match the footprints coming from S3?

I'm inclined to rename the function to something like s3_cube_cut_from_footprint and make a new issue to explore other options at a later time.

Member

maybe cloud_cube_cut_from_footprint though it is a bit long?

though thinking more on it, i think what you have also works. i don't think there's a need to make an issue now. we can wait until a use case comes up.

Collaborator Author

Do you think we could abbreviate and use cloud_cube_cut_from_fp? That may cause some confusion though since "fp" isn't too obvious.

Member

between the two, i think i'd prefer just leaving off cloud. i'm not a big fan of acronyms, and fp isn't obvious.

falkben (Member) commented Oct 2, 2024

Since we're having a delay in opening up the cached footprint files on S3, another approach could be to download the footprint directly from CAOM through the vo-tap interface on the first cutout.

This query gets the SPOC footprint table from CAOM and can be run from anywhere:

https://mast.stsci.edu/vo-tap/api/v0.1/caom/sync?FORMAT=json&LANG=ADQL&QUERY=SELECT+obs_id,+t_min,+t_max,+s_region,+target_name,+sequence_number+FROM+dbo.ObsPointing+WHERE+obs_collection=%27TESS%27+AND+dataproduct_type=%27image%27+AND+target_name=%27TESS+FFI%27

And this gets the TICA footprint (it takes a bit longer since it's an HLSP):

https://mast.stsci.edu/vo-tap/api/v0.1/caom/sync?FORMAT=json&LANG=ADQL&QUERY=SELECT%20obs_id,%20t_min,%20t_max,%20s_region,%20target_name,%20sequence_number%20FROM%20dbo.ObsPointing%20WHERE%20obs_collection=%27HLSP%27%20AND%20dataproduct_type=%27image%27%20AND%20target_name=%27TICA%20FFI%27

We manipulate that response in tesscut to create the footprint JSON file we store in S3 with a small bit of code, but we could add that into astrocut as well.

Taking advantage of the cached footprint file in S3 is likely better long term, but we could use this method initially for this PR.
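
A rough sketch of that interim approach using the SPOC query above; the helper name is hypothetical, and the reshaping into the footprint JSON (as tesscut does) is left out:

import requests

CAOM_TAP_SYNC = 'https://mast.stsci.edu/vo-tap/api/v0.1/caom/sync'
SPOC_QUERY = ("SELECT obs_id, t_min, t_max, s_region, target_name, sequence_number "
              "FROM dbo.ObsPointing WHERE obs_collection='TESS' "
              "AND dataproduct_type='image' AND target_name='TESS FFI'")

def fetch_spoc_footprints():
    """Pull the SPOC FFI footprints straight from CAOM on the first cutout."""
    resp = requests.get(CAOM_TAP_SYNC,
                        params={'FORMAT': 'json', 'LANG': 'ADQL', 'QUERY': SPOC_QUERY},
                        timeout=600)
    resp.raise_for_status()
    return resp.json()  # reshape into the footprint table before crossmatching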

falkben (Member) commented Oct 3, 2024

Do we want to revisit any of the mocking now that we can access the footprints?

    return np.vectorize(single_intersect)(ffi_list['polygon'], polygon)


def _ra_dec_crossmatch(all_ffis: Table, coord: SkyCoord, cutout_size, arcsec_per_px: int):

Member

I think it might be useful to provide this function (or one that gets the footprint for you as well) as an external interface. I could see this being useful for people who want more control over the cube_cut.

Could be done in a separate PR or issue if you don't want to do it here.
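
As a sketch, that external interface might look something like this (the wrapper name and footprint_uri parameter are assumptions; the underscored helpers are the ones in this PR):

from astropy.coordinates import SkyCoord

from astrocut.footprint_cutouts import (TESS_ARCSEC_PER_PX, _get_s3_ffis,
                                        _ra_dec_crossmatch)

def get_matching_ffis(coordinates, cutout_size, footprint_uri):
    """Return the FFI footprint rows whose s_regions intersect the requested cutout."""
    coord = SkyCoord(coordinates, unit='deg')
    all_ffis = _get_s3_ffis(footprint_uri, as_table=True, load_polys=True)
    return _ra_dec_crossmatch(all_ffis, coord, cutout_size, TESS_ARCSEC_PER_PX)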


from .utils.utils import parse_size_input

TESS_ARCSEC_PER_PX = 21 # Number of arcseconds per pixel in a TESS image

Member

we may want to make an issue or note to come back to this and generalize this somehow so it could be used for other missions. i think it could wait for now though, as I think TESS is the only mission we'd do this for.
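
One possible shape for that generalization, sketched purely as an assumption: a per-mission lookup replacing the module-level constant.

ARCSEC_PER_PX = {
    'TESS': 21,  # number of arcseconds per pixel in a TESS image
}

def _get_arcsec_per_px(mission='TESS'):
    """Look up the pixel scale for a mission, failing loudly for unknown ones."""
    try:
        return ARCSEC_PER_PX[mission]
    except KeyError:
        raise ValueError(f'No pixel scale registered for mission {mission!r}')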


def _s_region_to_polygon(s_region: Column):
    """
    Takes in a s_region string of type POLYGON or CIRCLE and returns it as

Member

i don't think this docstring is correct -- this function as currently written only supports POLYGON, not CIRCLE
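
For reference, a sketch of what handling both shapes could look like for a single s_region string; the parsing details, including the optional frame token (e.g. ICRS), are assumptions:

from spherical_geometry.polygon import SphericalPolygon

def _s_region_to_polygon_single(s_region):
    """Convert one s_region string (POLYGON or CIRCLE) to a SphericalPolygon."""
    parts = s_region.strip().split()
    shape = parts[0].upper()
    # Drop a possible frame token (e.g. 'ICRS') before parsing the numbers
    values = [float(v) for v in parts[1:] if not v.isalpha()]
    if shape == 'POLYGON':
        return SphericalPolygon.from_radec(values[0::2], values[1::2])
    if shape == 'CIRCLE':
        ra, dec, radius = values
        return SphericalPolygon.from_cone(ra, dec, radius)
    raise ValueError(f'Unsupported s_region type: {shape}')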

Comment on lines 92 to 112
@cached(cache=FFI_TTLCACHE, lock=Lock())
def _get_s3_ffis(s3_uri, as_table: bool = False, load_polys: bool = False):
    """
    Fetch the S3 footprint file containing a dict of all FFIs and a polygon column
    that holds the s_regions as polygon points and vectors.

    Optional Parameters:
    as_table: Return the footprint file as an Astropy Table
    load_polys: Convert the s_region column to an array of SphericalPolygon objects
    """
    # Open footprint file with fsspec
    with fsspec.open(s3_uri, s3={'anon': True}) as f:
        ffis = json.load(f)

    if load_polys:
        ffis['polygon'] = _s_region_to_polygon(ffis['s_region'])

    if as_table:
        ffis = Table(ffis)

    return ffis

Member

Should this function be removed, for now?

Collaborator Author

I figured that we could leave it in since we'll need it shortly, but it's true that we don't want users trying to call it while the bucket isn't accessible. I'll remove it for now.

falkben (Member) commented Oct 3, 2024

I was looking for the coverage results.

Doesn't need to be handled in this PR, but it looks like there's a problem with the codecov upload in the GitHub action. From the Python 3.10 with numpy 1.23 and full coverage job:

[2024-10-03T16:46:10.843Z] ['info'] => Project root located at: /home/runner/work/astrocut/astrocut
[2024-10-03T16:46:10.846Z] ['info'] -> No token specified or token is empty
[2024-10-03T16:46:10.936Z] ['info'] Searching for coverage files...
[2024-10-03T16:46:10.973Z] ['info'] => Found 1 possible coverage files:
  ./coverage.xml
[2024-10-03T16:46:10.973Z] ['info'] Processing ./coverage.xml...
[2024-10-03T16:46:10.976Z] ['info'] Detected GitHub Actions as the CI provider.
[2024-10-03T16:46:11.306Z] ['info'] Pinging Codecov: https://codecov.io/upload/v4?package=github-action-2.1.0-uploader-0.8.0&token=*******&branch=footprint-finder&build=11166073335&build_url=https%3A%2F%2Fgithub.com%2Fspacetelescope%2Fastrocut%2Factions%2Fruns%2F11166073335&commit=9552cbb9116481980b67cbea56921b10ebb327db&job=CI&pr=127&service=github-actions&slug=spacetelescope%2Fastrocut&name=&tag=&flags=&parent=
[2024-10-03T16:46:11.478Z] ['error'] There was an error running the uploader: Error uploading to [https://codecov.io:](https://codecov.io/) Error: There was an error fetching the storage URL during POST: 429 - {"message":"Rate limit reached. Please upload with the Codecov repository upload token to resolve issue. Expected time to availability: 366s."}

[2024-10-03T16:46:11.479Z] ['info'] Codecov will exit with status code 0. If you are expecting a non-zero exit code, please pass in the `-Z` flag

Anyways, coverage looks pretty good:

                                Stmts   Miss  Cover
---------------------------------------------------
astrocut/__init__.py               14      1    93%
astrocut/asdf_cutouts.py           82      3    96%
astrocut/cube_cut.py              388      4    99%
astrocut/cutout_processing.py     247     13    95%
astrocut/cutouts.py               244     26    89%
astrocut/exceptions.py             11      0   100%
astrocut/footprint_cutouts.py     139     11    92%
astrocut/make_cube.py             427     26    94%
astrocut/utils/__init__.py          0      0   100%
astrocut/utils/utils.py            86      4    95%
astrocut/utils/wcs_fitting.py      31      5    84%
---------------------------------------------------
                                 1669     93    94%

snbianco (Collaborator Author) commented Oct 3, 2024

Made an issue here: https://jira.stsci.edu/browse/ASB-29119
