
Smart oauth2 token handling for staging #906

Draft
mcolin16 wants to merge 50 commits into develop

Conversation

mcolin16 (Contributor)

…a file

@mcolin16 mcolin16 marked this pull request as draft January 30, 2025 14:26
@mcolin16 mcolin16 self-assigned this Jan 30, 2025
@mcolin16 mcolin16 changed the title feat: call the function to get or refresh a token each time we stage … Smart oauth2 token handling for staging Jan 30, 2025
@github-advanced-security

This pull request sets up GitHub code scanning for this repository. Once the scans have completed and the checks have passed, the analysis results for this pull request branch will appear on this overview. Once you merge this pull request, the 'Security' tab will show more code scanning analysis results (for example, for the default branch). Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results. For more information about GitHub code scanning, check out the documentation.

headers=prepare_headers(external_auth_config),
)
if response.status_code != HTTP_200_OK:
logger.error(
Contributor

Could you use rs_server_common.utils.utils2.log_http_exception to avoid copy/pasting the message?
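
For illustration, a minimal sketch of what that could look like; the exact signature of log_http_exception is not shown in this diff and is assumed here (hypothetical: a logger plus a message):

    from rs_server_common.utils import utils2

    if response.status_code != HTTP_200_OK:
        # Hypothetical usage: check the real signature in rs_server_common.utils.utils2;
        # it is assumed here to log the message and handle the HTTP error in one place.
        utils2.log_http_exception(
            logger,
            f"Token request failed with status {response.status_code}",
        )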


Quality Gate failed

Failed conditions
1 Security Hotspot
3.2% Duplication on New Code (required ≤ 3%)
C Reliability Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud


@mcolin16 mcolin16 requested review from agrosu1978 and Padeanu March 11, 2025 13:15
@@ -46,6 +49,59 @@
ACCESS_TK_KEY_IN_RESPONSE = "access_token"
HEADER_CONTENT_TYPE = "application/x-www-form-urlencoded"

# Mandatory attributed that should be present in the token dictionary
Contributor

Maybe "attributed" should be replaced by "attributes"? There are two places where this word appears. Suggestion:
# Mandatory attributes that must be present in the token dictionary:
# Note: Additional attributes are included beyond those returned by the station response.
# These extra attributes include the creation date of both the access token and refresh token.
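
As a hedged sketch of the same idea, the constant and a check could look like this (key names other than access_token and access_token_creation_date are assumptions based on the comment above):

    # Mandatory attributes that must be present in the token dictionary.
    # The two *_creation_date keys are added on top of the station response.
    MANDATORY_TOKEN_ATTRIBUTES = [
        "access_token",
        "refresh_token",                  # assumed key name
        "access_token_creation_date",
        "refresh_token_creation_date",    # assumed key name
    ]

    def check_token_attributes(token_dict) -> bool:
        """Return True if all mandatory attributes are present in the token dictionary."""
        return isinstance(token_dict, dict) and all(
            attr in token_dict for attr in MANDATORY_TOKEN_ATTRIBUTES
        )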

Check if the token variable contains the mandatory attributes

Args:
token_dict (Any):
Contributor

The rs-documentation will complain when the documentation is built with mkdocs:
token_dict (Any): -> add a description
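
A possible fix, sketched with an assumed wording for the description:

    Args:
        token_dict (Any): Dictionary holding the station token and its metadata
            (access/refresh tokens and their creation dates), or None on the first call.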

"""
Retrieve and validate an authentication token for a specific station and service.
Thee are two main use cases:
Contributor

Typo: "There"

current_date = datetime.datetime.now()
diff_in_sec = (current_date - token_dict["access_token_creation_date"]).total_seconds()

logger.info(f"----------- DIFF VAUT: {diff_in_sec}")
Contributor

Maybe a more meaningful message should be written here: this is a logger.info, not a logger.debug.
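
For example, something along these lines (the wording is only a suggestion, using the variable already in scope):

    logger.info(f"Time elapsed since the access token was created: {diff_in_sec:.0f} seconds")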

while attempt < max_retries:
try:
self.connect_s3()
self.logger.info(f"Starting the streaming of {stream_url} to s3://{bucket}/{key}")

if not token_lock:
Contributor

This is a check on an input parameter; it should be placed at the beginning of the function.
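
A minimal sketch of what that could look like, using the parameter list visible in the call shown further down in this conversation (the error type raised is an assumption):

    def s3_streaming_upload(self, stream_url, config, bucket, key, token_dict, token_lock):
        # Validate input parameters before connecting to S3 and entering the retry loop.
        if not token_lock:
            raise ValueError("token_lock must be provided for the streaming upload")
        ...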

staging_station_id: identifier of the station on which we want to stage data.
This variable will ve used to define the dask.distributed.Variable object used
to create an access_token shared by all of the Dask workers (there will be as many
shared variable as station involved in the staging process)
Contributor

Typo. Suggestion: 'as the number of stations involved'

try:
token_dict = token_info.get()
# Get/refresh the access token if necessary
token_dict = get_station_token(config, token_dict)
Contributor

The dask Variable should be updated in the monitoring thread, or in a new thread created specifically for this purpose in processors.py of the staging service. Here, the token should be provided by the streaming task function, which fetches the value from the shared dask Variable. The Lock should also be used within the streaming task function (from processors.py) and not here in the s3_storage_handler; in fact, this function should not be changed at all. The former 'auth' param (from the previous implementation of this function) is exactly that updated token, fetched and provided by the streaming task function. Reason for this comment: redundant refreshes across multiple workers / redundant runs of the get_station_token function.
See below the comment for the streaming_task function in processors.py.

s3_handler = S3StorageHandler(
os.environ["S3_ACCESSKEY"],
os.environ["S3_SECRETKEY"],
os.environ["S3_ENDPOINT"],
os.environ["S3_REGION"],
)
s3_handler.s3_streaming_upload(product_url, trusted_domains, auth, bucket, s3_file)
s3_handler.s3_streaming_upload(product_url, config, bucket, s3_file, token_dict, token_lock)
Contributor
@agrosu1978 agrosu1978 Mar 11, 2025

Here, the token_dict should just be fetched from the shared variable. The token_dict should be refreshed by one thread only; this thread should be started before submitting all the dask tasks, similar to the monitoring thread implemented in the manage_dask_tasks_results function.
So here, the code should be something like:

try:
    # Create a thread lock to synchronize access to shared resources between
    # the threads of a given worker
    with token_lock:
        # Fetch the current value of the Variable. This Variable is updated by
        # one thread only, started in processors.py
        auth = token_dict.get("access_token")
    s3_handler = S3StorageHandler(
        os.environ["S3_ACCESSKEY"],
        os.environ["S3_SECRETKEY"],
        os.environ["S3_ENDPOINT"],
        os.environ["S3_REGION"],
    )
    s3_handler.s3_streaming_upload(product_url, trusted_domains, auth, bucket, s3_file)

Please see the comment at line 901

LE: After reading some more documentation about the dask Lock and dask Variable, and looking at the source code of dask Variable, I reached the following conclusion:

  • dask Variable uses a centralized, atomic update mechanism, ensuring that .get() always returns a fully stored value (this is done by the dask scheduler); the usage of the client.sync() function led me to this idea
  • locks are not needed for reading, because .get() never returns a partially written state
  • only writes need a lock to prevent multiple updates from conflicting; this should be discussed, because normally only the thread from processors.py should write it

So I think that here (on the dask Variable reader side) the token_lock should not be used.
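
A small sketch of this one-setter / multiple-readers pattern with the dask.distributed primitives (names are illustrative, and an active dask Client is assumed):

    from dask.distributed import Lock, Variable

    token_info = Variable("station_token")    # shared value, atomic get/set via the scheduler
    token_lock = Lock("station_token_lock")   # only needed by the single writer

    # Writer side (the refresh thread started from processors.py):
    with token_lock:
        token_info.set(get_station_token(config, token_info.get()))

    # Reader side (any streaming task on a dask worker): no lock needed, .get() is atomic.
    token_dict = token_info.get()
    auth = token_dict.get("access_token")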

dask_client: Client = self.dask_cluster_connect()
self.submit_tasks_to_dask_cluster(token, external_auth_config.trusted_domains, dask_client)
dask_client = self.dask_cluster_connect(external_auth_config.station_id)
self.submit_tasks_to_dask_cluster(external_auth_config, dask_client)
Contributor
@agrosu1978 agrosu1978 Mar 11, 2025

Before calling the submit function, a new thread should be started (similar to manage_dask_tasks_results):

dask_client = self.dask_cluster_connect(external_auth_config.station_id)
try:
    await asyncio.to_thread(self.manage_token, self.external_auth_config, self.token_lock, self.token_info)
except Exception as e:  # pylint: disable=broad-exception-caught
    self.log_job_execution(JobStatus.failed, 0, f"Error from token refreshment thread: {e}")
self.submit_tasks_to_dask_cluster(external_auth_config, dask_client)

The thread should be something like this (pseudocode)

def manage_token(self, config, token_lock, token_info):
    self.logger.info("Starting the token refreshment logic")

    while self.process_is_running:
        with token_lock:
            try:
                token_dict = token_info.get()
                # Get/refresh the access token if necessary
                token_dict = get_station_token(config, token_dict)
                token_info.set(token_dict)
            except Exception as e:  # pylint: disable=broad-exception-caught
                self.logger.error(f"Could not refresh the station token: {e}")
        time.sleep(TIME_TO_SLEEP)

self.process_is_running is set to False by the monitoring thread manage_dask_tasks_results when it finishes.
The reasons for my request are as follows:

  • centralized token refresh logic
  • reduces redundant requests, since only this thread refreshes the token
  • dask workers only need to read the token instead of managing the refresh logic
  • no changes in the s3_storage_handler module; the token refreshment logic should not be present there
  • ensures each task gets the latest token dynamically
  • important: one setter / multiple readers pattern, which is easier to manage and maintain

We can discuss this and, of course, I can help you.
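
As a complement to the pseudocode above, a hedged sketch of how such a refresh thread could be started without blocking the task submission (attribute and method names follow the pseudocode and are assumptions):

    import threading

    # Started from processors.py, before submitting the dask tasks.
    self.process_is_running = True
    refresh_thread = threading.Thread(
        target=self.manage_token,
        args=(external_auth_config, self.token_lock, self.token_info),
        daemon=True,
    )
    refresh_thread.start()

    self.submit_tasks_to_dask_cluster(external_auth_config, dask_client)
    # manage_dask_tasks_results sets process_is_running to False when all tasks
    # are done, which lets the refresh loop exit.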

auth=auth,
)
prepared_request = session.prepare_request(request)

Contributor

The retrying should be moved up to the caller level; the botocore retry mechanism for S3 is enough. I will handle this.
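
For reference, a minimal sketch of relying on botocore's built-in retries instead of a hand-rolled retry loop (the retry values are illustrative only):

    import os

    import boto3
    from botocore.config import Config

    s3_client = boto3.client(
        "s3",
        endpoint_url=os.environ["S3_ENDPOINT"],
        aws_access_key_id=os.environ["S3_ACCESSKEY"],
        aws_secret_access_key=os.environ["S3_SECRETKEY"],
        region_name=os.environ["S3_REGION"],
        config=Config(retries={"max_attempts": 5, "mode": "standard"}),
    )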

3 participants