Envoy does not use previously sent RouteConfiguration when initial_fetch_timeout value is changed inside Rds config #32283

jparklab · 2024-02-08T20:00:04Z

Title: Envoy does not use previously sent RouteConfiguration when initial_fetch_timeout value is changed inside Rds config

Description:

According to the Resource warming section of the Envoy documentation, Envoy is expected to use the previously sent RouteConfiguration while warming a Listener, and the management server does not need to resend the RouteConfiguration if it has not changed. However, when any field inside the Rds field of a Listener changes, including initial_fetch_timeout, Envoy does not use the previously sent RouteConfiguration and instead waits for the management server to send it.

This can cause Envoy to time out while waiting for the RouteConfiguration and finish Listener warming without it. Once a Listener is warmed without a RouteConfiguration, Envoy responds to requests for the route with 404 (NR) responses until it is restarted, or until the RouteConfiguration is updated and the management server sends the updated RouteConfiguration to Envoy.

This happens because Envoy does not use existing_provider in https://github.com/envoyproxy/envoy/blob/v1.26.6/source/common/rds/route_config_provider_manager.cc#L82 if the hash value of the Rds configuration changes, which prevents Envoy from using the previously sent RouteConfiguration.
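To illustrate the behavior described above, here is a minimal Python sketch of a provider cache keyed by a hash of the whole Rds config. It is a toy model, not Envoy's C++ implementation: the class, the function names, the route name `edge_routes`, and the timeout values are all assumptions made for illustration.

```python
import hashlib
import json


def config_hash(rds_config: dict) -> str:
    """Hash the entire Rds config, including fields unrelated to identity."""
    canonical = json.dumps(rds_config, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()


class ToyProviderManager:
    """Hypothetical model: providers are cached under the full-config hash."""

    def __init__(self):
        self.providers = {}  # hash -> last RouteConfiguration received (or None)

    def get_or_create(self, rds_config: dict):
        """Return (route_config, reused_existing_provider)."""
        key = config_hash(rds_config)
        if key in self.providers:
            return self.providers[key], True
        # No provider under this hash: a new one must wait for the server.
        self.providers[key] = None
        return None, False

    def on_route_config(self, rds_config: dict, route_config: dict) -> None:
        """Record the RouteConfiguration delivered for this Rds config."""
        self.providers[config_hash(rds_config)] = route_config


# Illustrative values only.
old_rds = {
    "route_config_name": "edge_routes",
    "config_source": {"ads": {}, "initial_fetch_timeout": "15s"},
}
new_rds = {
    "route_config_name": "edge_routes",
    "config_source": {"ads": {}, "initial_fetch_timeout": "60s"},
}

manager = ToyProviderManager()
manager.get_or_create(old_rds)
manager.on_route_config(old_rds, {"name": "edge_routes"})

reused_same = manager.get_or_create(old_rds)    # same config: provider reused
reused_changed = manager.get_or_create(new_rds)  # only the timeout changed: cache miss
```

In this model the second lookup misses even though `route_config_name` is unchanged, which mirrors the reported symptom: the new provider has no RouteConfiguration until the management server sends one again.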

Can we update Envoy to use existing_provider when the initial_fetch_timeout value is changed? Although we do not need to change it often, we sometimes do, and we want to avoid restarting Envoy proxies whenever we need to update the initial_fetch_timeout value.

Repro steps:

This can be reproduced by running an Envoy proxy that uses ADS to fetch configurations from a management server and then changing the initial_fetch_timeout value in the ConfigSource of a listener. I have a simple management server that reproduces the issue and can provide it if it helps.
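As a rough illustration of the change involved in the repro, the sketch below builds two versions of a listener's Rds config as Python dicts and reports which fields differ. The field names follow the Rds and ConfigSource messages, but the route name, the timeout values, and the helper functions are assumptions chosen for the example, not taken from the reporter's setup.

```python
def flatten(node: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted paths, keeping empty dicts as leaves."""
    flat = {}
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def changed_fields(before: dict, after: dict) -> dict:
    """Map each differing path to its (before, after) values."""
    a, b = flatten(before), flatten(after)
    return {
        path: (a.get(path), b.get(path))
        for path in sorted(a.keys() | b.keys())
        if a.get(path) != b.get(path)
    }


# Rds config as first sent in the Listener (illustrative values).
rds_before = {
    "route_config_name": "edge_routes",
    "config_source": {
        "ads": {},
        "resource_api_version": "V3",
        "initial_fetch_timeout": "15s",
    },
}

# The same Rds config after the management server raises the timeout.
rds_after = {
    "route_config_name": "edge_routes",
    "config_source": {
        "ads": {},
        "resource_api_version": "V3",
        "initial_fetch_timeout": "60s",
    },
}

diff = changed_fields(rds_before, rds_after)
```

Here `diff` contains a single entry for `config_source.initial_fetch_timeout`; the route name and the config source type are untouched, yet per the report this change alone is enough for Envoy to stop using the previously sent RouteConfiguration.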


ramaraochavali · 2024-02-09T08:48:11Z

Curious, what is the reason for dynamically changing init_fetch_timeout ?

jparklab · 2024-02-09T13:47:35Z

> Curious, what is the reason for dynamically changing init_fetch_timeout ?

We have multiple Envoy proxies running as edge proxies connected to the same management server, receiving configurations for more than a few thousand services, and we see a large number of fetch timeouts when we restart the management server, since all of the Envoy proxies reconnect. We want to adjust init_fetch_timeout to avoid fetch timeouts (we are also considering re-architecting, but that is a longer-term goal for us).

The parameter will be updated when we release the change to the management server, and Envoy proxies will get the updated value dynamically when they fetch updated listener configurations.

ravenblackx · 2024-02-12T15:17:55Z

@alyssawilk I think (as codeowner on router)

alyssawilk · 2024-02-12T16:15:23Z

I suspect this is more of an RDS issue so tagging @adisuissa for thoughts

adisuissa · 2024-02-12T16:33:35Z

Yes, it seems that the identifier should be the unique resource name (+ what config-server used to serve it).
The fix will be creating a unique-ID given the proto instead of hashing the entire proto.
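A minimal sketch of that idea, assuming the identifier is derived from the Rds config with fields that do not affect identity removed before hashing. Which fields to drop is an assumption here (only initial_fetch_timeout, since it is the field discussed in this issue), and the function name is hypothetical; this is not the code that was merged into Envoy.

```python
import copy
import hashlib
import json


def normalized_provider_key(rds_config: dict) -> str:
    """Hash the Rds config after dropping fields assumed not to affect identity."""
    normalized = copy.deepcopy(rds_config)  # never mutate the caller's config
    # Assumption: the fetch timeout does not change which resource is served.
    normalized.get("config_source", {}).pop("initial_fetch_timeout", None)
    canonical = json.dumps(normalized, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()


base = {
    "route_config_name": "edge_routes",
    "config_source": {"ads": {}, "initial_fetch_timeout": "15s"},
}
longer_timeout = {
    "route_config_name": "edge_routes",
    "config_source": {"ads": {}, "initial_fetch_timeout": "60s"},
}
other_route = {
    "route_config_name": "internal_routes",
    "config_source": {"ads": {}, "initial_fetch_timeout": "15s"},
}

same_provider = normalized_provider_key(base) == normalized_provider_key(longer_timeout)
different_provider = normalized_provider_key(base) != normalized_provider_key(other_route)
```

With this key, a timeout-only change maps to the same provider, while a different resource name still yields a distinct one.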

github-actions · 2024-03-13T20:01:17Z

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

srivatsav1998 · 2024-10-01T05:13:15Z

Hi, I am interested in working on this issue. Please let me know if this is still available.

I am new to this repository and would appreciate your sharing resources around the issue and starting pointers.

iamkishan98 · 2024-11-03T11:55:58Z

Hi @jparklab, I am new to open source and would like to work on good-first-issue tasks. I would love to work on this issue if it is still available.

cpakulski · 2024-12-31T16:58:34Z

Fix backported to:
1.32: #37749
1.31: #37844
1.30: #37846

jparklab added the bug and triage labels Feb 8, 2024

ravenblackx added the area/xds and area/configuration labels and removed the triage label Feb 9, 2024

github-actions bot added the stale label Mar 13, 2024

adisuissa added the no stalebot and beginner labels and removed the stale label Mar 13, 2024

cpakulski mentioned this issue Nov 15, 2024

rds: normalize rds provider's config before calculating hash #37180

Merged

adisuissa closed this as completed in #37180 Dec 10, 2024
