Skip to content

Conversation

rodrigozhou
Copy link
Contributor

@rodrigozhou rodrigozhou commented Sep 8, 2025

What changed?

If the visibility manager's search attribute cache is cold (ie., never loaded yet), and reading cluster metadata failed (Unavailable error), then retry randomly within 5 seconds instead of the default 20 seconds.

Why?

If the cache is cold and there are custom search attributes, then reading or writing workflows with custom search attributes would fail for 20 seconds. Retrying within 5 seconds to minimize the impact.

How did you test it?

  • built
  • run locally and tested manually
  • covered by existing tests
  • added new unit test(s)
  • added new functional test(s)

Potential risks

@rodrigozhou rodrigozhou requested a review from a team as a code owner September 8, 2025 18:19
@rodrigozhou rodrigozhou requested a review from yiminc September 8, 2025 18:19
@rodrigozhou rodrigozhou changed the title Revamp Visibility Manager search attributes cache Retry reloading Visibility Manager search attributes cache when it's cold Sep 8, 2025
@rodrigozhou rodrigozhou changed the title Retry reloading Visibility Manager search attributes cache when it's cold Retry reloading Visibility Manager search attributes cache when cold Sep 8, 2025
saCache.expireOn = now.Add(cacheRefreshIfUnavailableInterval)
default:
return saCache, err
if saCache.dbVersion == 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If cache is cold, 1) default refresh interval should be much shorter, 1s or even shorter than that. 2) we should retry on all errors if cache is code, not just Unavailable error, like resource exhausted or timeout errors.

Copy link
Contributor Author

@rodrigozhou rodrigozhou Sep 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reduced the interval to 1 second.
If there are any other error besides NotFound or Unavailable, it will retry immediately no matter if the cache is cold or not. This is the existing logic, and I didn't change.

@rodrigozhou rodrigozhou requested a review from yiminc September 8, 2025 23:06
@rodrigozhou rodrigozhou enabled auto-merge (squash) September 8, 2025 23:18
@rodrigozhou rodrigozhou disabled auto-merge September 9, 2025 00:24
@rodrigozhou rodrigozhou force-pushed the rodrigozhou/vis-sa-cache branch from b8fec1e to 180f321 Compare September 9, 2025 08:11
@rodrigozhou rodrigozhou enabled auto-merge (squash) September 9, 2025 08:20
@rodrigozhou rodrigozhou merged commit 707b2aa into main Sep 9, 2025
95 of 97 checks passed
@rodrigozhou rodrigozhou deleted the rodrigozhou/vis-sa-cache branch September 9, 2025 18:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants