@brycezhongqing
Collaborator

@brycezhongqing brycezhongqing commented Sep 3, 2025

PR description

During the INDIS migration, we observed that many clients encountered various issues when connecting to INDIS. Troubleshooting these problems often consumed significant time from both customers and the INDIS team.

This PR addresses that pain point by introducing a pre-check function that runs before attempting to establish an INDIS connection. The pre-check has no side effects: it does not alter or break the normal workflow. Instead, it logs clear messages that indicate which errors might prevent a successful INDIS connection.

With these logs, customers and the INDIS team can more easily identify root causes, accelerating troubleshooting and enabling customers to resolve issues on their own.

🚨 Pre-Check Failure Scenarios

  • Java version too low
  • Network connectivity issues
  • gRPC health check failures
  • Missing dependencies
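
For readers who want a concrete picture, here is a minimal sketch of how two of the scenarios above (Java version too low, missing dependencies) could be detected without side effects. It is not the code merged in this PR: the class `PreCheckSketch`, its method names, and the message strings are all hypothetical, and the network and gRPC health checks are left out because they need a live channel.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of side-effect-free pre-checks; NOT the code merged in this PR. */
final class PreCheckSketch {

  /** Extracts the major version from a java.version string, e.g. "1.8.0_292" -> 8, "17.0.2" -> 17. */
  static int majorVersion(String javaVersion) {
    String[] parts = javaVersion.split("\\D+");
    int first = Integer.parseInt(parts[0]);
    // Legacy "1.x" scheme (Java 8 and earlier) carries the major version in the second field.
    if (first == 1 && parts.length > 1) {
      return Integer.parseInt(parts[1]);
    }
    return first;
  }

  /** Returns true if the class can be loaded, without running its static initializers. */
  static boolean isClassAvailable(String className) {
    try {
      Class.forName(className, false, PreCheckSketch.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  /** Runs the checks and returns human-readable problems; an empty list means everything passed. */
  static List<String> run(int minMajorVersion, String... requiredClasses) {
    List<String> problems = new ArrayList<>();
    int actual = majorVersion(System.getProperty("java.version"));
    if (actual < minMajorVersion) {
      problems.add("Java version too low: " + actual + " < " + minMajorVersion);
    }
    for (String name : requiredClasses) {
      if (!isClassAvailable(name)) {
        problems.add("Missing dependency: " + name);
      }
    }
    return problems;
  }
}
```

A caller might invoke `PreCheckSketch.run(8, "com.google.protobuf.Descriptors")` and log each returned string at ERROR level. The minimum version and the class list in that call are assumptions; the PR description does not state which values the real pre-check uses.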

Testing Done

[✅] Manually tested
Case 1: Everything goes well:

2025/09/03 10:36:31.236 INFO [XdsClientImpl] [Indis xDS client executor-4-1] [indis-canary] [AAY96QuUvrF6rDL6zAFY4Q==] [xds pre-check] INDIS xDS server authority: main.indis-registry-observer.ei-ltx1.atd.disco.linkedin.com:32123
2025/09/03 10:36:31.236 INFO [XdsClientImpl] [Indis xDS client executor-4-1] [indis-canary] [AAY96QuUvrF6rDL6zAFY4Q==] [xds pre-check] Testing socket connection to INDIS xDS server: main.indis-registry-observer.ei-ltx1.atd.disco.linkedin.com:32123
2025/09/03 10:36:31.480 INFO [XdsClientImpl] [Indis xDS client executor-4-1] [indis-canary] [AAY96QuUvrF6rDL6zAFY4Q==] [xds pre-check] Successfully connected to INDIS xDS server at main.indis-registry-observer.ei-ltx1.atd.disco.linkedin.com:32123
2025/09/03 10:36:32.000 INFO [XdsClientImpl] [Indis xDS client executor-4-1] [indis-canary] [AAY96QuUvrF6rDL6zAFY4Q==] [xds pre-check] Health check for managed channel passed - channel is SERVING
2025/09/03 10:36:32.001 INFO [XdsClientImpl] [Indis xDS client executor-4-1] [indis-canary] [AAY96QuUvrF6rDL6zAFY4Q==] [xds pre-check] Protobuf Descriptor classes are available
2025/09/03 10:36:32.002 INFO [XdsClientImpl] [Indis xDS client executor-4-1] [indis-canary] [AAY96QuUvrF6rDL6zAFY4Q==] [xds pre-check] Envoy API classes are available
2025/09/03 10:36:32.002 INFO [XdsClientImpl] [Indis xDS client executor-4-1] [indis-canary] [AAY96QuUvrF6rDL6zAFY4Q==] [xds pre-check] Protobuf Descriptor and module dependency checks passed
2025/09/03 10:36:32.002 INFO [XdsClientImpl] [Indis xDS client executor-4-1] [indis-canary] [AAY96QuUvrF6rDL6zAFY4Q==] [xds pre-check] Managed channel is healthy and ready to use
2025/09/03 10:36:33.253 INFO [XdsClientImpl] [Grpc xDS Client Executor-13-1] [indis-canary] [AAY96QuziYpYykR0449H6w==] [xds pre-check] INDIS xDS server authority: main.indis-registry-observer.ei-ltx1.atd.disco.linkedin.com:32123
2025/09/03 10:36:33.253 INFO [XdsClientImpl] [Grpc xDS Client Executor-13-1] [indis-canary] [AAY96QuziYpYykR0449H6w==] [xds pre-check] Testing socket connection to INDIS xDS server: main.indis-registry-observer.ei-ltx1.atd.disco.linkedin.com:32123
2025/09/03 10:36:33.338 INFO [XdsClientImpl] [Grpc xDS Client Executor-13-1] [indis-canary] [AAY96QuziYpYykR0449H6w==] [xds pre-check] Successfully connected to INDIS xDS server at main.indis-registry-observer.ei-ltx1.atd.disco.linkedin.com:32123
2025/09/03 10:36:33.756 INFO [XdsClientImpl] [Grpc xDS Client Executor-13-1] [indis-canary] [AAY96QuziYpYykR0449H6w==] [xds pre-check] Health check for managed channel passed - channel is SERVING
2025/09/03 10:36:33.756 INFO [XdsClientImpl] [Grpc xDS Client Executor-13-1] [indis-canary] [AAY96QuziYpYykR0449H6w==] [xds pre-check] Protobuf Descriptor classes are available
2025/09/03 10:36:33.756 INFO [XdsClientImpl] [Grpc xDS Client Executor-13-1] [indis-canary] [AAY96QuziYpYykR0449H6w==] [xds pre-check] Envoy API classes are available
2025/09/03 10:36:33.756 INFO [XdsClientImpl] [Grpc xDS Client Executor-13-1] [indis-canary] [AAY96QuziYpYykR0449H6w==] [xds pre-check] Protobuf Descriptor and module dependency checks passed
2025/09/03 10:36:33.756 INFO [XdsClientImpl] [Grpc xDS Client Executor-13-1] [indis-canary] [AAY96QuziYpYykR0449H6w==] [xds pre-check] Managed channel is healthy and ready to use
2025/09/03 10:36:40.553 INFO [XdsClientImpl] [Indis xDS client executor-15-1] [indis-canary] [AAY96Qwi6qRJyxkHLVQbKQ==] [xds pre-check] INDIS xDS server authority: main.indis-registry-observer.ei-ltx1.atd.disco.linkedin.com:32123
2025/09/03 10:36:40.553 INFO [XdsClientImpl] [Indis xDS client executor-15-1] [indis-canary] [AAY96Qwi6qRJyxkHLVQbKQ==] [xds pre-check] Testing socket connection to INDIS xDS server: main.indis-registry-observer.ei-ltx1.atd.disco.linkedin.com:32123
2025/09/03 10:36:40.640 INFO [XdsClientImpl] [Indis xDS client executor-15-1] [indis-canary] [AAY96Qwi6qRJyxkHLVQbKQ==] [xds pre-check] Successfully connected to INDIS xDS server at main.indis-registry-observer.ei-ltx1.atd.disco.linkedin.com:32123
2025/09/03 10:36:41.115 INFO [XdsClientImpl] [Indis xDS client executor-15-1] [indis-canary] [AAY96Qwi6qRJyxkHLVQbKQ==] [xds pre-check] Health check for managed channel passed - channel is SERVING
2025/09/03 10:36:41.115 INFO [XdsClientImpl] [Indis xDS client executor-15-1] [indis-canary] [AAY96Qwi6qRJyxkHLVQbKQ==] [xds pre-check] Protobuf Descriptor classes are available
2025/09/03 10:36:41.115 INFO [XdsClientImpl] [Indis xDS client executor-15-1] [indis-canary] [AAY96Qwi6qRJyxkHLVQbKQ==] [xds pre-check] Envoy API classes are available
2025/09/03 10:36:41.115 INFO [XdsClientImpl] [Indis xDS client executor-15-1] [indis-canary] [AAY96Qwi6qRJyxkHLVQbKQ==] [xds pre-check] Protobuf Descriptor and module dependency checks passed
2025/09/03 10:36:41.115 INFO [XdsClientImpl] [Indis xDS client executor-15-1] [indis-canary] [AAY96Qwi6qRJyxkHLVQbKQ==] [xds pre-check] Managed channel is healthy and ready to use

Case 2: NACL issue:

2025/09/03 11:35:37.250 INFO [XdsClientImpl] [Indis xDS client executor-4-1] [indis-canary] [AAY96d7woRC2h5PagBm8/Q==] [xds pre-check] INDIS xDS server authority: main.indis-registry-observer.prod-ltx1.atd.disco.linkedin.com:32123
2025/09/03 11:35:37.250 INFO [XdsClientImpl] [Indis xDS client executor-4-1] [indis-canary] [AAY96d7woRC2h5PagBm8/Q==] [xds pre-check] Testing socket connection to INDIS xDS server: main.indis-registry-observer.prod-ltx1.atd.disco.linkedin.com:32123
2025/09/03 11:35:39.765 ERROR [XdsClientImpl] [Indis xDS client executor-4-1] [indis-canary] [AAY96d7woRC2h5PagBm8/Q==] [xds pre-check] Failed to connect to INDIS xDS server at authority main.indis-registry-observer.prod-ltx1.atd.disco.linkedin.com:32123: Connect timed out, check go/onboardindis Guidelines #5 and #7 for more details
2025/09/03 11:35:41.867 INFO [XdsClientImpl] [Grpc xDS Client Executor-13-1] [indis-canary] [AAY96d83E8/uDEJMGwyhQg==] [xds pre-check] INDIS xDS server authority: main.indis-registry-observer.prod-ltx1.atd.disco.linkedin.com:32123
2025/09/03 11:35:41.867 INFO [XdsClientImpl] [Grpc xDS Client Executor-13-1] [indis-canary] [AAY96d83E8/uDEJMGwyhQg==] [xds pre-check] Testing socket connection to INDIS xDS server: main.indis-registry-observer.prod-ltx1.atd.disco.linkedin.com:32123
2025/09/03 11:35:43.868 ERROR [XdsClientImpl] [Grpc xDS Client Executor-13-1] [indis-canary] [AAY96d83E8/uDEJMGwyhQg==] [xds pre-check] Failed to connect to INDIS xDS server at authority main.indis-registry-observer.prod-ltx1.atd.disco.linkedin.com:32123: Connect timed out, check go/onboardindis Guidelines #5 and #7 for more details
2025/09/03 11:35:49.096 INFO [XdsClientImpl] [Indis xDS client executor-15-1] [indis-canary] [AAY96d+lYr3FzkfvnYxWnQ==] [xds pre-check] INDIS xDS server authority: main.indis-registry-observer.prod-ltx1.atd.disco.linkedin.com:32123
2025/09/03 11:35:49.096 INFO [XdsClientImpl] [Indis xDS client executor-15-1] [indis-canary] [AAY96d+lYr3FzkfvnYxWnQ==] [xds pre-check] Testing socket connection to INDIS xDS server: main.indis-registry-observer.prod-ltx1.atd.disco.linkedin.com:32123
2025/09/03 11:35:51.097 ERROR [XdsClientImpl] [Indis xDS client executor-15-1] [indis-canary] [AAY96d+lYr3FzkfvnYxWnQ==] [xds pre-check] Failed to connect to INDIS xDS server at authority main.indis-registry-observer.prod-ltx1.atd.disco.linkedin.com:32123: Connect timed out, check go/onboardindis Guidelines #5 and #7 for more details
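
The "Testing socket connection" step in these logs can be approximated with a plain TCP connect that has a timeout. The sketch below is an illustration only: `SocketPreCheckSketch`, `parsePort`, `parseHost`, and `tryConnect` are hypothetical names, and the timeout value is left to the caller because the PR does not state it (the gap of roughly two seconds between the "Testing" and "Failed" lines above suggests something near 2000 ms, but that is a guess).

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical sketch of a TCP reachability check; NOT the code merged in this PR. */
final class SocketPreCheckSketch {

  /** Returns the host part of a "host:port" authority string. */
  static String parseHost(String authority) {
    return authority.substring(0, authority.lastIndexOf(':'));
  }

  /** Returns the port part of a "host:port" authority string. */
  static int parsePort(String authority) {
    return Integer.parseInt(authority.substring(authority.lastIndexOf(':') + 1));
  }

  /**
   * Attempts a TCP connect and closes the socket immediately.
   * Returns null on success, or a short description of the failure.
   */
  static String tryConnect(String host, int port, int timeoutMillis) {
    try (Socket socket = new Socket()) {
      socket.connect(new InetSocketAddress(host, port), timeoutMillis);
      return null;
    } catch (IOException e) {
      // Covers timeouts, refused connections, and unresolvable hosts.
      return e.getClass().getSimpleName() + ": " + e.getMessage();
    }
  }
}
```

Because the socket is opened and closed inside the method, a check like this leaves no connection behind, which matches the "no side effects" goal in the PR description. A blocked network path (such as the NACL case here) would typically surface as a timeout rather than an immediate refusal.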

@brycezhongqing brycezhongqing marked this pull request as ready for review September 3, 2025 20:16
return;
}

if (!preCheckForIndisConnection())
Contributor

@bohhyang bohhyang Sep 3, 2025

The pre-check should run only ONCE at startup, not in every startRpcStreamLocal call (i.e., on every reconnect). After #1093 is merged, XdsClientImpl will have a start() method, so you can add it there (cleaner).
Or add it to startRpcStream().

Contributor

Btw, this pre-check shouldn't run on the executor; it should block xDS client startup, so that later we can control whether to fail app startup when the pre-check fails.

Collaborator Author

Yeah. I added some logic to make sure the pre-check runs only once.
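
One way such a run-once guard could look is sketched below. This is an assumption about the general shape, not the logic actually merged: `RunOncePreCheckSketch` and `runOnce` are hypothetical names, and the pre-check itself is passed in as a `BooleanSupplier` so the sketch stays self-contained.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical run-once wrapper for a pre-check; NOT the code merged in this PR. */
final class RunOncePreCheckSketch {
  private final BooleanSupplier preCheck;
  private Boolean result; // null until the pre-check has run

  RunOncePreCheckSketch(BooleanSupplier preCheck) {
    this.preCheck = preCheck;
  }

  /**
   * Runs the pre-check on the calling thread the first time it is invoked and
   * caches the outcome; later calls (for example on reconnect) return the
   * cached value without re-running the check.
   */
  synchronized boolean runOnce() {
    if (result == null) {
      result = preCheck.getAsBoolean();
    }
    return result;
  }
}
```

Because `runOnce()` executes on the caller's thread and is synchronized, invoking it from a startup path blocks until the check finishes, and the returned boolean gives the caller a place to decide later whether a failed pre-check should fail application startup.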

@brycezhongqing brycezhongqing merged commit e23150a into master Sep 25, 2025
3 of 4 checks passed
@brycezhongqing brycezhongqing deleted the zhonchen/add_pre_check_for_xds_connection branch September 25, 2025 21:11

4 participants