Proposal: Dns support for Dual-Engine #1436
Merged: kmesh-bot merged 3 commits into kmesh-net:main from Flying-Tom:proposal-dual-engine-dns on Oct 31, 2025
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,242 @@ | ||
| --- | ||
| title: Support DNS Resolution in Dual Engine Mode | ||
| authors: | ||
| - "@Flying-Tom" | ||
| reviewers: | ||
| - | ||
| approvers: | ||
| - | ||
| creation-date: 2025-07-09 | ||
| --- | ||
|
|
||
| ## Support DNS Resolution in Dual Engine Mode | ||
|
|
||
| <!-- | ||
| This is the title of your KEP. Keep it short, simple, and descriptive. A good | ||
| title can help communicate what the KEP is and should be considered as part of | ||
| any review. | ||
| --> | ||
|
|
||
| ### Summary | ||
|
|
||
| <!-- | ||
| This section is incredibly important for producing high-quality, user-focused | ||
| documentation such as release notes or a development roadmap. | ||
| A good summary is probably at least a paragraph in length. | ||
| --> | ||
|
|
||
| This proposal adds DNS resolution support for Dual-Engine mode workloads, enabling seamless Istio migration by supporting ServiceEntry resources with DNS-based endpoints. Workloads dynamically resolve hostnames to IP addresses without static configuration. | ||
|
|
||
| ### Motivation | ||
|
|
||
| <!-- | ||
| This section is for explicitly listing the motivation, goals, and non-goals of | ||
| this KEP. Describe why the change is important and the benefits to users. | ||
| --> | ||
|
|
||
| In Istio, [ExternalName services](https://kubernetes.io/docs/concepts/services-networking/service/#externalname) and DNS-typed [ServiceEntry](https://istio.io/latest/docs/reference/config/networking/service-entry/#ServiceEntry-Resolution) rely on client-side DNS resolution. When istiod processes these configurations, it generates workloads without pre-resolved IP addresses. | ||
|
|
||
| Example ServiceEntry: | ||
|
|
||
| ```yaml | ||
| apiVersion: networking.istio.io/v1 | ||
| kind: ServiceEntry | ||
| metadata: | ||
| name: external-svc-google | ||
| namespace: default | ||
| spec: | ||
| hosts: | ||
| - news.google.com | ||
| ports: | ||
| - number: 80 | ||
| name: http | ||
| protocol: HTTP | ||
| resolution: DNS | ||
| ``` | ||
|
|
||
| In Release 1.1, Kmesh refactored the DNS module, extracting DNS logic from the kernel-native Ads controller into a standalone `AdsDnsController`. This decoupling enables reuse in Dual-Engine mode. | ||
|
|
||
|  | ||
|
||
|
|
||
| #### Goals | ||
|
|
||
| <!-- | ||
| List the specific goals of the KEP. What is it trying to achieve? How will we | ||
| know that this has succeeded? | ||
| --> | ||
|
|
||
| - Support DNS resolution for workloads generated from ServiceEntry resources in Dual-Engine mode | ||
| - Implement asynchronous DNS resolution to avoid blocking workload processing | ||
| - Provide automatic cleanup when workloads with DNS hostnames are removed | ||
| - Support both IPv4 and IPv6 address resolution | ||
|
|
||
| #### Non-Goals | ||
|
|
||
| <!-- | ||
| What is out of scope for this KEP? Listing non-goals helps to focus discussion | ||
| and make progress. | ||
| --> | ||
|
||
|
|
||
| This KEP does not implement a DNS proxy or DNS server: we do not resolve DNS names on behalf of client workloads. As a result, if a `ServiceEntry` uses a fake or otherwise non-resolvable DNS domain, client workloads may fail to resolve it and therefore fail to reach the intended service. Handling such cases is explicitly out of scope for this proposal. | ||
|
|
||
| ### Proposal | ||
|
|
||
| <!-- | ||
| This is where we get down to the specifics of what the proposal actually is. | ||
| This should have enough detail that reviewers can understand exactly what | ||
| you're proposing, but should not include things like API designs or | ||
| implementation. What is the desired outcome and how do we measure success? | ||
| The "Design Details" section below is for the real | ||
| nitty-gritty. | ||
| --> | ||
|
|
||
| The `DNSResolver` in `pkg/dns` encapsulates the DNS resolution logic as an independent component, so Dual-Engine mode can reuse it. Following the `AdsDnsController` used in kernel-native mode, we implement a similar `WorkloadDnsController` that resolves hostnames for workloads generated from `ServiceEntry` resources without address information. | ||
|
|
||
| The controller receives workloads needing DNS resolution from the Processor via a channel, handles the DNS resolution asynchronously, and sends the resolved results back to the Processor through per-workload result channels. This design keeps the DNS lookups themselves out of the workload processing path: the Processor waits for a result only for a bounded time instead of performing the lookup itself. | ||
|
|
||
| ### Design Details | ||
|
|
||
| <!-- | ||
| This section should contain enough information that the specifics of your | ||
| change are understandable. This may include API specs (though not always | ||
| required) or even code snippets. If there's any ambiguity about HOW your | ||
| proposal will be implemented, this is the place to discuss them. | ||
| --> | ||
|
|
||
| #### Architecture Overview | ||
|
|
||
| The DNS resolution flow in Dual-Engine mode follows this pattern: | ||
|
|
||
| ```txt | ||
| Processor → WorkloadDnsController → DNSResolver → Upstream DNS | ||
| ↑ ↓ | ||
| └────── Resolved Workload ────────────┘ | ||
| ``` | ||
|
|
||
|  | ||
|
||
|
|
||
| #### Key Components | ||
|
|
||
| ##### WorkloadDnsController | ||
|
|
||
| The WorkloadDnsController manages asynchronous DNS resolution for ServiceEntry-generated workloads. Key data structures: | ||
|
|
||
| - **Resolution Queue**: Buffered channel receiving workloads from Processor for non-blocking submission | ||
| - **Result Channel Registry**: Per-workload result channels (indexed by UID) for independent resolution tracking | ||
| - **Domain Resolution Cache**: Bidirectional index between hostnames and pending workloads for batch resolution | ||
|
|
||
| The controller operates through three concurrent workers: | ||
|
|
||
| 1. **Domain Processor**: Consumes workloads, groups by domain, delegates to DNS Resolver | ||
| 2. **Refresh Worker**: Receives resolved addresses, constructs workload objects, delivers via result channels | ||
| 3. **DNS Resolver**: Executes DNS queries (A/AAAA), maintains TTL-based cache (reused from `pkg/dns`) | ||
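The data structures listed above could be laid out roughly as follows. This is a minimal sketch, not the Kmesh implementation: the `workload` type, the `dnsController` type, its field names, and the `register`/`submit` helpers are all hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// workload is a hypothetical, simplified stand-in for the real workload object.
type workload struct {
	UID       string
	Hostname  string
	Addresses [][]byte
}

// dnsController sketches the three data structures described in the proposal.
type dnsController struct {
	mu sync.Mutex
	// Resolution queue: buffered so that submission normally does not block.
	queue chan *workload
	// Result channel registry: one channel per workload UID.
	results map[string]chan *workload
	// Bidirectional index between hostnames and pending workloads
	// (assumed here to be filled in by the Domain Processor worker).
	hostToUIDs map[string]map[string]struct{}
	uidToHost  map[string]string
}

func newDNSController(queueSize int) *dnsController {
	return &dnsController{
		queue:      make(chan *workload, queueSize),
		results:    make(map[string]chan *workload),
		hostToUIDs: make(map[string]map[string]struct{}),
		uidToHost:  make(map[string]string),
	}
}

// register creates and stores the result channel for a workload UID.
func (c *dnsController) register(uid string) <-chan *workload {
	c.mu.Lock()
	defer c.mu.Unlock()
	ch := make(chan *workload, 1)
	c.results[uid] = ch
	return ch
}

// submit enqueues a workload for resolution. It reports false when the
// queue is full instead of blocking the caller.
func (c *dnsController) submit(w *workload) bool {
	select {
	case c.queue <- w:
		return true
	default:
		return false
	}
}

func main() {
	c := newDNSController(16)
	w := &workload{UID: "uid-1", Hostname: "news.google.com"}
	_ = c.register(w.UID)
	fmt.Println("queued:", c.submit(w), "pending:", len(c.queue))
}
```

Returning `false` on a full queue is one possible policy; the proposal itself only states that the queue is buffered.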
|
|
||
| ##### Resolution Flow | ||
|
|
||
| The DNS resolution mechanism follows a producer-consumer pattern with timeout protection: | ||
|
|
||
| **Workload Submission**: When the Processor encounters a workload without addresses, it registers a result channel and enqueues the workload for resolution. The Processor then waits on the result channel for at most 3 seconds, so a slow or failed lookup cannot stall the pipeline indefinitely. | ||
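The bounded wait on the Processor side can be sketched with a `select` over the result channel and a timer. The function name `awaitResolved`, the error value, and the use of `[]string` for addresses are assumptions made for this example.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errResolveTimeout is a hypothetical error returned when no result arrives in time.
var errResolveTimeout = errors.New("dns resolution timed out")

// awaitResolved waits for resolved addresses on ch and gives up after timeout.
func awaitResolved(ch <-chan []string, timeout time.Duration) ([]string, error) {
	timer := time.NewTimer(timeout)
	defer timer.Stop()
	select {
	case addrs := <-ch:
		return addrs, nil
	case <-timer.C:
		return nil, errResolveTimeout
	}
}

func main() {
	ch := make(chan []string, 1)
	// Simulate the controller delivering a result asynchronously.
	go func() { ch <- []string{"192.0.2.10"} }()

	addrs, err := awaitResolved(ch, 3*time.Second)
	fmt.Println(addrs, err)
}
```

On timeout the caller can continue processing the workload without addresses, which matches the "timeout expires" branch of the runtime interaction.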
|
|
||
| **Domain Aggregation**: The Domain Processor maintains a hostname-indexed cache to aggregate workloads requiring the same domain, reducing redundant DNS queries. It checks for cached resolutions before initiating new queries. | ||
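One way to picture the aggregation step is a hostname-keyed index that answers two questions: is there a cached result, and is a query for this hostname already pending? The sketch below assumes a hypothetical `domainIndex` type and is not safe for concurrent use without external locking.

```go
package main

import "fmt"

// domainIndex groups pending workload UIDs by hostname (hypothetical type).
type domainIndex struct {
	pending  map[string]map[string]struct{} // hostname -> set of workload UIDs
	resolved map[string][]string            // hostname -> cached addresses
}

func newDomainIndex() *domainIndex {
	return &domainIndex{
		pending:  make(map[string]map[string]struct{}),
		resolved: make(map[string][]string),
	}
}

// add records that uid waits on hostname. If the hostname is already cached,
// the cached addresses are returned. needQuery is true only for the first
// pending workload of an uncached hostname, so duplicates share one query.
func (d *domainIndex) add(hostname, uid string) (cached []string, needQuery bool) {
	if addrs, ok := d.resolved[hostname]; ok {
		return addrs, false
	}
	uids, alreadyPending := d.pending[hostname]
	if !alreadyPending {
		uids = make(map[string]struct{})
		d.pending[hostname] = uids
	}
	uids[uid] = struct{}{}
	return nil, !alreadyPending
}

func main() {
	idx := newDomainIndex()
	_, first := idx.add("news.google.com", "uid-1")
	_, second := idx.add("news.google.com", "uid-2")
	fmt.Println(first, second) // prints "true false"
}
```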
|
|
||
| **Address Resolution**: The DNS Resolver performs parallel IPv4 (A) and IPv6 (AAAA) queries, respecting DNS TTL values. Resolved addresses are stored in a protocol-agnostic format. | ||
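The parallel A/AAAA lookup can be illustrated with the Go standard library rather than the `DNSResolver` from `pkg/dns`, whose API is not shown in this proposal. In the sketch below, `lookupFunc` has the same signature as `(*net.Resolver).LookupNetIP`, so `net.DefaultResolver.LookupNetIP` could be passed in a real program; `resolveDualStack`, `fakeLookup`, and `demoResolve` are hypothetical names, and TTL handling is left out.

```go
package main

import (
	"context"
	"fmt"
	"net/netip"
	"sync"
)

// lookupFunc matches the signature of (*net.Resolver).LookupNetIP.
type lookupFunc func(ctx context.Context, network, host string) ([]netip.Addr, error)

// resolveDualStack queries IPv4 and IPv6 addresses concurrently and returns
// them as raw byte slices, IPv4 results first.
func resolveDualStack(ctx context.Context, lookup lookupFunc, host string) [][]byte {
	networks := []string{"ip4", "ip6"}
	results := make([][]netip.Addr, len(networks))

	var wg sync.WaitGroup
	for i, network := range networks {
		wg.Add(1)
		go func(i int, network string) {
			defer wg.Done()
			addrs, err := lookup(ctx, network, host)
			if err != nil {
				return // one address family failing does not discard the other
			}
			results[i] = addrs // each goroutine writes only its own slot
		}(i, network)
	}
	wg.Wait()

	var out [][]byte
	for _, addrs := range results {
		for _, a := range addrs {
			out = append(out, a.AsSlice())
		}
	}
	return out
}

// fakeLookup stands in for a real resolver so the example needs no network.
func fakeLookup(_ context.Context, network, _ string) ([]netip.Addr, error) {
	switch network {
	case "ip4":
		return []netip.Addr{netip.MustParseAddr("192.0.2.10")}, nil
	case "ip6":
		return []netip.Addr{netip.MustParseAddr("2001:db8::10")}, nil
	}
	return nil, fmt.Errorf("unsupported network %q", network)
}

func demoResolve() [][]byte {
	return resolveDualStack(context.Background(), fakeLookup, "example.com")
}

func main() {
	for _, b := range demoResolve() {
		fmt.Println(len(b), "bytes")
	}
}
```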
|
|
||
| **Result Distribution**: The Refresh Worker reconstructs workload objects with resolved addresses and delivers them via result channels. Channel operations include a 100ms send timeout to prevent deadlocks. After delivery, the controller removes the result channel registration to prevent memory leaks. | ||
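The send-side timeout and the removal of the registration can be sketched as below. The `resultRegistry` type and its methods are hypothetical; the sketch drops the registration whether or not the send succeeds, which is one reasonable reading of the description above.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// resultRegistry maps workload UIDs to their result channels (hypothetical type).
type resultRegistry struct {
	mu       sync.Mutex
	channels map[string]chan []string
}

func newResultRegistry() *resultRegistry {
	return &resultRegistry{channels: make(map[string]chan []string)}
}

func (r *resultRegistry) register(uid string, buffer int) chan []string {
	r.mu.Lock()
	defer r.mu.Unlock()
	ch := make(chan []string, buffer)
	r.channels[uid] = ch
	return ch
}

func (r *resultRegistry) unregister(uid string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.channels, uid)
}

// deliver sends addrs to the channel registered for uid, waiting at most
// timeout for a receiver, and removes the registration afterwards.
func (r *resultRegistry) deliver(uid string, addrs []string, timeout time.Duration) bool {
	r.mu.Lock()
	ch, ok := r.channels[uid]
	r.mu.Unlock()
	if !ok {
		return false
	}
	defer r.unregister(uid) // prevents the registry from growing without bound

	timer := time.NewTimer(timeout)
	defer timer.Stop()
	select {
	case ch <- addrs:
		return true
	case <-timer.C:
		return false // the consumer gave up; do not block the worker
	}
}

func main() {
	reg := newResultRegistry()
	ch := reg.register("uid-1", 1)
	ok := reg.deliver("uid-1", []string{"192.0.2.10"}, 100*time.Millisecond)
	fmt.Println(ok, <-ch)
}
```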
|
|
||
| ##### Cleanup Mechanism | ||
|
|
||
| When a workload is deleted, the controller: | ||
|
|
||
| - Removes the workload from pending hostname tracking | ||
| - Removes the workload from the hostname's pending domain cache | ||
| - If no more workloads depend on the hostname, unwatches the domain from DNS resolver | ||
|
|
||
| This ensures no memory leaks and prevents unnecessary DNS queries. | ||
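The cleanup steps map naturally onto a two-way index. In this sketch the `pendingIndex` type, its methods, and the `unwatch` callback are hypothetical; the callback stands in for whatever call the DNS resolver exposes to stop watching a domain.

```go
package main

import "fmt"

// pendingIndex tracks which workloads wait on which hostname (hypothetical type).
type pendingIndex struct {
	uidToHost  map[string]string
	hostToUIDs map[string]map[string]struct{}
	unwatch    func(hostname string) // hypothetical hook into the DNS resolver
}

func newPendingIndex(unwatch func(string)) *pendingIndex {
	return &pendingIndex{
		uidToHost:  make(map[string]string),
		hostToUIDs: make(map[string]map[string]struct{}),
		unwatch:    unwatch,
	}
}

func (p *pendingIndex) track(uid, host string) {
	p.uidToHost[uid] = host
	if p.hostToUIDs[host] == nil {
		p.hostToUIDs[host] = make(map[string]struct{})
	}
	p.hostToUIDs[host][uid] = struct{}{}
}

// remove deletes a workload from both directions of the index and unwatches
// the hostname once no workload depends on it any more.
func (p *pendingIndex) remove(uid string) {
	host, ok := p.uidToHost[uid]
	if !ok {
		return
	}
	delete(p.uidToHost, uid)
	uids := p.hostToUIDs[host]
	delete(uids, uid)
	if len(uids) == 0 {
		delete(p.hostToUIDs, host)
		p.unwatch(host)
	}
}

func main() {
	idx := newPendingIndex(func(h string) { fmt.Println("unwatch", h) })
	idx.track("uid-1", "news.google.com")
	idx.track("uid-2", "news.google.com")
	idx.remove("uid-1") // hostname still needed by uid-2
	idx.remove("uid-2") // prints "unwatch news.google.com"
}
```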
|
|
||
| ##### Design Rationale | ||
|
|
||
| The WorkloadDnsController design diverges from AdsDnsController in several key aspects to better accommodate workload-level resolution requirements: | ||
|
|
||
| | Design Decision | Approach | Rationale | | ||
| |----------------|----------|-----------| | ||
| | **Result Delivery** | Dedicated per-workload channels | Eliminates result filtering overhead and spurious wake-ups; enables direct workload-specific blocking without cross-workload interference | | ||
| | **Timeout Strategy** | Two-tier mechanism (3s processor + 100ms channel) | Processor-level timeout prevents indefinite pipeline blocking; channel-level timeout prevents deadlocks from abandoned consumers | | ||
| | **Address Format** | `netip.ParseAddr().AsSlice()` byte representation | Provides protocol-agnostic representation supporting both IPv4 and IPv6 without conditional logic | | ||
| | **Refresh Interval** | Fixed 200ms rate | Simplifies implementation while maintaining adequate freshness for typical ServiceEntry use cases; trades configurability for consistency | | ||
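As a small illustration of the address format row, `netip.ParseAddr` followed by `AsSlice` yields 4 bytes for an IPv4 address and 16 bytes for an IPv6 address, so callers can store both in the same `[]byte` field. The helper name `toBytes` is an assumption for this example.

```go
package main

import (
	"fmt"
	"net/netip"
)

// toBytes parses a textual IP address and returns its raw bytes:
// 4 bytes for IPv4 and 16 bytes for IPv6.
func toBytes(s string) ([]byte, error) {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return nil, err
	}
	return addr.AsSlice(), nil
}

func main() {
	for _, s := range []string{"192.0.2.10", "2001:db8::10"} {
		b, err := toBytes(s)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("%s -> %d bytes\n", s, len(b))
	}
	// Output:
	// 192.0.2.10 -> 4 bytes
	// 2001:db8::10 -> 16 bytes
}
```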
|
|
||
| #### Integration Points | ||
|
|
||
| The WorkloadDnsController integrates into the Kmesh control plane through two primary integration points: | ||
|
|
||
| | Component | Integration Method | Lifecycle | | ||
| |-----------|-------------------|-----------| | ||
| | **WorkloadController** | Instantiated during `NewController()` | Started via `Run()` context; shutdown via context cancellation | | ||
| | **Processor** | Reference-based invocation | Synchronous blocking on resolution for address-less workloads; ensures data consistency before processing | | ||
|
|
||
| **Initialization Sequence**: | ||
|
|
||
| 1. WorkloadController creates a WorkloadDnsController instance | ||
| 2. A reference to the WorkloadDnsController is stored in the Processor | ||
| 3. The WorkloadDnsController goroutines start when `WorkloadController.Run()` is invoked | ||
| 4. The WorkloadDnsController lifecycle is bound to WorkloadController's context | ||
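Binding worker goroutines to a context, as in the initialization sequence above, usually looks like the following. This is a generic sketch: the `controller` type, its `Run` and `Wait` methods, and the `runAndCancel` helper are hypothetical and do not mirror the actual Kmesh signatures.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
)

// controller is a hypothetical stand-in for a DNS controller with workers.
type controller struct {
	wg      sync.WaitGroup
	stopped atomic.Int32
}

// Run starts the workers and returns immediately. Each worker exits when
// ctx is cancelled, so the controller's lifetime follows the caller's context.
func (c *controller) Run(ctx context.Context, workers int) {
	for i := 0; i < workers; i++ {
		c.wg.Add(1)
		go func() {
			defer c.wg.Done()
			<-ctx.Done() // a real worker would also select on its input channel
			c.stopped.Add(1)
		}()
	}
}

// Wait blocks until all workers have exited.
func (c *controller) Wait() { c.wg.Wait() }

// runAndCancel starts the workers, cancels the context, and reports how
// many workers observed the cancellation.
func runAndCancel(workers int) int32 {
	ctx, cancel := context.WithCancel(context.Background())
	c := &controller{}
	c.Run(ctx, workers)
	cancel()
	c.Wait()
	return c.stopped.Load()
}

func main() {
	fmt.Println("workers stopped:", runAndCancel(2))
}
```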
|
|
||
| **Runtime Interaction**: | ||
|
|
||
| 1. Processor detects a workload without addresses (ServiceEntry-originated) | ||
| 2. Processor registers a result channel in the WorkloadDnsController | ||
| 3. Processor submits the workload to the resolution queue | ||
| 4. Processor blocks on the result channel with a timeout | ||
| 5. WorkloadDnsController delivers the resolved workload, or the timeout expires | ||
| 6. Processor continues with workload processing | ||
|
|
||
| #### Comparison with AdsDnsController | ||
|
|
||
| | Aspect | AdsDnsController | WorkloadDnsController | | ||
| |--------|------------------|----------------------| | ||
| | **Input** | Clusters with DNS endpoints | Workloads without addresses | | ||
| | **Processing Unit** | Cluster | Workload | | ||
| | **Result Delivery** | Shared channel | Per-workload channels | | ||
| | **Timeout** | No processor timeout | 3s processor + 100ms channel | | ||
| | **Refresh Rate** | From cluster config | Fixed 200ms | | ||
| | **Cleanup** | On cluster removal | On workload removal | | ||
|
|
||
| #### Test Plan | ||
|
|
||
| <!-- | ||
| **Note:** *Not required until targeted at a release.* | ||
| Consider the following in developing a test plan for this enhancement: | ||
| - Will there be e2e and integration tests, in addition to unit tests? | ||
| - How will it be tested in isolation vs with other components? | ||
| No need to outline all test cases, just the general strategy. Anything | ||
| that would count as tricky in the implementation, and anything particularly | ||
| challenging to test, should be called out. | ||
| --> | ||
|
|
||
| **Unit Tests** | ||
|
|
||
| DNS controller unit tests cover IPv4, IPv6, and dual-stack resolution scenarios, workload address overwriting logic, cleanup on workload deletion, and concurrent resolution of multiple workloads. | ||
|
|
||
| **E2E Tests** | ||
|
|
||
| The E2E test validates the end-to-end ServiceEntry DNS resolution flow in three steps: it creates a ServiceEntry with DNS resolution that points to a fake hostname, creates a VirtualService that routes the fake hostname to a real service, and verifies that traffic flows successfully. | ||
|
|
||
| Note: DNS proxy is disabled in IPv6-only environments, so these tests are skipped in that configuration. | ||
|
|
||
| ### Alternatives | ||
|
|
||
| <!-- | ||
| What other approaches did you consider, and why did you rule them out? These do | ||
| not need to be as detailed as the proposal, but should include enough | ||
| information to express the idea and why it was not acceptable. | ||
| --> | ||
|
|
||
| **Synchronous Resolution in Processor**: Embedding DNS resolution directly within the Processor's main workload handling loop would introduce blocking behavior, eliminating the possibility of batching concurrent resolutions for identical domains and complicating timeout implementation. This approach violates the separation of concerns principle by coupling network I/O with workload state management. | ||
|
|
||
| **Shared Result Channel**: Reusing AdsDnsController's single shared channel pattern would necessitate complex result filtering logic to match responses with their corresponding workload requests. The additional synchronization overhead and potential for spurious wake-ups make this approach less suitable for workload-level granularity. | ||
|
|
||
| **Kernel-Space DNS Resolution**: Implementing DNS resolution within eBPF programs would require reimplementing the DNS protocol stack in a constrained execution environment with strict complexity limits. This approach would duplicate existing userspace functionality, significantly increase maintenance burden, and provide minimal performance benefits given the infrequent nature of DNS lookups. | ||
|
|
||
| <!-- | ||
| Note: This is a simplified version of kubernetes enhancement proposal template. | ||
| https://github.com/kubernetes/enhancements/tree/3317d4cb548c396a430d1c1ac6625226018adf6a/keps/NNNN-kep-template | ||
| --> | ||