@@ -25,18 +25,18 @@ documentation such as release notes or a development roadmap.
25
25
A good summary is probably at least a paragraph in length.
26
26
-->
27
27
28
+
This proposal adds DNS resolution support for Dual-Engine mode workloads, enabling seamless Istio migration by supporting ServiceEntry resources with DNS-based endpoints. Workloads dynamically resolve hostnames to IP addresses without static configuration.
29
+
28
30
### Motivation
29
31
30
32
<!--
31
33
This section is for explicitly listing the motivation, goals, and non-goals of
32
34
this KEP. Describe why the change is important and the benefits to users.
33
35
-->
34
36
35
-
In istio, [External Name service](https://kubernetes.io/docs/concepts/services-networking/service/#externalname) and DNS resolution typed [ServiceEntry](https://istio.io/latest/docs/reference/config/networking/service-entry/#ServiceEntry-Resolution) are widely used. For both kind of configs, istiod will generate associated DNS typed clusters.
36
-
37
-
So many people have depend on this kind services, Kmesh have to support it to make people migrate to it seamlessly.
37
+
In Istio, [ExternalName services](https://kubernetes.io/docs/concepts/services-networking/service/#externalname) and DNS-typed [ServiceEntry](https://istio.io/latest/docs/reference/config/networking/service-entry/#ServiceEntry-Resolution) rely on client-side DNS resolution. When istiod processes these configurations, it generates workloads without pre-resolved IP addresses.
38
38
39
-
Suppose we create a ServiceEntry like below:
39
+
Example ServiceEntry:
40
40
41
41
```yaml
42
42
apiVersion: networking.istio.io/v1
@@ -49,26 +49,26 @@ spec:
49
49
- news.google.com
50
50
ports:
51
51
- number: 80
52
-
number: http
52
+
name: http
53
53
protocol: HTTP
54
54
resolution: DNS
55
55
```
56
56
57
-
In the previous DNS processing, the CDS refresh process was included, resulting in a high degree of coupling between the DNS module and the kernel-native mode, making it impossible to reuse in dual-engine mode. In Release 1.1, Kmesh refactored the DNS module. Now, the data circulating in the DNS refresh queue is the domain name, rather than a structure containing CDS. Therefore, the DNS module no longer concerns itself with the mode of Kmesh and only provides the hostnames that need to be resolved. Specifically, the DNS logic of the Ads controller was extracted into a separate component called the Ads Dns Controller.
57
+
In Release 1.1, Kmesh refactored the DNS module, extracting DNS logic from the kernel-native Ads controller into a standalone `AdsDnsController`. This decoupling enables reuse in Dual-Engine mode.
58
58
59
59

60
60
61
-
We want to leverage this capability to build the DNS logic under Dual-Engine mode.
62
-
63
61
#### Goals
64
62
65
63
<!--
66
64
List the specific goals of the KEP. What is it trying to achieve? How will we
67
65
know that this has succeeded?
68
66
-->
69
67
70
-
- Support dns resolution of domain from workload generated by `ServiceEntry` in Dual-engine mode.
71
-
- Improve the existing DNS logic and further abstract a more general paradigm.
68
+
- Support DNS resolution for workloads generated from ServiceEntry resources in Dual-Engine mode
69
+
- Implement asynchronous DNS resolution to avoid blocking workload processing
70
+
- Provide automatic cleanup when workloads with DNS hostnames are removed
71
+
- Support both IPv4 and IPv6 address resolution
72
72
73
73
#### Non-Goals
74
74
@@ -90,7 +90,9 @@ The "Design Details" section below is for the real
90
90
nitty-gritty.
91
91
-->
92
92
93
-
Thanks to the `DNSResolver` in `pkg/dns` which extracted independent DNS resolution logic, we can reuse the DNS capabilities implemented in Dual-engine mode. To be more specific, `dnsController` was implemented (`pkg/controller/ads/dns.go`) and registered as a component of Ads Controller, which receives domains need to be resolved from the Processor (another component of Ads Controller) by a channel `DnsResolverChan`, so the `dnsController` can handle the DNS resolution asynchronously and save the resolved results (address) in a shared cache `AdsCache` to send them back to the Processor, along with flushing them to the BPF map.
93
+
The `DNSResolver` in `pkg/dns` already isolates the DNS resolution logic, so Dual-Engine mode can reuse it. Modeled on the `AdsDnsController` in kernel-native mode, a similar `WorkloadDnsController` handles DNS resolution for workloads that a `ServiceEntry` generates without address information.
94
+
95
+
The controller receives workloads needing DNS resolution from the Processor via a channel, handles the DNS resolution asynchronously, and sends the resolved results back to the Processor through per-workload result channels. This design ensures that DNS resolution does not block the workload processing pipeline.
94
96
95
97
### Design Details
96
98
@@ -101,103 +103,102 @@ required) or even code snippets. If there's any ambiguity about HOW your
101
103
proposal will be implemented, this is the place to discuss them.
102
104
-->
103
105
104
-
Inspired by the AdsDnsController in the Kernel-Native paradigm, we can implement a similar WorkloadDnsController in the Dual-Engine paradigm to handle the DNS resolution for the workloads generated by `ServiceEntry` and without address information. Some key structures and methods are shown below:
105
-
106
-
```go
107
-
type dnsController struct {
108
-
workloadsChan chan []*workloadapi.Workload
109
-
cache cache.WorkloadCache
110
-
dnsResolver *dns.DNSResolver
111
-
// store the copy of pendingResolveWorkload.
112
-
// key is the domain name, value is the pendingResolveDomain which contains workloads and refresh rate
113
-
workloadCache map[string]*pendingResolveDomain
114
-
// store all pending hostnames in the workloads
115
-
// key is the workload name, value is the list of related hostnames
116
-
pendingHostnames map[string][]string
117
-
sync.RWMutex
118
-
}
119
-
120
-
// pending resolve domain info of Dual-Engine Mode,
The DNS resolution flow in Dual-Engine mode follows this pattern:
195
109
196
-
Overall, the DNS logic under the Dual-engine mode is roughly illustrated in the block diagram below.
110
+
```txt
111
+
Processor → WorkloadDnsController → DNSResolver → Upstream DNS
112
+
↑ ↓
113
+
└────── Resolved Workload ────────────┘
114
+
```
197
115
198
116

199
117
200
-
As mentioned earlier, this implementation largely mirrors the DNS logic in AdsController, resulting in significant structural redundancy. Therefore, after the Step-1 we need to further abstract the current DNS logic. As shown in the diagram, both WorkerController and DnsResolver are already cohesive and self-contained; however, the interface layer highlighted in orange differs across modes. Subsequent refactoring will attempt to treat Cluster or Workload as first-class entities—load objects that Controller and DnsController (Resolver) interact with directly—thereby achieving looser coupling.
118
+
#### Key Components
119
+
120
+
##### WorkloadDnsController
121
+
122
+
The WorkloadDnsController manages asynchronous DNS resolution for ServiceEntry-generated workloads. Key data structures:
123
+
124
+
- **Resolution Queue**: Buffered channel receiving workloads from Processor for non-blocking submission
125
+
- **Result Channel Registry**: Per-workload result channels (indexed by UID) for independent resolution tracking
126
+
- **Domain Resolution Cache**: Bidirectional index between hostnames and pending workloads for batch resolution
127
+
128
+
The controller operates through three concurrent workers:
129
+
130
+
1. **Domain Processor**: Consumes workloads, groups by domain, delegates to DNS Resolver
131
+
2. **Refresh Worker**: Receives resolved addresses, constructs workload objects, delivers via result channels
132
+
3. **DNS Resolver**: Executes DNS queries (A/AAAA), maintains TTL-based cache (reused from `pkg/dns`)
133
+
134
+
##### Resolution Flow
135
+
136
+
The DNS resolution mechanism follows a producer-consumer pattern with timeout protection:
137
+
138
+
**Workload Submission**: When the Processor encounters a workload without addresses, it registers a result channel and enqueues the workload for resolution. The Processor then waits on the result channel for at most 3 seconds, so a slow or failed lookup cannot stall the pipeline indefinitely.
139
+
140
+
**Domain Aggregation**: The Domain Processor maintains a hostname-indexed cache to aggregate workloads requiring the same domain, reducing redundant DNS queries. It checks for cached resolutions before initiating new queries.
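The aggregation step can be pictured as building a hostname-indexed map, so that each distinct hostname is queried once no matter how many workloads share it. This is a minimal sketch under stated assumptions: `pendingWorkload` and `groupByHostname` are hypothetical names chosen for illustration and are not taken from the Kmesh code base.

```go
package main

import "fmt"

// pendingWorkload is a hypothetical stand-in for a workload that carries a
// hostname but no resolved addresses yet.
type pendingWorkload struct {
	UID      string
	Hostname string
}

// groupByHostname indexes workload UIDs by hostname so that each distinct
// hostname needs only one DNS lookup.
func groupByHostname(workloads []pendingWorkload) map[string][]string {
	byHost := make(map[string][]string)
	for _, w := range workloads {
		byHost[w.Hostname] = append(byHost[w.Hostname], w.UID)
	}
	return byHost
}

func main() {
	byHost := groupByHostname([]pendingWorkload{
		{UID: "wl-a", Hostname: "news.google.com"},
		{UID: "wl-b", Hostname: "news.google.com"},
		{UID: "wl-c", Hostname: "example.com"},
	})
	for host, uids := range byHost {
		fmt.Printf("%s is needed by %d workload(s)\n", host, len(uids))
	}
}
```

With this grouping, two workloads that point at the same hostname trigger a single lookup, and the result can be fanned out to both of them.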
141
+
142
+
**Address Resolution**: The DNS Resolver performs parallel IPv4 (A) and IPv6 (AAAA) queries, respecting DNS TTL values. Resolved addresses are stored in a protocol-agnostic format.
143
+
144
+
**Result Distribution**: The Refresh Worker reconstructs workload objects with resolved addresses and delivers them via result channels. Channel operations include a 100ms send timeout to prevent deadlocks. After delivery, the controller removes the result channel registration to prevent memory leaks.
145
+
146
+
##### Cleanup Mechanism
147
+
148
+
When a workload is deleted, the controller:
149
+
150
+
- Removes the workload from pending hostname tracking
151
+
- Removes the workload from the hostname's pending domain cache
152
+
- If no other workload depends on the hostname, unwatches the domain in the DNS resolver
153
+
154
+
This ensures no memory leaks and prevents unnecessary DNS queries.
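The three cleanup steps amount to maintaining a bidirectional index and reporting hostnames that no workload needs any longer. The sketch below assumes a hypothetical `pendingIndex` type; the real controller's fields and method names may differ, and the call that actually unwatches a domain in the DNS resolver is left to the caller.

```go
package main

import "fmt"

// pendingIndex is a hypothetical bidirectional index between workloads and
// the hostnames they are waiting on.
type pendingIndex struct {
	hostsByWorkload map[string][]string            // workload UID -> hostnames
	workloadsByHost map[string]map[string]struct{} // hostname -> workload UIDs
}

func newPendingIndex() *pendingIndex {
	return &pendingIndex{
		hostsByWorkload: make(map[string][]string),
		workloadsByHost: make(map[string]map[string]struct{}),
	}
}

// add records that workload uid depends on host.
func (p *pendingIndex) add(uid, host string) {
	p.hostsByWorkload[uid] = append(p.hostsByWorkload[uid], host)
	if p.workloadsByHost[host] == nil {
		p.workloadsByHost[host] = make(map[string]struct{})
	}
	p.workloadsByHost[host][uid] = struct{}{}
}

// remove drops uid from both directions of the index and returns the
// hostnames that no remaining workload depends on, so that the caller can
// stop watching them.
func (p *pendingIndex) remove(uid string) []string {
	var orphaned []string
	for _, host := range p.hostsByWorkload[uid] {
		uids, ok := p.workloadsByHost[host]
		if !ok {
			continue // already cleaned up, e.g. a duplicate entry
		}
		delete(uids, uid)
		if len(uids) == 0 {
			delete(p.workloadsByHost, host)
			orphaned = append(orphaned, host)
		}
	}
	delete(p.hostsByWorkload, uid)
	return orphaned
}

func main() {
	idx := newPendingIndex()
	idx.add("wl-a", "news.google.com")
	idx.add("wl-b", "news.google.com")

	fmt.Println(idx.remove("wl-a")) // hostname still needed by wl-b
	fmt.Println(idx.remove("wl-b")) // hostname is now orphaned
}
```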
155
+
156
+
##### Design Rationale
157
+
158
+
The WorkloadDnsController design diverges from AdsDnsController in several key aspects to better accommodate workload-level resolution requirements:
159
+
160
+
| Design Decision | Approach | Rationale |
161
+
|----------------|----------|-----------|
162
+
| **Result Delivery** | Dedicated per-workload channels | Eliminates result filtering overhead and spurious wake-ups; enables direct workload-specific blocking without cross-workload interference |
| **Address Format** | `netip.ParseAddr().AsSlice()` byte representation | Provides protocol-agnostic representation supporting both IPv4 and IPv6 without conditional logic |
165
+
| **Refresh Interval** | Fixed 200ms rate | Simplifies implementation while maintaining adequate freshness for typical ServiceEntry use cases; trades configurability for consistency |
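To illustrate the Address Format row, the standard library's `net/netip` package yields a 4-byte slice for IPv4 and a 16-byte slice for IPv6 from the same call, which is why no per-family branch is needed. The helper name `toAddressBytes` is an assumption made for this sketch; only `netip.ParseAddr` and `Addr.AsSlice` come from the Go standard library.

```go
package main

import (
	"fmt"
	"net/netip"
)

// toAddressBytes parses a textual IP address and returns its raw bytes:
// 4 bytes for IPv4 and 16 bytes for IPv6.
func toAddressBytes(ip string) ([]byte, error) {
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return nil, err
	}
	return addr.AsSlice(), nil
}

func main() {
	for _, ip := range []string{"192.0.2.10", "2001:db8::1"} {
		b, err := toAddressBytes(ip)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("%s -> %d bytes\n", ip, len(b))
	}
}
```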
166
+
167
+
#### Integration Points
168
+
169
+
The WorkloadDnsController integrates into the Kmesh control plane through two primary integration points:
170
+
171
+
| Component | Integration Method | Lifecycle |
172
+
|-----------|-------------------|-----------|
173
+
| **WorkloadController** | Instantiated during `NewController()` | Started via `Run()` context; shutdown via context cancellation |
174
+
| **Processor** | Reference-based invocation | Synchronous blocking on resolution for address-less workloads; ensures data consistency before processing |
| **Cleanup** | On cluster removal | On workload removal |
201
202
202
203
#### Test Plan
203
204
@@ -211,6 +212,16 @@ that would count as tricky in the implementation, and anything particularly
211
212
challenging to test, should be called out.
212
213
-->
213
214
215
+
**Unit Tests**
216
+
217
+
DNS controller unit tests cover IPv4, IPv6, and dual-stack resolution scenarios, workload address overwriting logic, cleanup on workload deletion, and concurrent resolution of multiple workloads.
218
+
219
+
**E2E Tests**
220
+
221
+
The E2E test validates the end-to-end ServiceEntry DNS resolution flow in three steps: it creates a ServiceEntry with DNS resolution pointing to a fake hostname, creates a VirtualService that routes the fake hostname to a real service, and verifies that traffic flows successfully.
222
+
223
+
Note: DNS proxy is disabled in IPv6-only environments, so tests skip in that configuration.
224
+
214
225
### Alternatives
215
226
216
227
<!--
@@ -219,6 +230,12 @@ not need to be as detailed as the proposal, but should include enough
219
230
information to express the idea and why it was not acceptable.
220
231
-->
221
232
233
+
**Synchronous Resolution in Processor**: Embedding DNS resolution directly within the Processor's main workload handling loop would introduce blocking behavior, eliminating the possibility of batching concurrent resolutions for identical domains and complicating timeout implementation. This approach violates the separation of concerns principle by coupling network I/O with workload state management.
234
+
235
+
**Shared Result Channel**: Reusing AdsDnsController's single shared channel pattern would necessitate complex result filtering logic to match responses with their corresponding workload requests. The additional synchronization overhead and potential for spurious wake-ups make this approach less suitable for workload-level granularity.
236
+
237
+
**Kernel-Space DNS Resolution**: Implementing DNS resolution within eBPF programs would require reimplementing the DNS protocol stack in a constrained execution environment with strict complexity limits. This approach would duplicate existing userspace functionality, significantly increase maintenance burden, and provide minimal performance benefits given the infrequent nature of DNS lookups.
238
+
222
239
<!--
223
240
Note: This is a simplified version of kubernetes enhancement proposal template.