
Commit fcf5ce2

proposal init
Signed-off-by: Tom <[email protected]>
1 parent 01c10a6 commit fcf5ce2

3 files changed, +230 -0 lines changed

.markdownlint.yml

Lines changed: 3 additions & 0 deletions
@@ -6,3 +6,6 @@ ol-prefix: false
 no-duplicate-heading: false
 single-h1: false
 no-emphasis-as-heading: false
+no-hard-tabs:
+  code_blocks: true
+  spaces_per_tab: 2

docs/proposal/dual_engine_dns.md

Lines changed: 227 additions & 0 deletions
@@ -0,0 +1,227 @@

---
title: Support DNS Resolution in Dual Engine Mode
authors:
- "@Flying-Tom"
reviewers:
-
approvers:
-
creation-date: 2025-07-09
---

## Support DNS Resolution in Dual Engine Mode

<!--
This is the title of your KEP. Keep it short, simple, and descriptive. A good
title can help communicate what the KEP is and should be considered as part of
any review.
-->

### Summary

<!--
This section is incredibly important for producing high-quality, user-focused
documentation such as release notes or a development roadmap.
A good summary is probably at least a paragraph in length.
-->

### Motivation

<!--
This section is for explicitly listing the motivation, goals, and non-goals of
this KEP. Describe why the change is important and the benefits to users.
-->

In Istio, [ExternalName Services](https://kubernetes.io/docs/concepts/services-networking/service/#externalname) and [ServiceEntries](https://istio.io/latest/docs/reference/config/networking/service-entry/#ServiceEntry-Resolution) with DNS resolution are widely used. For both kinds of configuration, istiod generates the corresponding DNS-typed clusters.

Because many users depend on these kinds of services, Kmesh has to support them so that users can migrate to Kmesh seamlessly.

Suppose we create a ServiceEntry like the following:

```yaml
apiVersion: networking.istio.io/v1
kind: ServiceEntry
metadata:
  name: external-svc-google
  namespace: default
spec:
  hosts:
  - news.google.com
  ports:
  - number: 80
    name: http
    protocol: HTTP
  resolution: DNS
```

In the previous DNS implementation, DNS processing also included the CDS refresh, which tightly coupled the DNS module to Kernel-Native mode and made it impossible to reuse in Dual-Engine mode. In Release 1.1, Kmesh refactored the DNS module: the items circulating in the DNS refresh queue are now domain names rather than structures containing CDS. As a result, the DNS module no longer cares which mode Kmesh runs in and only deals with the hostnames that need to be resolved. Specifically, the DNS logic of the Ads controller was extracted into a separate component, the Ads Dns Controller.

![Evolution of the Kmesh DNS module](./pics/dns-evolution.png)

We want to build on this capability to implement DNS resolution in Dual-Engine mode.

#### Goals

<!--
List the specific goals of the KEP. What is it trying to achieve? How will we
know that this has succeeded?
-->

- Support DNS resolution of the domains of workloads generated from `ServiceEntry` resources in Dual-Engine mode.
- Improve the existing DNS logic and abstract it further into a more general paradigm.

#### Non-Goals

<!--
What is out of scope for this KEP? Listing non-goals helps to focus discussion
and make progress.
-->

### Proposal

<!--
This is where we get down to the specifics of what the proposal actually is.
This should have enough detail that reviewers can understand exactly what
you're proposing, but should not include things like API designs or
implementation. What is the desired outcome and how do we measure success?
The "Design Details" section below is for the real
nitty-gritty.
-->

Because the `DNSResolver` in `pkg/dns` now encapsulates DNS resolution independently of the Kmesh mode, the same DNS capabilities can be reused in Dual-Engine mode. In Kernel-Native mode, a `dnsController` (`pkg/controller/ads/dns.go`) is registered as a component of the Ads Controller. It receives the domains that need to be resolved from the Processor (another component of the Ads Controller) over the `DnsResolverChan` channel, resolves them asynchronously, stores the resolved addresses in the shared cache `AdsCache` so that they reach the Processor, and flushes them to the BPF map.
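The handoff from the Processor to the `dnsController` described above can be sketched with standard-library types only. Everything here is hypothetical: `Workload` stands in for `workloadapi.Workload`, `SharedCache` for the shared cache, and `lookup` for the real resolver, so the example runs without a network.

```go
package main

import (
    "fmt"
    "sync"
)

// Workload is a hypothetical, trimmed-down stand-in for workloadapi.Workload.
type Workload struct {
    UID       string
    Hostname  string
    Addresses []string
}

// SharedCache is a hypothetical stand-in for the cache that the DNS
// controller writes resolved workloads into and the Processor reads from.
type SharedCache struct {
    mu        sync.RWMutex
    workloads map[string]*Workload
}

func (c *SharedCache) Put(w *Workload) {
    c.mu.Lock()
    defer c.mu.Unlock()
    c.workloads[w.UID] = w
}

func (c *SharedCache) Get(uid string) *Workload {
    c.mu.RLock()
    defer c.mu.RUnlock()
    return c.workloads[uid]
}

// runDNSController resolves every workload it receives, stores a resolved
// copy in the cache, and reports the UID on resolved so the Processor can
// pick it up. It closes resolved once the input channel is closed.
func runDNSController(in <-chan []*Workload, cache *SharedCache,
    lookup func(host string) []string, resolved chan<- string) {
    defer close(resolved)
    for batch := range in {
        for _, w := range batch {
            cp := *w // never mutate the Processor's object in place
            cp.Addresses = lookup(w.Hostname)
            cache.Put(&cp)
            resolved <- cp.UID
        }
    }
}

func main() {
    cache := &SharedCache{workloads: make(map[string]*Workload)}
    dnsResolverChan := make(chan []*Workload, 1)
    resolved := make(chan string)
    lookup := func(host string) []string { return []string{"203.0.113.10"} }

    go runDNSController(dnsResolverChan, cache, lookup, resolved)

    // Processor side: hand off an address-less ServiceEntry workload.
    dnsResolverChan <- []*Workload{{UID: "ServiceEntry/default/external-svc-google", Hostname: "news.google.com"}}
    close(dnsResolverChan)

    for uid := range resolved {
        fmt.Println(cache.Get(uid).Addresses) // [203.0.113.10]
    }
}
```

Copying the workload before filling in `Addresses` keeps the Processor's object untouched, so the two goroutines never write to the same struct.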

### Design Details

<!--
This section should contain enough information that the specifics of your
change are understandable. This may include API specs (though not always
required) or even code snippets. If there's any ambiguity about HOW your
proposal will be implemented, this is the place to discuss them.
-->

Following the AdsDnsController in Kernel-Native mode, we can implement a similar WorkloadDnsController in Dual-Engine mode to resolve the domains of workloads that are generated from a `ServiceEntry` and carry no address information. The key structures and methods are shown below:

```go
type dnsController struct {
    // workloads handed over by the Processor that need DNS resolution
    workloadsChan chan []*workloadapi.Workload
    cache         cache.WorkloadCache
    dnsResolver   *dns.DNSResolver
    // copies of the workloads pending resolution, grouped per domain
    workloadCache map[string]*pendingResolveDomain
    // all hostnames in the workloads that are still pending resolution
    pendingHostnames map[string][]string
    sync.RWMutex
}

// pendingResolveDomain holds the pending resolution info in Dual-Engine mode.
// Workloads are used to rebuild the API workloads once addresses are known.
type pendingResolveDomain struct {
    Workloads   []*workloadapi.Workload
    RefreshRate time.Duration
}

func (r *dnsController) Run(stopCh <-chan struct{}) {
    go r.dnsResolver.StartDnsResolver(stopCh)
    go r.refreshWorker(stopCh)
    go r.processWorkloads()
    go func() {
        <-stopCh
        // Closing ends processWorkloads. The Processor must not send on
        // workloadsChan after stopCh is closed, or the send will panic.
        close(r.workloadsChan)
    }()
}

func (r *dnsController) refreshWorker(stop <-chan struct{}) {
    for {
        select {
        case <-stop:
            return
        case domain := <-r.dnsResolver.DnsChan:
            // a domain is due for (re)resolution
            pendingDomain := r.getWorkloadsByDomain(domain)
            addrs := r.dnsResolver.GetDNSAddresses(domain)
            // update the cache with the resolved addresses
            r.updateWorkloads(pendingDomain, domain, addrs)
        }
    }
}

func (r *dnsController) processWorkloads() {
    for workloads := range r.workloadsChan {
        r.processDomains(workloads)
    }
}

// processDomains handles the workloads received from the Processor that need
// DNS resolution and sends their domains to the dnsResolver.
func (r *dnsController) processDomains(workloads []*workloadapi.Workload) {
    ...
}
```
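The `getWorkloadsByDomain` lookup above implies an index from a domain to the workloads waiting on it. A minimal sketch of such an index follows, assuming one-shot resolution; `pendingIndex`, `Add`, and `Take` are hypothetical names, and this is not the elided `processDomains` body.

```go
package main

import (
    "fmt"
    "sort"
    "sync"
)

// pendingIndex is a hypothetical sketch: it maps a hostname to the UIDs of
// the workloads that are still waiting for that hostname to resolve.
type pendingIndex struct {
    mu       sync.Mutex
    byDomain map[string]map[string]struct{}
}

func newPendingIndex() *pendingIndex {
    return &pendingIndex{byDomain: make(map[string]map[string]struct{})}
}

// Add records that workload uid waits on hostname. It reports whether the
// hostname is new, i.e. whether it still has to be handed to the resolver.
func (p *pendingIndex) Add(hostname, uid string) bool {
    p.mu.Lock()
    defer p.mu.Unlock()
    uids, ok := p.byDomain[hostname]
    if !ok {
        uids = make(map[string]struct{})
        p.byDomain[hostname] = uids
    }
    uids[uid] = struct{}{}
    return !ok
}

// Take returns the sorted UIDs waiting on hostname and forgets them, so one
// DNS answer can be applied to all of them.
func (p *pendingIndex) Take(hostname string) []string {
    p.mu.Lock()
    defer p.mu.Unlock()
    uids := p.byDomain[hostname]
    delete(p.byDomain, hostname)
    out := make([]string, 0, len(uids))
    for uid := range uids {
        out = append(out, uid)
    }
    sort.Strings(out)
    return out
}

func main() {
    idx := newPendingIndex()
    fmt.Println(idx.Add("news.google.com", "se-a")) // true: hand to resolver
    fmt.Println(idx.Add("news.google.com", "se-b")) // false: already queued
    fmt.Println(idx.Take("news.google.com"))        // [se-a se-b]
    fmt.Println(idx.Take("news.google.com"))        // []
}
```

Having `Add` report whether the hostname is new lets several ServiceEntry workloads that share a host trigger a single DNS query. An implementation that refreshes periodically would keep the entry instead of deleting it in `Take`.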

The Processor also needs to be updated to handle workloads that carry no address information:

```go
func (p *Processor) handleServicesAndWorkloads(services []*workloadapi.Service, workloads []*workloadapi.Workload) {
    ...

    for _, workload := range workloads {
        if workload.GetAddresses() == nil {
            uid := workload.GetUid()
            if !strings.Contains(uid, "ServiceEntry") {
                log.Warnf("workload: %s/%s addresses is nil", workload.Namespace, workload.Name)
                continue
            }
            // A workload generated from a ServiceEntry needs address resolving:
            // hand only this workload (not the whole batch) to the DNS controller.
            if p.DnsResolverChan != nil {
                p.DnsResolverChan <- []*workloadapi.Workload{workload}
            }
            // Wait in the background for the resolved copy to show up in the cache.
            go func() {
                const maxRetries = 30
                for range maxRetries { // range over an int requires Go 1.22+
                    // protobuf getters are nil-safe, so a workload not yet cached just retries
                    resolved := p.WorkloadCache.GetWorkloadByUid(uid)
                    if resolved.GetAddresses() != nil {
                        if err := p.handleWorkload(resolved); err != nil {
                            log.Errorf("handle workload %s failed, err: %v", resolved.ResourceName(), err)
                        }
                        return
                    }
                    log.Warnf("workload %s: addresses still nil, retrying...", uid)
                    time.Sleep(1 * time.Second)
                }
            }()
            // Do not handle the address-less workload now; the goroutine does it once resolved.
            continue
        }
        if err := p.handleWorkload(workload); err != nil {
            log.Errorf("handle workload %s failed, err: %v", workload.ResourceName(), err)
        }
    }
}
```

#### Test Plan

<!--
**Note:** *Not required until targeted at a release.*
Consider the following in developing a test plan for this enhancement:
- Will there be e2e and integration tests, in addition to unit tests?
- How will it be tested in isolation vs with other components?
No need to outline all test cases, just the general strategy. Anything
that would count as tricky in the implementation, and anything particularly
challenging to test, should be called out.
-->

### Alternatives

<!--
What other approaches did you consider, and why did you rule them out? These do
not need to be as detailed as the proposal, but should include enough
information to express the idea and why it was not acceptable.
-->

<!--
Note: This is a simplified version of kubernetes enhancement proposal template.
https://github.com/kubernetes/enhancements/tree/3317d4cb548c396a430d1c1ac6625226018adf6a/keps/NNNN-kep-template
-->