| 1 | +--- |
| 2 | +title: Support DNS Resolution in Dual Engine Mode |
| 3 | +authors: |
| 4 | +- "@Flying-Tom" |
| 5 | +reviewers: |
| 6 | +- |
| 7 | +approvers: |
| 8 | +- |
| 9 | +creation-date: 2025-07-09 |
| 10 | +--- |
| 11 | + |
| 12 | +## Support DNS Resolution in Dual Engine Mode |
| 13 | + |
| 14 | +<!-- |
| 15 | +This is the title of your KEP. Keep it short, simple, and descriptive. A good |
| 16 | +title can help communicate what the KEP is and should be considered as part of |
| 17 | +any review. |
| 18 | +--> |
| 19 | + |
| 20 | +### Summary |
| 21 | + |
| 22 | +<!-- |
| 23 | +This section is incredibly important for producing high-quality, user-focused |
| 24 | +documentation such as release notes or a development roadmap. |
| 25 | +A good summary is probably at least a paragraph in length. |
| 26 | +--> |
| 27 | + |
| 28 | +### Motivation |
| 29 | + |
| 30 | +<!-- |
| 31 | +This section is for explicitly listing the motivation, goals, and non-goals of |
| 32 | +this KEP. Describe why the change is important and the benefits to users. |
| 33 | +--> |
| 34 | + |
| 35 | +In Istio, [ExternalName Services](https://kubernetes.io/docs/concepts/services-networking/service/#externalname) and [ServiceEntry](https://istio.io/latest/docs/reference/config/networking/service-entry/#ServiceEntry-Resolution) resources with DNS resolution are widely used. For both kinds of configuration, istiod generates the associated DNS-typed clusters. |
| 36 | + |
| 37 | +Because many users depend on these kinds of services, Kmesh has to support them so that users can migrate to Kmesh seamlessly. |
| 38 | + |
| 39 | +Suppose we create a ServiceEntry like below: |
| 40 | + |
| 41 | +```yaml |
| 42 | +apiVersion: networking.istio.io/v1 |
| 43 | +kind: ServiceEntry |
| 44 | +metadata: |
| 45 | +  name: external-svc-google |
| 46 | +  namespace: default |
| 47 | +spec: |
| 48 | +  hosts: |
| 49 | +  - news.google.com |
| 50 | +  ports: |
| 51 | +  - number: 80 |
| 52 | +    name: http |
| 53 | +    protocol: HTTP |
| 54 | +  resolution: DNS |
| 55 | +``` |
| 56 | + |
| 57 | +Previously, DNS processing included the CDS refresh process, which tightly coupled the DNS module to Kernel-Native mode and made it impossible to reuse in Dual-Engine mode. In Release 1.1, Kmesh refactored the DNS module: the data circulating in the DNS refresh queue is now the domain name rather than a structure containing CDS. As a result, the DNS module is no longer concerned with the mode Kmesh runs in and only provides the hostnames that need to be resolved. Specifically, the DNS logic of the Ads controller was extracted into a separate component called the Ads Dns Controller. |
| 58 | + |
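The decoupling described above can be illustrated with a minimal sketch in which only domain names travel through the notification channel and the consumer decides what to do with the result. All names here (`resolver`, `lookupFunc`, `Updates`) are hypothetical and do not mirror the real `pkg/dns` API; the lookup is injected so the example does not perform a real DNS query.

```go
package main

import (
	"fmt"
	"sync"
)

// lookupFunc abstracts the actual DNS query (assumption: injected for the sketch).
type lookupFunc func(domain string) []string

// resolver is a hypothetical, mode-agnostic DNS module: it knows nothing
// about CDS or workloads and publishes only the name of the resolved domain.
type resolver struct {
	mu      sync.RWMutex
	addrs   map[string][]string
	lookup  lookupFunc
	Updates chan string // carries domain names only
}

func newResolver(lookup lookupFunc) *resolver {
	return &resolver{
		addrs:   make(map[string][]string),
		lookup:  lookup,
		Updates: make(chan string, 16),
	}
}

// Resolve looks up a domain, stores the result, and notifies consumers
// by sending the domain name on Updates.
func (r *resolver) Resolve(domain string) {
	addrs := r.lookup(domain)
	r.mu.Lock()
	r.addrs[domain] = addrs
	r.mu.Unlock()
	r.Updates <- domain
}

// Addresses returns a copy of the addresses stored for a domain.
func (r *resolver) Addresses(domain string) []string {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return append([]string(nil), r.addrs[domain]...)
}

func main() {
	r := newResolver(func(string) []string { return []string{"192.0.2.10"} })
	r.Resolve("news.google.com")
	domain := <-r.Updates
	fmt.Println(domain, r.Addresses(domain))
}
```

Because the consumer only receives a hostname, the same resolver could feed either a CDS-oriented controller or a workload-oriented one without knowing which mode is active.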
| 59 | + |
| 60 | + |
| 61 | +We want to leverage this capability to build the DNS logic under Dual-Engine mode. |
| 62 | + |
| 63 | +#### Goals |
| 64 | + |
| 65 | +<!-- |
| 66 | +List the specific goals of the KEP. What is it trying to achieve? How will we |
| 67 | +know that this has succeeded? |
| 68 | +--> |
| 69 | + |
| 70 | +- Support DNS resolution for the domains of workloads generated from `ServiceEntry` in Dual-Engine mode. |
| 71 | +- Improve the existing DNS logic and abstract it further into a more general paradigm. |
| 72 | + |
| 73 | +#### Non-Goals |
| 74 | + |
| 75 | +<!-- |
| 76 | +What is out of scope for this KEP? Listing non-goals helps to focus discussion |
| 77 | +and make progress. |
| 78 | +--> |
| 79 | + |
| 80 | +### Proposal |
| 81 | + |
| 82 | +<!-- |
| 83 | +This is where we get down to the specifics of what the proposal actually is. |
| 84 | +This should have enough detail that reviewers can understand exactly what |
| 85 | +you're proposing, but should not include things like API designs or |
| 86 | +implementation. What is the desired outcome and how do we measure success?. |
| 87 | +The "Design Details" section below is for the real |
| 88 | +nitty-gritty. |
| 89 | +--> |
| 90 | + |
| 91 | +Thanks to the `DNSResolver` in `pkg/dns`, which extracts the DNS resolution logic into an independent component, the existing DNS capabilities can be reused in Dual-Engine mode. As a reference, a `dnsController` is already implemented (`pkg/controller/ads/dns.go`) and registered as a component of the Ads Controller. It receives the domains that need to be resolved from the Processor (another component of the Ads Controller) through the `DnsResolverChan` channel, handles DNS resolution asynchronously, and saves the resolved addresses in the shared `AdsCache` so that they are sent back to the Processor and flushed to the BPF map. |
| 92 | + |
| 93 | +### Design Details |
| 94 | + |
| 95 | +<!-- |
| 96 | +This section should contain enough information that the specifics of your |
| 97 | +change are understandable. This may include API specs (though not always |
| 98 | +required) or even code snippets. If there's any ambiguity about HOW your |
| 99 | +proposal will be implemented, this is the place to discuss them. |
| 100 | +--> |
| 101 | + |
| 102 | +Inspired by the AdsDnsController in the Kernel-Native paradigm, we can implement a similar WorkloadDnsController in the Dual-Engine paradigm to handle DNS resolution for workloads that are generated from a `ServiceEntry` and carry no address information. Some key structures and methods are shown below: |
| 103 | + |
| 104 | +```go |
| 105 | +type dnsController struct { |
| 106 | +	workloadsChan    chan []*workloadapi.Workload |
| 107 | +	cache            cache.WorkloadCache |
| 108 | +	dnsResolver      *dns.DNSResolver |
| 109 | +	// stores copies of the pendingResolveDomain entries |
| 110 | +	workloadCache    map[string]*pendingResolveDomain |
| 111 | +	// stores all pending hostnames in the workloads |
| 112 | +	pendingHostnames map[string][]string |
| 113 | +	sync.RWMutex |
| 114 | +} |
| 115 | + |
| 116 | +// pendingResolveDomain holds the pending DNS resolution info in Dual-Engine mode. |
| 117 | +// Workloads is used to create the API workloads. |
| 118 | +type pendingResolveDomain struct { |
| 119 | +	Workloads   []*workloadapi.Workload |
| 120 | +	RefreshRate time.Duration |
| 121 | +} |
| 122 | + |
| 123 | +func (r *dnsController) Run(stopCh <-chan struct{}) { |
| 124 | +	go r.dnsResolver.StartDnsResolver(stopCh) |
| 125 | +	go r.refreshWorker(stopCh) |
| 126 | +	go r.processWorkloads() |
| 127 | +	go func() { |
| 128 | +		<-stopCh |
| 129 | +		close(r.workloadsChan) |
| 130 | +	}() |
| 131 | +} |
| 132 | + |
| 133 | +func (r *dnsController) refreshWorker(stop <-chan struct{}) { |
| 134 | +	for { |
| 135 | +		select { |
| 136 | +		case <-stop: |
| 137 | +			return |
| 138 | +		case domain := <-r.dnsResolver.DnsChan: |
| 139 | +			// receive a domain that needs to be resolved and handle it |
| 140 | +			pendingDomain := r.getWorkloadsByDomain(domain) |
| 141 | +			addrs := r.dnsResolver.GetDNSAddresses(domain) |
| 142 | +			// update the cache with the resolved addresses |
| 143 | +			r.updateWorkloads(pendingDomain, domain, addrs) |
| 144 | +		} |
| 145 | +	} |
| 146 | +} |
| 147 | + |
| 148 | +func (r *dnsController) processWorkloads() { |
| 149 | +	for workloads := range r.workloadsChan { |
| 150 | +		r.processDomains(workloads) |
| 151 | +	} |
| 152 | +} |
| 153 | + |
| 154 | +// processDomains handles the workloads received from the Processor that need |
| 155 | +// DNS resolution and sends their domains to the dnsResolver. |
| 156 | +func (r *dnsController) processDomains(workloads []*workloadapi.Workload) { |
| 157 | +	... |
| 158 | +} |
| 159 | +``` |
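Once a domain is resolved, the addresses have to be written back into the pending workloads. A minimal sketch of one piece of that step follows, assuming the resolver returns textual IP addresses and the workload stores each address as raw bytes (4 bytes for IPv4, 16 for IPv6). `toAddressBytes` is a hypothetical helper for illustration, not a function from Kmesh.

```go
package main

import (
	"fmt"
	"net/netip"
)

// toAddressBytes converts textual IP addresses into their raw byte form.
// Entries that do not parse as IP addresses are skipped.
func toAddressBytes(addrs []string) [][]byte {
	out := make([][]byte, 0, len(addrs))
	for _, s := range addrs {
		addr, err := netip.ParseAddr(s)
		if err != nil {
			continue // not an IP address, ignore it
		}
		// Unmap turns IPv4-mapped IPv6 addresses into plain IPv4,
		// so AsSlice yields 4 bytes for IPv4 and 16 for IPv6.
		out = append(out, addr.Unmap().AsSlice())
	}
	return out
}

func main() {
	for _, b := range toAddressBytes([]string{"192.0.2.10", "2001:db8::1", "not-an-ip"}) {
		fmt.Println(len(b), b)
	}
}
```

Skipping unparsable entries keeps a single bad record from blocking the update; whether that or returning an error is preferable depends on how the controller should surface resolver failures.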
| 160 | + |
| 161 | +The Processor also needs to be updated to handle workloads without address information: |
| 162 | + |
| 163 | +```go |
| 164 | +func (p *Processor) handleServicesAndWorkloads(services []*workloadapi.Service, workloads []*workloadapi.Workload) { |
| 165 | +	... |
| 166 | + |
| 167 | +	for _, workload := range workloads { |
| 168 | +		if workload.GetAddresses() == nil { |
| 169 | +			uid := workload.GetUid() |
| 170 | +			if !strings.Contains(uid, "ServiceEntry") { |
| 171 | +				log.Warnf("workload: %s/%s addresses is nil", workload.Namespace, workload.Name) |
| 172 | +				continue |
| 173 | +			} else { |
| 174 | +				// a workload from a ServiceEntry needs its addresses resolved |
| 175 | +				if p.DnsResolverChan != nil { |
| 176 | +					p.DnsResolverChan <- workloads |
| 177 | +				} |
| 178 | +				go func() { |
| 179 | +					maxRetries := 30 |
| 180 | +					for range maxRetries { |
| 181 | +						workload := p.WorkloadCache.GetWorkloadByUid(uid) |
| 182 | +						address := workload.GetAddresses() |
| 183 | +						if address != nil { |
| 184 | +							if err := p.handleWorkload(workload); err != nil { |
| 185 | +								log.Errorf("handle workload %s failed, err: %v", workload.ResourceName(), err) |
| 186 | +							} |
| 187 | +							break |
| 188 | +						} else { |
| 189 | +							log.Warnf("workload %s addresses are still nil, retrying...", uid) |
| 190 | +						} |
| 191 | +						time.Sleep(1 * time.Second) |
| 192 | +					} |
| 193 | +				}() |
| 194 | +				// wait for the service entry to be resolved |
| 195 | +			} |
| 196 | +		} |
| 197 | +		if err := p.handleWorkload(workload); err != nil { |
| 198 | +			log.Errorf("handle workload %s failed, err: %v", workload.ResourceName(), err) |
| 199 | +		} |
| 200 | +	} |
| 201 | +} |
| 202 | +``` |
| 203 | + |
| 204 | +#### Test Plan |
| 205 | + |
| 206 | +<!-- |
| 207 | +**Note:** *Not required until targeted at a release.* |
| 208 | +Consider the following in developing a test plan for this enhancement: |
| 209 | +- Will there be e2e and integration tests, in addition to unit tests? |
| 210 | +- How will it be tested in isolation vs with other components? |
| 211 | +No need to outline all test cases, just the general strategy. Anything |
| 212 | +that would count as tricky in the implementation, and anything particularly |
| 213 | +challenging to test, should be called out. |
| 214 | +--> |
| 215 | + |
| 216 | +### Alternatives |
| 217 | + |
| 218 | +<!-- |
| 219 | +What other approaches did you consider, and why did you rule them out? These do |
| 220 | +not need to be as detailed as the proposal, but should include enough |
| 221 | +information to express the idea and why it was not acceptable. |
| 222 | +--> |
| 223 | + |
| 224 | +<!-- |
| 225 | +Note: This is a simplified version of kubernetes enhancement proposal template. |
| 226 | +https://github.com/kubernetes/enhancements/tree/3317d4cb548c396a430d1c1ac6625226018adf6a/keps/NNNN-kep-template |
| 227 | +--> |