Kubernetes DNS caching gone wrong

Qasim Sarfraz
Apr 29, 2022

Those who have been operating infrastructure know:

Behind every error there is a DNS server

We faced a similar scenario while migrating our applications to Kubernetes. During the testing phase, we saw a flood of fanned-out local DNS requests caused by kube-dns-based service discovery. Each application request resulted in multiple DNS queries because of search domains. With the Kubernetes default of ndots:5, any name containing fewer than five dots is first tried against every search domain before being resolved as-is. To tackle this, our first step was to reduce ndots in all pods and to use FQDN hostnames for communication between microservices. On top of that, we needed to route requests to different nameservers based on the domain. So we decided to run a local DNS resolver on each host. This was before the Kubernetes community introduced NodeLocal DNSCache, so we went ahead with running dnsmasq on each node, providing:

  • A local DNS cache.
  • Multiple upstream nameservers, selected per domain.
  • Custom TTLs to keep applications from caching records, and the option to disable negative caching.

After rolling out dnsmasq on every node in production, everything worked smoothly and DNS request metrics went down. As part of Kubernetes cluster upgrades we migrated to CoreDNS, and after rolling out version 1.6.4 we saw strange spikes in DNS requests. Initial investigation showed that dnsmasq had stopped caching. Going through the changelog, we realized that CoreDNS had stopped setting the RA (Recursion Available) flag in the DNS response header.

Searching around, I found an existing open issue explaining why dnsmasq refuses to cache such replies. It comes down to this check in dnsmasq's source:

/* Don't put stuff from a truncated packet into the cache.
   Don't cache replies from non-recursive nameservers, since we may get a
   reply containing a CNAME but not its target, even though the target
   does exist. */
if (!(header->hb3 & HB3_TC) &&
    !(header->hb4 & HB4_CD) &&
    (header->hb4 & HB4_RA) &&
    !no_cache_dnssec)
  cache_end_insert();

I also found an open feature request in CoreDNS to allow overriding this header flag. Since the problem lies in how dnsmasq handles caching, the CoreDNS maintainers recommended proposing a generic mechanism rather than a dnsmasq-specific fix. After going through the code base, I thought the rewrite plugin seemed a good fit, since it already rewrites various parts of DNS messages. However, packing too much logic into one plugin wasn't a great idea, so I ended up proposing a new plugin dedicated to modifying the DNS header.

After a quick review, the header plugin was merged and is available from CoreDNS v1.8.5. The following Corefile snippet sets the RA and AA flags and clears the RD flag:

. {
    header {
        set ra aa
        clear rd
    }
}

I would like to thank the CoreDNS maintainers for being supportive of new proposals, especially Miek Gieben for actively helping with reviews and ideas.

Now that NodeLocal DNSCache is available, it makes sense to use it for DNS caching. But if you still want to stick with dnsmasq, the header plugin can come to the rescue. Please give it a try, and feel free to propose changes to the plugin if you have a custom use case.
