Kubernetes DNS caching gone wrong
Those who have been operating infrastructure know:
Behind every error there is a DNS server
We faced a similar scenario while migrating our applications to Kubernetes. During the testing phase, we saw a flood of fanned-out local DNS requests caused by kube-dns based service discovery. Each application request resulted in multiple DNS queries because of search domains. To tackle this, our first approach was to reduce ndots in all the pods and use FQDN hostnames for communication between microservices. Apart from that, we needed to route requests to different nameservers based on the domain, so we decided to run a local DNS resolver on each host. This was all before the Kubernetes community introduced NodeLocalDNS, so we went ahead with running dnsmasq on each node, providing:
- Local DNS cache.
- Multiple upstream nameservers.
- Custom TTL to disable caching by applications or to disable the negative cache.
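
To make the search-domain fan-out described above concrete, here is a minimal sketch of how a stub resolver expands a name using the search list and ndots. It assumes glibc-style behaviour and the usual Kubernetes pod defaults (`ndots:5` and three search domains for the `default` namespace). The function name `candidate_names` and the hostname `api.example.com` are hypothetical, chosen only for illustration.

```python
# Assumed defaults for a pod in the "default" namespace (illustrative only).
SEARCH_DOMAINS = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]


def candidate_names(name, search_domains, ndots):
    """Return the names a stub resolver would try, in order.

    Sketch of glibc-style behaviour: a trailing dot means the name is
    absolute and is tried as-is; otherwise the search list is tried
    before the literal name unless the name has at least `ndots` dots.
    """
    if name.endswith("."):
        return [name]
    expanded = [f"{name}.{domain}." for domain in search_domains]
    absolute = name + "."
    if name.count(".") >= ndots:
        return [absolute] + expanded
    return expanded + [absolute]


if __name__ == "__main__":
    for query in candidate_names("api.example.com", SEARCH_DOMAINS, ndots=5):
        print(query)
```

The resolver stops at the first name that resolves, and each candidate is typically looked up for both A and AAAA records. With `ndots:5`, an external name such as `api.example.com` goes through every search domain before the literal name is tried, whereas a lower ndots value or a trailing dot lets the first lookup succeed.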
After rolling out dnsmasq on each node in production, everything worked smoothly and the DNS request metrics went down. With Kubernetes cluster upgrades we migrated to CoreDNS, and after rolling out 1.6.4 we saw strange spikes in DNS requests. Initial investigation showed that dnsmasq had stopped caching for some reason. Going through the changelog, we realized CoreDNS had removed the RecursionAvailable (RA) flag from the DNS header.
Googling around, I found an open issue on the topic explaining why dnsmasq does not cache such records:
    /* Don't put stuff from a truncated packet into the cache.
       Don't cache replies from non-recursive nameservers, since we may get a
       reply containing a CNAME but not its target, even though the target
       does exist. */
    if (!(header->hb3 & HB3_TC) &&
        !(header->hb4 & HB4_CD) &&
        (header->hb4 & HB4_RA) &&
        !no_cache_dnssec)
      cache_end_insert();
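
The condition in the dnsmasq snippet above can be mirrored in a few lines of Python to see why a reply without the RA bit is never cached. This is only a sketch: the bit masks follow the standard DNS header layout (RFC 1035, with CD from RFC 4035), and the helper names `build_header` and `would_cache` are hypothetical, not part of dnsmasq.

```python
import struct

# Bit masks within the third and fourth header bytes (standard DNS layout).
HB3_TC = 0x02  # TC: message was truncated
HB4_RA = 0x80  # RA: recursion available
HB4_CD = 0x10  # CD: DNSSEC checking disabled


def build_header(hb3, hb4, ident=0x1234):
    """Pack a 12-byte DNS header: id, two flag bytes, four section counts."""
    return struct.pack("!HBBHHHH", ident, hb3, hb4, 1, 1, 0, 0)


def would_cache(packet, no_cache_dnssec=False):
    """Mirror the quoted condition: not truncated, CD clear, RA set."""
    _, hb3, hb4 = struct.unpack("!HBB", packet[:4])
    return (
        not (hb3 & HB3_TC)
        and not (hb4 & HB4_CD)
        and bool(hb4 & HB4_RA)
        and not no_cache_dnssec
    )


if __name__ == "__main__":
    with_ra = build_header(hb3=0x81, hb4=HB4_RA)  # QR + RD, RA set
    without_ra = build_header(hb3=0x81, hb4=0x00)  # QR + RD, RA missing
    print(would_cache(with_ra), would_cache(without_ra))
```

A response that is otherwise identical but lacks RA fails the third check, which matches the behaviour we observed after the upgrade.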
I also saw an open feature request in CoreDNS to allow overriding the DNS header flag. As this is an issue with how dnsmasq handles caching, the CoreDNS maintainers recommended proposing a generic way to do it. Having gone through the code base, I first thought the rewrite plugin seemed a good fit, since it allows you to rewrite various DNS headers and messages. But putting too much logic in one plugin wasn't a great idea, so I ended up proposing a new plugin to modify the DNS header.
After a quick review, the header plugin was merged and is ready to use as of v1.8.5, with usage in the Corefile as:
    . {
        header {
            set ra aa
            clear rd
        }
    }
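
To illustrate what the `set` and `clear` rules above do to a response, here is a small sketch that applies them to the 16-bit DNS flags word. It is not the plugin's Go implementation: the function `apply_rules` and the rule representation are hypothetical, and only the `aa`, `ra` and `rd` flags used in the Corefile are modelled, with masks taken from the standard DNS header layout.

```python
# Masks within the 16-bit flags word of the DNS header (RFC 1035 layout).
FLAG_MASKS = {
    "aa": 0x0400,  # authoritative answer
    "rd": 0x0100,  # recursion desired
    "ra": 0x0080,  # recursion available
}


def apply_rules(flags, rules):
    """Apply ("set" | "clear", [flag names]) rules to a 16-bit flags word."""
    for action, names in rules:
        for name in names:
            mask = FLAG_MASKS[name]
            if action == "set":
                flags |= mask
            elif action == "clear":
                flags &= ~mask & 0xFFFF
            else:
                raise ValueError(f"unknown action: {action}")
    return flags


if __name__ == "__main__":
    # Same rules as the Corefile: set ra aa, then clear rd.
    rules = [("set", ["ra", "aa"]), ("clear", ["rd"])]
    response_flags = 0x8100  # QR + RD, no RA
    print(hex(apply_rules(response_flags, rules)))
```

With RA set on the way out, the response passes the dnsmasq check that requires the recursion-available bit, so it becomes cacheable again.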
I would like to thank the CoreDNS maintainers for being supportive of new proposals, especially Miek Gieben for actively helping with reviews and ideas.
Although it makes sense to use NodeLocalDNS for DNS caching, the header plugin can rescue you if you still want to stick with dnsmasq. Please give it a try, and feel free to propose changes to the plugin if you have a custom use case.