Homelab: Internal DNS

Previously, I looked at using Ansible and its inventory capabilities to begin managing services and configuration on my homelab.

A core defect in the setup I presented was that I hand-coded the mapping of hostnames to IP addresses in my Ansible inventory because, well, I didn't have DNS set up yet.

But hang on a second. What is DNS and why do I care?

When you type http://foo.com/bar into your browser, that's a URL (Uniform Resource Locator), which is composed of a few segments. It has a scheme - in this case http - which names the protocol by which we'll go and fetch the resource. It also has an authority part - in this case foo.com - a hostname to fetch the resource from. The authority part can carry other details such as a username and port; for instance arrdem@foo.com:443 would provide a username, hostname and port. A plain IPv4 or IPv6 address is also legal as a hostname. A URL may also have a path - in this case /bar - which says what to request from foo.com once you get there.
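To make those segments concrete, here's a quick sketch using Python's standard urllib.parse - nothing homelab-specific, just an illustration of the anatomy:

```python
from urllib.parse import urlsplit

# Pull apart an example URL into its scheme, authority and path.
url = urlsplit("http://arrdem@foo.com:443/bar")
print(url.scheme)    # the scheme - which protocol to speak
print(url.username)  # authority: username
print(url.hostname)  # authority: hostname
print(url.port)      # authority: port
print(url.path)      # the path - what to request from the host
```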

Making a request of an IP address and port is pretty easy - if you know how to speak the protocol. You just open a TCP connection to that (host, port) pair and away you go. But how do you find IP addresses? I don't want to commit ethos (10.0.0.64), logos (10.0.0.65) and pathos (10.0.0.66) to memory, or build out anything which really depends on those address assignments, if I can avoid it.

Enter DNS - the traditional solution to this problem. DNS (the Domain Name System) was created to provide a protocol for mapping names memorable to humans (like ethos, logos and pathos!) to the IP addresses which machines actually use. DNS is a host discovery system - its core purpose is to map a domain name to one or more IP addresses presumed to identify machines somewhere. It does not implement service discovery. Services (programs listening on ports on a machine) are identified by convention. For instance, "the" program which speaks HTTP, if any, listens on port 80; "the" program which speaks SSH, if any, listens on port 22; and so forth. These conventions worked fine before the advent of modern shared infrastructure or "cloud" hosting, and now pose some challenges I'll talk about later.

So how does DNS work? DNS consists of a hierarchy of servers - known as resolvers - which speak the DNS query language. Each DNS client connects to a few (typically 3 or fewer) resolvers provided as IP addresses. For instance 1.1.1.1 is a DNS resolver made public by CloudFlare, and 8.8.8.8 is a DNS resolver made public by Google. When you make a request of a resolver, you do so by asking it for a name (called a domain). If the resolver has data for that name it will serve a response; otherwise it may have to (potentially recursively!) inquire of other resolvers for the data you wanted.
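The recursive behavior is the interesting bit. Here's a toy Python model of it - the data structures and names are entirely invented for illustration, and real resolvers are far more involved:

```python
# Toy model of recursive resolution: each "resolver" is a dict of
# records it knows, plus a list of forwarders it may ask when it
# doesn't have an answer locally.
def resolve(name, resolver):
    if name in resolver["records"]:
        return resolver["records"][name]
    for upstream in resolver["forwarders"]:
        answer = resolve(name, upstream)  # potentially recursive!
        if answer is not None:
            resolver["records"][name] = answer  # cache for next time
            return answer
    return None

upstream = {"records": {"twitter.com": "104.244.42.65"}, "forwarders": []}
local = {"records": {"ethos": "10.0.0.64"}, "forwarders": [upstream]}

print(resolve("ethos", local))        # served from local data
print(resolve("twitter.com", local))  # found by asking upstream
```

The caching step is why the TTLs we'll see below matter: a recursive resolver holds onto answers rather than re-asking upstream on every query.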

What kind of record(s) live in DNS? The most basic record is an A record - just an IP address. We can search DNS for records using the dig tool, as such -

$ dig www.arrdem.com

; <<>> DiG 9.14.3 <<>> www.arrdem.com A
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 17422
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;www.arrdem.com.            IN  A

;; ANSWER SECTION:
www.arrdem.com.     300 IN  A   67.166.32.93

;; Query time: 66 msec
;; SERVER: 75.75.75.75#53(75.75.75.75)
;; WHEN: Fri Jul 05 17:32:05 PDT 2019
;; MSG SIZE  rcvd: 59

In this response we can see the ANSWER section, which says cryptically

www.arrdem.com. 300 IN A 67.166.32.93

The first element here - www.arrdem.com. - is the full canonical name of the requested record. The second element - 300 - is the TTL (time to live) of this record in seconds; it tells resolvers which have to recursively query for this data how long they may cache it. The third and fourth elements - IN A - are the record's class (IN, for Internet) and type (A, an address). Finally we have the value - 67.166.32.93 - the current IP address for my homelab.
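Just to make that five-field structure explicit, here's a hypothetical little parser for such an ANSWER line (the helper name and dict shape are my own invention):

```python
# Split a dig ANSWER line into its five fields: name, TTL, class,
# record type and value. Illustrative only - real record parsing
# operates on the DNS wire format, not dig's text output.
def parse_record(line):
    name, ttl, rclass, rtype, value = line.split()
    return {"name": name, "ttl": int(ttl),
            "class": rclass, "type": rtype, "value": value}

record = parse_record("www.arrdem.com. 300 IN A 67.166.32.93")
print(record["name"])   # www.arrdem.com.
print(record["ttl"])    # 300
print(record["value"])  # 67.166.32.93
```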

An interesting property of DNS is that most records need not be singular. That is, you could dig and get a couple IP addresses back.

Twitter for instance presents two public IPs.

$ dig twitter.com A

; <<>> DiG 9.14.3 <<>> twitter.com A
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 11939
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;twitter.com.           IN  A

;; ANSWER SECTION:
twitter.com.        563 IN  A   104.244.42.65
twitter.com.        563 IN  A   104.244.42.1

;; Query time: 32 msec
;; SERVER: 75.75.75.75#53(75.75.75.75)
;; WHEN: Fri Jul 05 17:34:08 PDT 2019
;; MSG SIZE  rcvd: 72

That is, there are not one but two public addresses, either of which could be used to reach the service known by the domain name twitter.com if the other fails or is overloaded. So if you go and connect to http://twitter.com, you'll be connecting to one of those two IP addresses. This can be used to build client-side load balancing, distributing requests over many hosts, as clients are expected to choose which address to connect to in "round robin" order. For instance, a fleet of tens or more Puppet servers, all of which serve the same data, could live behind a single "round robin" A record.
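A client-side chooser over those A records can be sketched in a few lines of Python (the addresses are from the dig output above; the chooser itself is illustrative, not something dig or your browser literally runs):

```python
import itertools

# The two A records dig returned for twitter.com.
addresses = ["104.244.42.65", "104.244.42.1"]

# Round-robin chooser: each call yields the next address, wrapping
# around, so requests spread evenly across the fleet.
chooser = itertools.cycle(addresses)
picks = [next(chooser) for _ in range(4)]
print(picks)
```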

There's a lot of really interesting stuff you can do with DNS, but for now let's get it up and running in the lab.

The obvious first step would be to reconfigure my router to push the IP addresses of my three nodes as DNS resolvers. Doing so before the resolver(s) are set up, however, would nuke my ability to talk to the outside world (looking at you, Stack Overflow), so let's hold off on that.

Instead we'll take advantage of the dig tool's ability to target a specific resolver - e.g. dig <name> @<resolver> - to test the resolvers I'm building out before we cut over to them.

Okay. Let's do this.

BIND setup

There's a number of DNS servers out there - but I'm gonna go with good old bind. Bind (aka named) uses a split configuration: /etc/named.conf tells named what to do, and the general pattern is to include configurations for domains (called zones) out of /etc/named/data/*.conf. While bind can do a lot of stuff, all I'm gonna use it for initially is to serve handwritten domain files (AKA zonefiles) out of /etc/named/master/.

Writing this Ansible role is pretty easy -

roles/dns-resolver/tasks/main.yml

---
- name: Install bind
  package:
    name: bind
    state: present
  # Register the result so later tasks can condition on a fresh install
  register: installed
  notify:
    - named enable

- name: Create directories
  when: installed.changed
  file:
    path: "{{ item }}"
    state: directory
    owner: root
    group: root
  with_items:
    - /etc/named/data
    - /etc/named/master

- name: Deploy named.conf
  when: installed.changed
  template:
    src: named.conf.j2
    dest: /etc/named.conf

Of slightly more interest is the actual named config template I'm deploying -

roles/dns-resolver/templates/named.conf.j2

acl "subnet" {
  10.0.0.0/24;
  localhost;
  localnets;
};

options {
  directory "/var/named";
  pid-file "/run/named/named.pid";

  listen-on { any; };

  allow-recursion { subnet; localhost; };
  allow-query { subnet; localhost; };
  allow-query-cache { subnet; localhost; };

  forwarders {
{% for node in upstream_dns_resolvers %}
    {{ node }};
{% endfor %}
  };
};

zone "localhost" IN {
  type master;
  file "localhost.zone";
};

This configuration defines an Access Control List (ACL) for my local subnet. It then allows only hosts in the subnet - or the local host - to make queries of this server. We also set up forwarders - hosts which each bind instance will query if the bind instance doesn't have master data. Elsewhere in Ansible variables, I'm defining

# Everywhere we use the same upstream DNS resolvers.
# Local DNS resolvers are configured per-geo.
upstream_dns_resolvers:
  - 1.1.1.1
  - 8.8.8.8
  - 8.8.4.4

For the sake of hygiene, let's create a new Ansible inventory group which will contain our resolvers.

hosts

---
apartment_modes:
  hosts:
    ethos.apartment.arrdem.com:
      ansible_host: 10.0.0.64

    logos.apartment.arrdem.com:
      ansible_host: 10.0.0.65

    pathos.apartment.arrdem.com:
      ansible_host: 10.0.0.66

apartment_resolvers:
  children:
    apartment_modes:

With just this configuration, we can run it against my modes using a really simple playbook

play.yml

---
- hosts:
    - apartment_resolvers
  roles:
    - role: dns-resolver

And run that -

$ ansible-playbook -i hosts play.yml

PLAY [apartment_resolvers] **************************************************************************************

TASK [Gathering Facts] ******************************************************************************************
ok: [ethos.apartment.arrdem.com]
ok: [pathos.apartment.arrdem.com]
ok: [logos.apartment.arrdem.com]

TASK [dns-resolver : Install bind] ******************************************************************************
ok: [ethos.apartment.arrdem.com]
ok: [pathos.apartment.arrdem.com]
ok: [logos.apartment.arrdem.com]

TASK [dns-resolver : Create directories] ************************************************************************
ok: [ethos.apartment.arrdem.com] => (item=/etc/named/data)
ok: [pathos.apartment.arrdem.com] => (item=/etc/named/data)
ok: [logos.apartment.arrdem.com] => (item=/etc/named/data)
ok: [pathos.apartment.arrdem.com] => (item=/etc/named/master)
ok: [logos.apartment.arrdem.com] => (item=/etc/named/master)
ok: [ethos.apartment.arrdem.com] => (item=/etc/named/master)

TASK [dns-resolver : Deploy named.service] **********************************************************************
ok: [ethos.apartment.arrdem.com]
ok: [logos.apartment.arrdem.com]
ok: [pathos.apartment.arrdem.com]

TASK [dns-resolver : Deploy named.conf] *************************************************************************
ok: [logos.apartment.arrdem.com]
ok: [ethos.apartment.arrdem.com]
ok: [pathos.apartment.arrdem.com]

PLAY RECAP ******************************************************************************************************
ethos.apartment.arrdem.com : ok=5    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
logos.apartment.arrdem.com : ok=5    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
pathos.apartment.arrdem.com : ok=5    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

Cool! So now we should be able to run some test DNS queries against these servers. Most important is the ability to do recursive queries, so let's check twitter.com first.

$ dig twitter.com @10.0.0.64

; <<>> DiG 9.14.3 <<>> twitter.com @10.0.0.64
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 27850
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: 0f82e839818e3b9cecebcec75d200ee12a4d47ef5312925b (good)
;; QUESTION SECTION:
;twitter.com.           IN  A

;; ANSWER SECTION:
twitter.com.        1022    IN  A   104.244.42.193
twitter.com.        1022    IN  A   104.244.42.129

;; Query time: 5 msec
;; SERVER: 10.0.0.64#53(10.0.0.64)
;; WHEN: Fri Jul 05 20:00:45 PDT 2019
;; MSG SIZE  rcvd: 100

Heck yeah. Recursive lookups are working.

A first zone

Now let's do what we're here to do - create the apartment.arrdem.com zone. To keep things simple, I'm gonna handwrite my first zonefile.

roles/dns-zone/templates/apartment.arrdem.com.j2

$ORIGIN apartment.arrdem.com.
$TTL 7200
apartment.arrdem.com. IN SOA ns.apartment.arrdem.com. mail.apartment.arrdem.com. (
    2019070442
    43200
    180
    1209600
    10800
)
;;; NS  section
@ NS ns.apartment.arrdem.com.
ns IN A 10.0.0.65
ns IN A 10.0.0.66
ns IN A 10.0.0.64

;;; HOSTS
ethos  IN A 10.0.0.64
logos  IN A 10.0.0.65
pathos IN A 10.0.0.66

The ns record is a convention naming all the nameservers (resolvers) in the domain. And I've got an A record for each of my (currently three) machines.

We'll also need a small template to configure named for each zone -

roles/dns-zone/templates/zone-data.j2

zone "{{ item }}" {
  type master;
  file "/etc/named/master/{{ item }}";
  allow-transfer {none;};
  allow-update {none;};
};

This config just tells named to prohibit dynamic updates or transfers of the domain. We've already set global ACLs for querying. As a template, it presumes we're rendering it from inside a loop over zone names.

All it takes to get this deployed is a pretty simple role -

roles/dns-zone/tasks/main.yml

---
- name: Deploy zonefiles
  with_items: "{{ zones }}"
  template:
    src: "{{ item }}.j2"
    dest: "/etc/named/master/{{ item }}"
  notify:
    - named reload

- name: Deploy zone data
  with_items: "{{ zones }}"
  template:
    src: zone-data.j2
    dest: "/etc/named/data/{{ item }}.conf"

- name: Add zone config
  with_items: "{{ zones }}"
  lineinfile:
    path: /etc/named.conf
    state: present
    line: "include \"/etc/named/data/{{ item }}.conf\";"

That is, we'll apply this role with a list of zone names as the variable zones. For each zone we render a template to produce the zonefile, render our per-zone config template, and use the lineinfile module to monkeypatch our main /etc/named.conf so that named includes the new zone's config.

Patching our playbook a tiny bit -

play.yml

---
- hosts:
    - apartment_resolvers
  vars_files:
    - "vars/.yml"
  roles:
    - role: dns-resolver
    - role: dns-zone
      zones:
        - apartment.arrdem.com

And running it -

$ ansible-playbook -i hosts play.yml

PLAY [apartment_resolvers] **************************************************************************************

TASK [Gathering Facts] ******************************************************************************************
ok: [logos.apartment.arrdem.com]
ok: [pathos.apartment.arrdem.com]
ok: [ethos.apartment.arrdem.com]

TASK [dns-resolver : Install bind] ******************************************************************************
ok: [pathos.apartment.arrdem.com]
ok: [logos.apartment.arrdem.com]
ok: [ethos.apartment.arrdem.com]

TASK [dns-resolver : Create directories] ************************************************************************
ok: [pathos.apartment.arrdem.com] => (item=/etc/named/data)
ok: [ethos.apartment.arrdem.com] => (item=/etc/named/data)
ok: [logos.apartment.arrdem.com] => (item=/etc/named/data)
ok: [pathos.apartment.arrdem.com] => (item=/etc/named/master)
ok: [ethos.apartment.arrdem.com] => (item=/etc/named/master)
ok: [logos.apartment.arrdem.com] => (item=/etc/named/master)

TASK [dns-resolver : Deploy named.service] **********************************************************************
ok: [ethos.apartment.arrdem.com]
ok: [logos.apartment.arrdem.com]
ok: [pathos.apartment.arrdem.com]

TASK [dns-resolver : Deploy named.conf] *************************************************************************
ok: [logos.apartment.arrdem.com]
ok: [pathos.apartment.arrdem.com]
ok: [ethos.apartment.arrdem.com]

TASK [dns-zone : Deploy zonefiles] ******************************************************************************
changed: [pathos.apartment.arrdem.com] => (item=apartment.arrdem.com)
changed: [logos.apartment.arrdem.com] => (item=apartment.arrdem.com)
changed: [ethos.apartment.arrdem.com] => (item=apartment.arrdem.com)

TASK [dns-zone : Deploy zone data] ******************************************************************************
ok: [ethos.apartment.arrdem.com] => (item=apartment.arrdem.com)
ok: [logos.apartment.arrdem.com] => (item=apartment.arrdem.com)
ok: [pathos.apartment.arrdem.com] => (item=apartment.arrdem.com)

TASK [dns-zone : Add zone config] *******************************************************************************
changed: [ethos.apartment.arrdem.com] => (item=apartment.arrdem.com)
changed: [pathos.apartment.arrdem.com] => (item=apartment.arrdem.com)
changed: [logos.apartment.arrdem.com] => (item=apartment.arrdem.com)

RUNNING HANDLER [dns-zone : named reload] ***********************************************************************
changed: [pathos.apartment.arrdem.com]
changed: [ethos.apartment.arrdem.com]
changed: [logos.apartment.arrdem.com]

PLAY RECAP ******************************************************************************************************
ethos.apartment.arrdem.com : ok=9    changed=3    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
logos.apartment.arrdem.com : ok=9    changed=3    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
pathos.apartment.arrdem.com : ok=9    changed=3    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

Now we should be able to dig ethos, logos and pathos out of DNS!

$ for h in ethos logos pathos; do dig +short ${h}.apartment.arrdem.com @10.0.0.64; done
10.0.0.64
10.0.0.65
10.0.0.66

Heck yeah.

Now if I go into my router, tell it to use my three nodes as DNS resolvers, and reconnect my device so that it gets a fresh resolver config, I'll see my resolvers configured in /etc/resolv.conf -

# Generated by resolvconf
domain apartment.arrdem.com
search apartment.arrdem.com arrdem.com
nameserver 10.0.0.64
nameserver 10.0.0.65
nameserver 10.0.0.66

Now I can ssh using DNS names, not IP addresses!

$ ssh arrdem@pathos echo '[$(hostname -f)] Hello, world!'
[pathos.apartment.arrdem.com] Hello, world!

Metaprogramming zones

While the above zonefile for apartment.arrdem.com strictly works, it's also one more thing to update by hand whenever I bring up a new node or service. I'm gonna be spending a lot of quality time working on the service discovery problem - but let's start here. Ansible already has (as ansible_host) the IP address for every device I configure. So at the very least, one could write this zone template -

roles/dns-zone/templates/apartment.arrdem.com.j2

$ORIGIN apartment.arrdem.com.
$TTL 7200
apartment.arrdem.com. IN SOA ns.apartment.arrdem.com. mail.apartment.arrdem.com. (
    {{ansible_date_time.year}}{{ansible_date_time.month}}{{ansible_date_time.day}}42
    43200
    180
    1209600
    10800
)
;;; NS  section
@ NS ns.apartment.arrdem.com.
{% for node in groups[geo + '_resolvers'] %}
ns IN A {{ hostvars[node]['ansible_host'] }}
{% endfor %}

;;; HOSTS
{% for node in groups['geo_' + geo] %}
{{ "%-16s" | format(node.split('.')[0]) }} IN A {{ hostvars[node]['ansible_host'] }}
{% endfor %}

This template generates the SOA serial by concatenating the date, to day precision, with a counter I bump by hand. Leveraging the fact that there's an apartment_resolvers group in Ansible's inventory, we can introspect it so long as a geo variable is set. We can play the same game to get all the hosts in the geo_apartment group!
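The serial scheme is easier to see in plain Python - a sketch of the same date-plus-counter concatenation the template performs:

```python
from datetime import date

# Build a zone serial the way the template does: the date at day
# precision, concatenated with a hand-maintained two-digit counter.
def zone_serial(d, counter):
    return int(f"{d.year:04d}{d.month:02d}{d.day:02d}{counter:02d}")

print(zone_serial(date(2019, 7, 4), 42))  # the serial in the handwritten zonefile
```

Serials built this way sort correctly so long as the counter never passes 99 in a single day, which is what lets resolvers tell a newer copy of the zone from an older one.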

So if I tweak my inventory a tiny bit -

hosts

---
apartment_modes:
  hosts:
    ethos.apartment.arrdem.com:
      ansible_host: 10.0.0.64

    logos.apartment.arrdem.com:
      ansible_host: 10.0.0.65

    pathos.apartment.arrdem.com:
      ansible_host: 10.0.0.66

apartment_resolvers:
  children:
    apartment_modes:

geo_apartment:
  vars:
    geo: apartment
  children:
    apartment_modes:

Now if I want to add a half-dozen Raspberry Pis all of a sudden, all I have to do is add them to my Ansible inventory and they'll automatically be added to DNS! To really see that this works, check out ansible-inventory -i hosts --list with this hosts file.

^d