Weirdly Long DNS Switchover?

post by jefftk (jkaufman) · 2021-02-16T20:00:05.905Z · LW · GW · 1 comments

[EDIT: as Jai points out in the FB comments, the problem was likely forgetting IPv6. The AAAA record is now updated, and I should be able to tell in a few hours whether that fixed it.]

I'm helping my dad migrate some code to a new server. It was at for years, but ten days ago he changed his DNS settings to point to Everything looks good to me:

$ dig 1800
    IN    A
The TTL is 1800s, or 30min, which agrees with I expected everyone would be moved over within a couple of hours, but a week later the old server is still receiving nearly as much traffic as the new one:

date        old server  new server
2021-02-05         943           0
2021-02-06         201         127
2021-02-07          17         108
2021-02-08         364         423
2021-02-09         488         448
2021-02-10         255         503
2021-02-11         281         345
2021-02-12         250         248
2021-02-13           0          88
2021-02-14           0          78
2021-02-15         217         262
2021-02-16         202         287

The old server got no traffic on the 13th and 14th, probably because that was the weekend and the users who are still stuck on the old site don't use it on weekends. I asked one of the users still getting the old server to try rebooting, to no effect.
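To make the trend in the table easier to see, here is a minimal sketch that computes the old server's share of each day's requests and labels the day of the week. The numbers are copied from the table; the names `TRAFFIC` and `old_share` are just illustrative.

```python
from datetime import date

# (date, old server requests, new server requests), copied from the table above.
TRAFFIC = [
    ("2021-02-05", 943, 0),
    ("2021-02-06", 201, 127),
    ("2021-02-07", 17, 108),
    ("2021-02-08", 364, 423),
    ("2021-02-09", 488, 448),
    ("2021-02-10", 255, 503),
    ("2021-02-11", 281, 345),
    ("2021-02-12", 250, 248),
    ("2021-02-13", 0, 88),
    ("2021-02-14", 0, 78),
    ("2021-02-15", 217, 262),
    ("2021-02-16", 202, 287),
]


def old_share(old, new):
    """Fraction of the day's requests that still reached the old server."""
    total = old + new
    return old / total if total else 0.0


for day, old, new in TRAFFIC:
    weekday = date.fromisoformat(day).strftime("%a")
    print(f"{day} {weekday}  old={old:4d}  new={new:4d}  old share={old_share(old, new):.0%}")
```

On the most recent weekdays the old server is still taking roughly 40 to 50 percent of requests, which is far more than a 30-minute TTL alone would explain.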

I thought maybe something was misconfigured with the name servers, but it looks fine:

$ whois \
   | grep 'Name Server'
Name Server: NS1.ZEROLAG.COM
Name Server: NS2.ZEROLAG.COM

$ dig \ 1800
   IN    A

$ dig \ 1800
   IN    A
I'm not seeing any references to the old IP address anywhere.

Any guesses about why the traffic isn't moving over?
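Given the edit at the top, one thing the checks above did not cover is the AAAA record: they only looked at A records. A rough sketch of a check that covers both address families follows. `resolve_all` uses the standard `socket.getaddrinfo` call and needs network access, so the example at the bottom runs on placeholder data instead. All addresses shown are documentation-range placeholders (192.0.2.0/24, 2001:db8::/32), not the real servers.

```python
import ipaddress
import socket


def resolve_all(hostname):
    """Ask the local resolver for every address (A and AAAA) of hostname."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    return {info[4][0] for info in infos}


def addresses_by_family(addresses):
    """Split address strings into IPv4 and IPv6 lists, keyed by IP version."""
    grouped = {4: [], 6: []}
    for text in sorted(addresses):
        grouped[ipaddress.ip_address(text).version].append(text)
    return grouped


def still_pointing_at_old(addresses, old_addresses):
    """Return the resolved addresses that belong to the old server."""
    # Compare parsed addresses so different spellings of one IPv6 address match.
    current = {ipaddress.ip_address(a) for a in addresses}
    old = {ipaddress.ip_address(a) for a in old_addresses}
    return sorted(str(a) for a in current & old)


# Placeholder data: pretend the A record was updated but the AAAA record was not.
# In real use, `resolved` would come from resolve_all("<the site's hostname>").
resolved = {"192.0.2.20", "2001:db8::10"}
old_server = {"192.0.2.10", "2001:db8::10"}

print(addresses_by_family(resolved))
print(still_pointing_at_old(resolved, old_server))
```

If the second call returns anything, clients that prefer IPv6 would keep reaching the old server no matter how the A record looks.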


Comments sorted by top scores.

comment by jbash · 2021-02-17T15:27:26.132Z · LW(p) · GW(p)

Your A records are fine, but you seem to have changed name servers. Your old NS records are probably cached all over the place; the TTL on those seems to be 48 hours. It looks like the old server (at is serving the correct data now, but it was probably still serving stale data when you saw the problem. Possibly it took it a while to realize that it wasn't authoritative for the zone, or possibly there was an update problem.

Generally, it's better to do the server change and the data change separately. And you have to make sure that the new and old servers are serving the same thing through the full TTL of the old NS records, or at least have the old server definitively reconfigured not to see itself as authoritative so that it can avoid misleading other systems when it gets a query.
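As a worked example of that timing argument, here is a small sketch assuming resolvers honor TTLs exactly. It uses the 1800s A-record TTL from the post and the 48-hour NS TTL mentioned above. The change time is hypothetical, and `stale_until` is an illustrative helper, not part of any DNS tooling.

```python
from datetime import datetime, timedelta

A_TTL = timedelta(seconds=1800)  # TTL on the A record, from the post
NS_TTL = timedelta(hours=48)     # TTL on the old NS records, as estimated above


def stale_until(change_time, *ttls):
    """Latest time a TTL-respecting resolver can still act on pre-change data.

    Each TTL in the chain adds to the worst case: a resolver may have cached
    the old NS records just before the change, and may ask the old name server
    for the A record just before that NS cache entry expires.
    """
    return change_time + sum(ttls, timedelta())


# Hypothetical change time; the post only says "ten days ago".
change = datetime(2021, 2, 6, 12, 0)

print("A record only:     ", stale_until(change, A_TTL))
print("NS change included:", stale_until(change, NS_TTL, A_TTL))
```

Under these assumptions the worst case grows from 30 minutes to a little over two days, which is the window during which the old name servers need to keep answering with the same data as the new ones.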

Evidence: If I let my system use my ISP's servers and do dig -t ns, I get the wrong cached data:

   ; <<>> DiG 9.11.27-RedHat-9.11.27-1.fc33 <<>> -t ns
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 38501
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

; EDNS: version: 0, flags:; udp: 1232
; COOKIE: b250042072aca8b7ccde8a21602d31062c20fb70fa6f7d57 (good)
;          IN      NS

;; ANSWER SECTION:
   3600    IN      NS
   3600    IN      NS

;; Query time: 168 msec
;; WHEN: Wed Feb 17 10:06:46 EST 2021
;; MSG SIZE  rcvd: 122

... but if I go and query the actual GTLD servers, say with dig -t ns, I get the right data:

; <<>> DiG 9.11.27-RedHat-9.11.27-1.fc33 <<>> -t ns
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 27939
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 2, ADDITIONAL: 3
;; WARNING: recursion requested but not available

; EDNS: version: 0, flags:; udp: 4096
;          IN      NS

;; AUTHORITY SECTION:
   172800  IN      NS
   172800  IN      NS