We had a problem last weekend with Postfix refusing email for a single domain when it came from outside our network, while messages from hosts on the local network were accepted and routed with no problems. Messages from outside the network were rejected with a 450 (temporary) code and the error message “Recipient address rejected: Domain not found”. The cause did end up being a DNS problem (apparently the most common kind of issue with Postfix), but not one that I would have expected: the internal DNS servers had no host entry for the base domain itself, so example.com wouldn’t resolve even though mail.example.com did. Finding the source of the problem was complicated by the fact that several changes had been made during the same weekend maintenance window.
We’re using a bundled virtual machine called ESVA for spam filtering (currently not available; see “ESVA Website (global-domination.org) Down, 2010-March”). It’s basically a prebuilt CentOS server preconfigured with Postfix, MailScanner, SQLGrey (for greylisting), MailWatch, etc. It has worked quite well for us for several years, both on our own site, where it filters inbound email for us and for our hosted customers, and at several customer sites that run Exchange. I highly recommend it as a drop-in spam filtering solution with good reporting and manageability, plus a fairly active user community that can generally answer any questions they haven’t already answered.
Our internal ESVA server was slightly out of date, so last weekend was a good time to update it to 18.104.22.168 (a simple process using ‘esva-update’) and update a few packages such as ClamAV (‘yum update’). During the same maintenance window I also updated the nameserver entries in /etc/resolv.conf to point to our new internal DNS servers. I ran the updates, verified that email messages were coming in correctly, and figured I was done with it.
A brief digression on our network: we have external-facing DNS servers for the domains that we host, but we also have internal servers that return internal addresses for some systems. Those internal servers answer as authoritative, because for the internal hosts that they serve, they are. Thus, from outside, mail.example.com may resolve to 209.252.x.y, but from inside it resolves to 10.3.4.5.
After the update, message traffic looked fine: messages were coming in and being passed along to the mail servers, so there were apparently no problems. By Monday morning, though, it became apparent that while our customers were receiving outside email, we were not. This was easy to miss initially because we’re a small company and, quite frankly, we don’t get a lot of message traffic overnight on weekends. A quick test from inside the network (‘telnet mail2.example.com 25’ and manually entering a message) worked just fine, so it wasn’t a problem with the actual handling of email messages. Email for other domains that we host was being accepted and routed appropriately, so it wasn’t a problem with firewall configuration either.
After I had spent more time than I like to think about digging through Postfix, trying to determine why it would be rejecting messages for just one domain (and the primary domain for which ESVA was set up, at that), the problem ended up being completely different:
The new DNS servers didn’t have a host entry for the base domain example.com. All of the relevant subdomains were present (some with internal addresses, some with external ones, depending on how they’re reached). They all have to be there, since the DNS server thinks it’s the authoritative one for example.com and won’t forward queries for that domain. Thus example.com itself didn’t resolve (example.local did, which was what the domain was based on). Individual hosts had been set up in DNS (e.g. www.example.com, mail.example.com, mail2.example.com), but there was no record for the bare domain name itself.
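In hindsight, a quick check of which names actually resolve from the mail server would have found this in seconds. Here is a minimal Python sketch of that idea. The function names are my own invention, and it only asks the system resolver for address records via `socket.getaddrinfo`, so it is a rough stand-in for what Postfix does (Postfix also looks at MX records), not a replica of it. The resolver is passed in as a parameter so the logic can be tried without touching real DNS.

```python
import socket
from typing import Callable, Dict, Iterable, List


def system_resolves(name: str) -> bool:
    """True if the system resolver returns at least one address for name."""
    try:
        return bool(socket.getaddrinfo(name, None))
    except socket.gaierror:
        return False


def check_names(
    names: Iterable[str],
    resolves: Callable[[str], bool] = system_resolves,
) -> Dict[str, bool]:
    """Map each name to whether it resolves, using the given resolver."""
    return {name: resolves(name) for name in names}


def unresolved(results: Dict[str, bool]) -> List[str]:
    """Names that failed to resolve, in the order they were checked."""
    return [name for name, ok in results.items() if not ok]
```

Run on the mail server itself (so it uses the same /etc/resolv.conf that Postfix does), something like `unresolved(check_names(["example.com", "www.example.com", "mail.example.com", "mail2.example.com"]))` should list only the names the internal DNS servers can't answer for.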
Postfix accepted messages submitted from the internal network because those clients were on a local network, and once it had those messages it had a transport for the domain name, so testing from an internal host worked. For messages from outside the local network, Postfix checked that the recipient’s domain (example.com) existed in DNS, and since that name didn’t resolve from inside the network, those messages were being rejected with:
Mar 1 15:12:36 mail2 postfix/smtpd: NOQUEUE: reject: RCPT from mta.sample.net[22.214.171.124]: 450 4.1.2 <firstname.lastname@example.org>: Recipient address rejected: Domain not found; from=<bounce-1234567_HTMLemail@example.com> to=<firstname.lastname@example.org> proto=ESMTP helo=<mta.example.net>
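Since the only real clue was in the mail log, it can help to tally rejections by recipient domain, which makes a "just one domain" pattern obvious. The following is a small Python sketch along those lines. The regular expression is my own and is based only on the shape of the log line above, so treat it as an assumption and adjust it if your log lines differ; the function names are hypothetical too.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Optional

# Matches the "reject: RCPT from ..." part of a Postfix smtpd log line.
REJECT_RE = re.compile(
    r"reject: RCPT from (?P<client>\S+): "
    r"(?P<code>\d{3}) (?P<dsn>\d\.\d+\.\d+) "
    r"<(?P<rcpt>[^>]*)>: (?P<reason>[^;]+);"
)


def parse_reject(line: str) -> Optional[Dict[str, str]]:
    """Return the fields of a reject line, or None if the line doesn't match."""
    match = REJECT_RE.search(line)
    return match.groupdict() if match else None


def rejects_by_domain(lines: Iterable[str]) -> Counter:
    """Count rejected recipients per recipient domain."""
    counts: Counter = Counter()
    for line in lines:
        info = parse_reject(line)
        if info and "@" in info["rcpt"]:
            counts[info["rcpt"].rsplit("@", 1)[1].lower()] += 1
    return counts
```

Feeding it the mail log (for example `rejects_by_domain(open("/var/log/maillog"))`, assuming that is where your system logs mail) gives a per-domain count of rejected recipients.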
The fix was simple: correct the internal DNS server so that example.com actually resolves to an address. It didn’t really matter which address it was (we used the IP of the web server, which is pretty standard), just that the name resolved to something so Postfix wouldn’t reject it.
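To make the inside/outside difference concrete, here is a toy Python model of the decision as I understand it. It is not Postfix code, and it makes some loud assumptions: that the recipient restrictions behave like `permit_mynetworks` followed by `reject_unknown_recipient_domain` (I haven't shown our main.cf here), that the trusted network is 10.0.0.0/8, and that "resolves" is a simple yes/no lookup. The function name and parameters are made up for illustration.

```python
import ipaddress
from typing import Callable, Sequence


def smtpd_decision(
    client_ip: str,
    rcpt_domain: str,
    mynetworks: Sequence[str],
    domain_resolves: Callable[[str], bool],
) -> str:
    """Toy model of two recipient restrictions evaluated in order."""
    addr = ipaddress.ip_address(client_ip)

    # Step 1, modeled on permit_mynetworks: trusted clients are accepted
    # immediately, so the DNS check below never runs for them.
    if any(addr in ipaddress.ip_network(net) for net in mynetworks):
        return "permit"

    # Step 2, modeled on reject_unknown_recipient_domain: everyone else gets
    # a temporary rejection if the recipient domain can't be found in DNS.
    if not domain_resolves(rcpt_domain):
        return "450 Recipient address rejected: Domain not found"

    # A real restriction list would have further checks here.
    return "permit"
```

With a resolver that knows only mail.example.com, an internal client such as 10.3.4.5 gets "permit" while an outside client gets the 450; add example.com to the resolver (the fix) and the outside client gets "permit" as well.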
While I feel a bit foolish for having this bit of misconfiguration happen, in searching for others who’d experienced similar problems I found nothing particularly useful except comments (generally for the error “Sender address rejected; Domain not found”) that it was probably a DNS error. Hopefully this will help someone with the same problem in the future.