Deliverability

Deliverability incidents every agency should monitor

Not every infrastructure issue is equally urgent. Here's a practical breakdown of the deliverability incidents that matter most, ordered by how quickly they need a response.

Infraova Team | Jan 27, 2026 | 5 min read

Not every change to a client's email infrastructure is an emergency. A DNS record that's been stable for two years suddenly changing is very different from a blacklist status that fluctuates occasionally on a shared IP range. Treating every alert as equally urgent leads to alert fatigue, and alert fatigue leads to real incidents getting buried in noise.

This is a practical breakdown of the incidents worth monitoring, grouped by how urgently they typically need attention.

Respond within hours

These incidents actively block or degrade mail delivery the moment they occur. The longer they go unnoticed, the more campaigns are affected.

DKIM signature failures. If a sending platform's DKIM key rotates and the new public key isn't in DNS yet, every email sent during that gap fails DKIM validation. Combined with a strict DMARC policy, this can mean outright rejection. Combined with a relaxed policy, it means degraded deliverability that's hard to attribute to anything specific.
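As a rough illustration, a check along these lines could tell a missing key apart from a revoked one. It assumes the selector's TXT record has already been fetched by whatever DNS client the monitoring stack uses, and the function names are hypothetical. The only external fact relied on is from the DKIM specification: an empty `p=` value means the key has been revoked.

```python
from typing import Optional


def parse_tag_list(record: str) -> dict:
    """Parse a 'tag=value; tag=value' TXT record into a dict."""
    tags = {}
    for part in record.split(";"):
        if "=" in part:
            # Split on the first '=' only, since base64 key data may end in '='.
            name, _, value = part.partition("=")
            tags[name.strip()] = value.strip()
    return tags


def dkim_key_status(txt_record: Optional[str]) -> str:
    """Classify a DKIM selector record as 'missing', 'revoked', or 'published'."""
    if not txt_record:
        return "missing"
    tags = parse_tag_list(txt_record)
    if "p" not in tags:
        return "missing"
    if tags["p"] == "":
        return "revoked"
    return "published"
```

Running this against the selector a sending platform is about to rotate to, before the rotation happens, is one way to catch the gap described above. It does not validate the key itself, only that something is published.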

New blacklist listings. A fresh listing on a major blacklist (Spamhaus, Barracuda, SpamCop, and similar) can immediately affect a meaningful percentage of sends to providers that check that list. The earlier this is caught, the sooner a delisting request can go in, and the shorter the reputation recovery period afterward.

SPF record changes that break alignment. If an SPF record gets overwritten (common during DNS migrations, when a new provider's "starter" DNS template replaces existing records), every legitimate sending source not in the new record starts failing SPF checks.
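One way to spot this, sketched below, is to compare the previous SPF record with the current one and list the mechanisms that disappeared. The function names and the `.example` domains are illustrative assumptions, and the sketch compares record text only: it does not resolve `include:` targets or evaluate SPF.

```python
def spf_mechanisms(record: str) -> set:
    """Return the sender-authorizing terms of an SPF record, lowercased.

    The version tag and the trailing 'all' mechanism are ignored.
    """
    terms = record.lower().split()
    if not terms or terms[0] != "v=spf1":
        return set()
    mechanisms = set()
    for term in terms[1:]:
        # Strip the optional qualifier (+, -, ~, ?) before checking for 'all'.
        if term.lstrip("+-~?") == "all":
            continue
        mechanisms.add(term)
    return mechanisms


def dropped_senders(old_record: str, new_record: str) -> list:
    """List mechanisms present in the old record but absent from the new one."""
    return sorted(spf_mechanisms(old_record) - spf_mechanisms(new_record))


old = "v=spf1 include:_spf.esp-a.example include:mail.crm.example ~all"
new = "v=spf1 include:_spf.newhost.example -all"
print(dropped_senders(old, new))
```

Any non-empty result after a DNS migration is worth a same-day look, since each dropped term may correspond to a platform that is still sending.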

DMARC policy changes to reject without warning. If a domain's DMARC policy is suddenly tightened to p=reject (sometimes by an IT team trying to "improve security" without coordinating with whoever manages marketing email), any sending source not perfectly aligned with SPF/DKIM gets blocked entirely, not just flagged.

Respond within a day

These represent meaningful risk but aren't actively blocking mail right now. They're indicators that something is likely to become a same-day issue soon, or that mail is being degraded rather than blocked.

Rising bounce rates. A gradual increase in bounce rate (say, from 1% to 4% over a couple of weeks) often precedes a blacklist listing, since high bounce rates are one of the signals blacklist operators use. Catching the trend early can prevent the listing altogether.
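A trend like this can be checked with a simple window comparison. The sketch below is one possible approach, not a standard method: the 7-day window and the 2-percentage-point threshold are assumptions to tune per client, and the function name is hypothetical.

```python
from statistics import mean


def bounce_trend_alert(daily_rates, window=7, min_increase=0.02):
    """Return True when the recent window's mean bounce rate exceeds the
    preceding window's mean by at least `min_increase`.

    `daily_rates` is a list of daily bounce rates as fractions (0.01 == 1%),
    oldest first. Returns False when there is not enough history to compare.
    """
    if len(daily_rates) < 2 * window:
        return False
    recent = daily_rates[-window:]
    previous = daily_rates[-2 * window:-window]
    return mean(recent) - mean(previous) >= min_increase


# A week at 1% followed by a week at 4%, as in the example above.
rates = [0.01] * 7 + [0.04] * 7
print(bounce_trend_alert(rates))
```

Comparing window averages rather than single days keeps one bad send from triggering the alert, at the cost of reacting a few days later.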

MX record changes. If MX records change, whether intentionally during a provider migration or unintentionally due to a DNS error, incoming mail (including bounce notifications, replies, and authentication-related emails) may stop being delivered to the right place. This doesn't block outbound sending immediately, but it breaks the feedback loop that would otherwise surface other problems.

New DNS records appearing on subdomains. A new subdomain suddenly gaining SPF/MX records often indicates a new tool was connected, which is fine. But if it wasn't communicated to whoever manages the domain's overall authentication posture, it's worth a quick check to make sure it's legitimate and properly configured.

DMARC policy moving from none to quarantine or reject. Unlike the respond-within-hours scenario above (an unexpected jump straight to reject), a gradual tightening is often intentional and good practice. It should still be verified against current SPF/DKIM alignment for all senders before it takes effect, to avoid surprises.
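The difference between the two DMARC scenarios can be approximated in code. The sketch below makes a simplifying assumption that is not from any standard: a jump of two policy levels at once (none straight to reject) is treated as the urgent case, and a single step as the within-a-day case. Whether a change was actually coordinated is something only a person can confirm. Function names are hypothetical.

```python
POLICY_LEVEL = {"none": 0, "quarantine": 1, "reject": 2}


def dmarc_policy(record: str) -> str:
    """Extract the p= tag from a DMARC TXT record.

    Falls back to 'none' when no p= tag is found (an assumption made
    here to keep the sketch simple).
    """
    for part in record.split(";"):
        name, _, value = part.partition("=")
        if name.strip().lower() == "p":
            return value.strip().lower()
    return "none"


def classify_dmarc_change(old_record: str, new_record: str) -> str:
    """Map a DMARC policy change to a response tier."""
    old = POLICY_LEVEL.get(dmarc_policy(old_record), 0)
    new = POLICY_LEVEL.get(dmarc_policy(new_record), 0)
    if new <= old:
        return "no_tightening"
    if new - old >= 2:
        return "respond_within_hours"
    return "respond_within_day"


print(classify_dmarc_change("v=DMARC1; p=none", "v=DMARC1; p=reject"))
```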

Worth tracking, respond within a week

These are slower-moving signals useful for spotting trends and informing strategy, but rarely require same-day action on their own.

Domain health score trends. If a domain's overall health score (an aggregate of authentication status, blacklist history, and sending consistency) is gradually declining over several weeks, it's worth understanding why, even if no single check has failed outright.

WHOIS / domain expiration changes. Domain registration expiry dates approaching, or registrar/nameserver changes, are good to know about. An expired domain stops everything, but this is typically a slow-moving, predictable event with plenty of lead time if tracked.

Historical blacklist patterns. A domain that's been listed and delisted multiple times over several months, even if currently clean, has an underlying issue (likely list hygiene or a shared IP problem) that's worth investigating proactively rather than waiting for the next listing.
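If listing dates are already being logged, flagging a repeat pattern takes only a few lines. This is a minimal sketch: the 180-day window and the threshold of two listings are assumptions, and the function name is hypothetical.

```python
from datetime import date, timedelta


def has_repeat_listings(listing_dates, today, window_days=180, min_listings=2):
    """Return True when at least `min_listings` blacklist listings fall
    inside the last `window_days` days, regardless of current status."""
    cutoff = today - timedelta(days=window_days)
    recent = [d for d in listing_dates if cutoff <= d <= today]
    return len(recent) >= min_listings


history = [date(2024, 1, 1), date(2025, 9, 1), date(2025, 11, 15)]
print(has_repeat_listings(history, today=date(2026, 1, 27)))
```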

Sending volume drift. Significant, sustained changes in sending volume compared to historical baselines (much higher or much lower) can indicate anything from a new campaign type to an account compromise, and are worth periodic review even if no specific check fails.
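A baseline comparison for volume drift might look like the following. It is a sketch under stated assumptions: medians are used so a single spike doesn't count as "sustained", and the 50% tolerance is an arbitrary starting point rather than a recommended value. The function name is hypothetical.

```python
from statistics import median


def volume_drift(history, recent, tolerance=0.5):
    """Compare the median of recent daily volumes with the historical median.

    Returns 'higher', 'lower', or 'normal'. `tolerance` is the allowed
    fractional deviation from the baseline (0.5 == 50%).
    """
    if not history or not recent:
        return "normal"
    baseline = median(history)
    current = median(recent)
    if baseline == 0:
        return "higher" if current > 0 else "normal"
    ratio = current / baseline
    if ratio > 1 + tolerance:
        return "higher"
    if ratio < 1 - tolerance:
        return "lower"
    return "normal"


baseline_days = [1000, 1100, 900, 1050, 950]
print(volume_drift(baseline_days, [2100, 2000, 2200]))
```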

Why the tiering matters

If every one of these triggers an identical "ALERT: domain issue detected" notification, the team quickly learns to treat all alerts the same way, which usually means triaging them all at the same (often low) priority, or eventually tuning them out.

A DKIM failure that's actively blocking mail and a domain health score that dipped two points over a month are not the same kind of event, and shouldn't compete for the same attention. Structuring monitoring around urgency tiers rather than a flat list of "things that could be wrong" means the team's fastest response goes to the things that actually need it, and the slower-moving signals get reviewed on a cadence that makes sense without creating noise.
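In practice, the tiers above can be encoded as a lookup that every alert passes through before anyone is notified. The sketch below is illustrative only: the incident type names are made up for this example, and defaulting unknown types to the middle tier is an assumption, chosen so unclassified alerts still get looked at within a day.

```python
from enum import IntEnum


class Tier(IntEnum):
    HOURS = 1
    DAY = 2
    WEEK = 3


# Hypothetical incident type names, grouped as in the sections above.
INCIDENT_TIERS = {
    "dkim_failure": Tier.HOURS,
    "new_blacklist_listing": Tier.HOURS,
    "spf_record_broken": Tier.HOURS,
    "dmarc_unexpected_reject": Tier.HOURS,
    "rising_bounce_rate": Tier.DAY,
    "mx_record_change": Tier.DAY,
    "new_subdomain_records": Tier.DAY,
    "dmarc_gradual_tightening": Tier.DAY,
    "health_score_decline": Tier.WEEK,
    "whois_or_expiry_change": Tier.WEEK,
    "repeat_blacklist_history": Tier.WEEK,
    "sending_volume_drift": Tier.WEEK,
}


def tier_for(incident_type: str) -> Tier:
    """Look up the response tier; unknown types default to DAY."""
    return INCIDENT_TIERS.get(incident_type, Tier.DAY)


def triage(incident_types):
    """Return incident types ordered most urgent first (stable within a tier)."""
    return sorted(incident_types, key=tier_for)


print(triage(["health_score_decline", "dkim_failure", "mx_record_change"]))
```

The point of the lookup is less the sorting than the routing: each tier can map to its own channel, for example a page for the first, a ticket for the second, and a weekly digest for the third.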

The goal isn't to monitor everything equally. It's to monitor everything appropriately, so that when something genuinely urgent happens, it doesn't get lost in a feed of changes that could have waited until next week's review.

Tags: deliverability, monitoring, incident response