Scenario:
I have a contact form on my web app, it gets alot of spam.
I am validating the format of email addresses loosely i.e. ^.+@.+..+$
I am using a spam filtering service (defensio) but the spam scores returned are overlapping with valid messages. At a threshold of 0.4 some spam gets through and some customer's questions are wrongly thrown in a log and an error displayed.
All of the spam messages use fake email addresses e.g. [email protected]
Dedicated PHP5 Linux server in US, mysql, logging spam only, emailing the non spam messages (not stored).
Proposal:
Use php's checkdnsrr(preg_replace(/^.+?@/, '', $_POST['email']), 'MX')
to check the email domain resolves to a valid address, log to file, then redirect with an error for messages that don't resolve, proceed to the spam filter service as before for addresses that do resolve according to checkdnsrr()
.
I have read (and i am sceptical about this myself) that you should never leave this type of validation up to remote lookups, but why?
Aside from connectivity issues, where i will have bigger problems than a contact form anyway, is checkdnsrr going to encounter false positives/negatives?
Would there be some address types that wont resolve? gov addresses? ip email addresses?
Do i need to escape the hostname i pass to checkdnsrr()?
Solution:
A combination of all three answers (wish i could accept more than one as a compound answer).
I am using:
$email_domain = preg_replace('/^.+?@/', '', $email).'.';
if(!checkdnsrr($email_domain, 'MX') && !checkdnsrr($email_domain, 'A')){
//validation error
}
All spam is being logged and rotated.
With a view to upgrading to a job queue at a later date.
Some comments were made about asking the mail server for the user to verify, i felt this would be too much traffic and might get my server banned or into trouble in some way, and this is only to cut out most of the emails that were being bounced back due to invalid server addresses.
http://en.wikipedia.org/wiki/Fqdn
and
RFC2821
The lookup first attempts to locate an MX record associated with the name.
If a CNAME record is found instead, the resulting name is processed as if
it were the initial name.
If no MX records are found, but an A RR is found, the A RR is treated as
if it was associated with an implicit MX RR, with a preference of 0,
pointing to that host. If one or more MX RRs are found for a given
name, SMTP systems MUST NOT utilize any A RRs associated with that
name unless they are located using the MX RRs; the "implicit MX" rule
above applies only if there are no MX records present. If MX records
are present, but none of them are usable, this situation MUST be
reported as an error.
Many thanks to all (especially ZoogieZork for the A record fallback tip)
See Question&Answers more detail:
os