I figured that several readers are also bloggers in their own right, and might be interested in some information that I’ve gathered about spam and my efforts to block it.
This blog, which is not a terribly popular one, gets a substantial amount of comment spam. For example, here is the number of spam comments received in each of the last four months:
Dec 2010: 5,028
Jan 2011: 6,544
Feb 2011: 4,712
Mar 2011: 5,596
Compare that to the 25-30 legitimate comments made monthly, and you see that the ratio is extremely skewed in favor of spam. Since this blog was founded in 2008, 53,881 spams have been received, compared to 854 total legitimate messages.
Ideally, there would be no comment spam. Since this is not possible, I want to reduce spam by the maximum amount possible, inconvenience users as little as possible, and keep the spam queue in the WordPress administrative interface as empty as I can.
Now, WordPress comes with an outstanding spam filter called Akismet. When it is activated, all incoming comments are sent to Akismet for a spam/not-spam review. Since the service is centralized, it can accumulate a huge amount of data about spammy and legitimate messages, adapt to changing spam patterns, and do remarkably well (99.96% according to my calculations) at detecting spam and allowing legitimate messages to pass. If it misses spam, or mistakenly flags a legitimate comment as spam, I can override the Akismet decision (and that override is sent to Akismet so it can adapt).
Messages flagged as spam by Akismet go into the spam queue for my review. Unfortunately, this means that more than 150 spams a day get shunted there. Reviewing these messages is tedious and time-consuming. What if I could block the spam from even being submitted, thus reducing the amount of spam that I need to wade through?
Since all WordPress blogs have the same comments.php file, spammers don’t even need to fill in the normal comment form on the website: they can submit their spam directly to the comments.php file with the appropriate fields already filled in. Of course, since this is all done automatically by software, a slight change to the comments.php file will leave the spambots unable to submit messages. Enter NoSpamNX, a very handy plugin that makes changes which break spambots but don’t affect humans. Specifically, it adds hidden fields to the human-readable comment form that are filled in with randomly generated text (to keep the spammers from adapting, it changes these random values every 24 hours).
If a comment does not include these hidden fields with that day’s random text, that means that the comment was not submitted through the ordinary human-readable form, and therefore must be spam. One can elect to then mark the message as spam, or simply delete it outright.
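For the curious, here is a rough sketch of how a hidden-field check along these lines could work. This is not NoSpamNX's actual code: the secret key, the function names, the way the daily value is derived, and the one-day grace period for forms loaded just before the values change are all my own assumptions for illustration.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical secret; a real plugin would generate and store its own value.
SECRET_KEY = b"example-secret-not-from-the-plugin"


def daily_token(day: date, secret: bytes = SECRET_KEY) -> str:
    """Derive a value that changes once per calendar day."""
    digest = hmac.new(secret, day.isoformat().encode(), hashlib.sha256)
    return digest.hexdigest()[:16]


def hidden_field(day: date) -> tuple:
    """Return the (name, value) pair to embed as a hidden input in the form."""
    token = daily_token(day)
    return ("f_" + token[:8], token[8:])


def is_from_form(submitted: dict, today: date) -> bool:
    """Accept a comment only if it carries a current hidden field.

    Yesterday's value is also accepted (an assumption of this sketch) so
    that a form loaded shortly before the rollover still works.
    """
    for day in (today, today - timedelta(days=1)):
        name, value = hidden_field(day)
        if submitted.get(name) == value:
            return True
    return False
```

A bot that posts a canned set of fields straight to the server never sees the form, so it cannot include the current name/value pair and `is_from_form` returns `False`. A browser that rendered the form sends the pair along without the commenter ever noticing.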
This simple plugin has blocked 37,775 spams since I installed it in June 2010. During that same period, a total of 39,113 spams were submitted to my site. This means that NoSpamNX alone would have blocked about 96.6% of spam. Not bad, particularly for something that does not burden legitimate commenters with any additional steps like CAPTCHAs.
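If you want to run the same calculation on your own numbers, the arithmetic is just blocked divided by total. A minimal sketch using the two figures above (the function name is my own):

```python
def block_rate(blocked: int, total: int) -> float:
    """Percentage of submitted spam that was blocked, to one decimal place."""
    return round(blocked / total * 100, 1)


# Figures quoted above: spam blocked by NoSpamNX vs. all spam submitted
# over the same period.
nospamnx_rate = block_rate(37_775, 39_113)
print(f"{nospamnx_rate}%")
```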
In my particular case, I like contributing spam messages to Akismet since it improves their statistics, so I elected to have NoSpamNX simply mark messages as spam rather than deleting them (the deletion would occur before the messages get submitted to Akismet). Thus, my spam queue had lots of messages for me to review. I needed something more, something that would provide a second opinion to Akismet and NoSpamNX.
In my December 14th post, I mentioned that I was testing out a plugin called Conditional CAPTCHA. This one is particularly useful: it waits for messages to get reviewed by existing spam filters such as Akismet. If Akismet says the message is legitimate, Conditional CAPTCHA does nothing, and the message is posted immediately. However, if the message is flagged as spam, then Conditional CAPTCHA presents a reCAPTCHA. If the CAPTCHA is solved incorrectly or no attempt to solve it is made within 10 minutes, the message is silently deleted and not added to the spam queue. If the CAPTCHA is solved correctly, the message is then placed into the moderation queue (I’m a bit suspicious, as it was marked as spam, so I want to review it prior to it being posted).
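The decision flow described above can be sketched in a few lines. This is only my reading of the behavior as I have it configured, not the plugin's code; the names `Outcome` and `route_comment` are made up for the example.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    PUBLISH = "publish immediately"
    MODERATE = "hold in moderation queue"
    DISCARD = "delete silently"


CAPTCHA_TIMEOUT_SECONDS = 10 * 60  # the 10-minute window mentioned above


def route_comment(flagged_as_spam: bool,
                  captcha_solved: Optional[bool] = None,
                  seconds_to_answer: Optional[float] = None) -> Outcome:
    """Decide what happens to a comment after the spam filter has seen it.

    captcha_solved and seconds_to_answer stay None when no CAPTCHA attempt
    was made at all.
    """
    if not flagged_as_spam:
        # The filter says it is legitimate: no CAPTCHA, post right away.
        return Outcome.PUBLISH
    answered_in_time = (seconds_to_answer is not None
                        and seconds_to_answer <= CAPTCHA_TIMEOUT_SECONDS)
    if captcha_solved and answered_in_time:
        # Solved, but still suspicious: a human reviews it before posting.
        return Outcome.MODERATE
    # Wrong answer, no answer, or too late: never reaches the spam queue.
    return Outcome.DISCARD
```

Note that the only path to being published without review is for the spam filter to pass the comment in the first place; solving the CAPTCHA merely earns a place in the moderation queue.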
Using Conditional CAPTCHA means that the vast majority of legitimate commenters are not inconvenienced by always facing a CAPTCHA. Only comments flagged as spam are presented with such a challenge.
So far, Conditional CAPTCHA has stopped 18,589 spams since it was installed, essentially 100% of the spam submitted to this site. There have been exactly four messages that were flagged as spam and resulted in the CAPTCHA being solved correctly. All of these have been spam, and never made it out of the moderation queue.
In my particular case, NoSpamNX is a bit redundant: I use it simply to keep a measure of how many spammers submit spam directly to the comments.php file versus how many submit comments using the human-readable form.
In conclusion, if you are a WordPress blogger and are inundated with spam, both on your site and in your spam queue, I heartily recommend using both Akismet (which you should already be using) and Conditional CAPTCHA. Doing so should reduce your spam to practically nothing.
If other bloggers out there have some statistics on the spam they receive, what they use to combat it, and how effective those measures are, I would be quite interested in hearing about it.
2 thoughts on “Followup on Spam Filtering”
I decided to write a comment to see NoSpamNX in action, but I don’t see where I would need to enter any additional information. The fields are the same as any other blog. Am I missing something?
thanks for the post.
Allen: Indeed, you are missing something. Perhaps I can clarify things: the fields that NoSpamNX adds are hidden from human view. It’s designed to be completely transparent to individual users (that is, there aren’t any additional fields or forms to fill out). These hidden fields are required for a post to be accepted, and are only included in the normal human-readable submission form (though they remain hidden from view).
Since most spambots post directly to the comments.php page, rather than use the ordinary form for humans, they don’t include these hidden fields in their comments, and so their spammy messages are then rejected.