Dave, the author of Spam Karma 2 (SK2), has written an interesting essay on his take on comment spam entitled The State of Spam [Karma]. It’s a good read if you’re into this sort of thing. SK2 was one of the first really advanced scripts put together to combat spam, and you can see on our development page someone has actually created code that combines Akismet with SK2, which sounds pretty groovy to us.
Since Akismet is mentioned specifically, I would like to address a couple of the issues that Dave has raised.
First off, as I mentioned when chatting about this “new wave” on the #wordpress chat channel, I haven’t noticed anything out of the ordinary in the comment and spam flow going through Akismet, even though we’re already protecting well over 90,000 blogs. I think the reason is that Akismet adaptively uses the content of the comment, not just the characteristics of the request. (Request characteristics are what he’s talking about when he says the spambots have become smarter.)
However, even though Akismet automatically adapted to this latest wave of attacks (we blocked 94,946 spams last Wednesday), Dave questions its effectiveness in the future, saying “My personal issue is that I am doubtful of the long-term resilience of a monolithic DB such as Akismet’s when confronted to both Denial of Service attempts and data poisoning.”
The first issue is denial of service, or what would happen if spammers decided to point thousands of zombie computers at the Akismet service in an effort to take it down. The motivation isn’t clear, since Akismet is designed so that if the service is down, comments don’t slip through; they are just held for moderation like normal. But despite that, this could be a serious problem. We’ve taken the common-sense steps against it, such as per-user throttling at the web server level, balancing traffic, and choosing a network bandwidth provider with DoS (Denial of Service) protection, but this may still be a factor. Even companies like Yahoo and Microsoft have had DoS issues in the past.
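To make the “held for moderation” behavior concrete, here is a minimal sketch of how a blog-side client might fail safe when the checking service is unreachable. The function name `triage_comment`, the `check_spam` callable, and the three status strings are hypothetical, chosen for illustration; this is not the actual Akismet plugin code.

```python
from typing import Callable


def triage_comment(comment: dict, check_spam: Callable[[dict], bool]) -> str:
    """Decide what to do with a comment.

    `check_spam` stands in for a call to a remote spam-checking service
    (hypothetical here). If that call fails for any reason, the comment is
    neither published nor discarded: it waits for a human moderator.
    """
    try:
        is_spam = check_spam(comment)
    except Exception:
        # Service down or unreachable: fall back to manual moderation.
        return "moderate"
    return "spam" if is_spam else "approve"
```

With this shape, an attacker who knocks the service offline only pushes comments into the moderation queue; nothing gets published automatically.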
What we can promise, though, is that our systems are redundant and fully backed up, and that in case of emergency our strategy of using API keys as subdomains would allow us to redirect Enterprise and Pro-blogger users to a different set of servers while we address the issues. Since API keys are secret, it would be impossible for the attackers to know where we were routing the paid traffic via DNS, and paid users would be protected.
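As a rough illustration of the keys-as-subdomains idea, the sketch below builds a per-key hostname and looks it up in a table of overrides. The base domain `antispam.example`, the pool names, and both function names are assumptions made up for this example; in practice the redirection described above would happen in DNS records, not in application code.

```python
DEFAULT_POOL = "public-pool"  # hypothetical name for the normal server pool


def api_host(api_key: str, base_domain: str = "antispam.example") -> str:
    """Build the per-key hostname a client would send its requests to."""
    if not (api_key.isascii() and api_key.isalnum()):
        raise ValueError("API key must be ASCII alphanumeric to form a DNS label")
    return f"{api_key.lower()}.{base_domain}"


def pool_for(host: str, overrides: dict) -> str:
    """Return the server pool for a hostname.

    `overrides` plays the role of per-key DNS records: only hostnames listed
    there are sent somewhere other than the default pool.
    """
    return overrides.get(host, DEFAULT_POOL)


# During an attack, paid keys could be pointed at a reserve pool.
overrides = {api_host("paidkey1"): "reserve-pool"}
paid_pool = pool_for(api_host("paidkey1"), overrides)
free_pool = pool_for(api_host("freekey2"), overrides)
```

Because each hostname is derived from a secret key, someone who does not hold a paid key has no hostname to look up in order to find where that traffic was moved.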
To clarify Dave’s second concern, data poisoning is where the effectiveness of a system decreases over time because bad data is introduced into the mix. I can’t get into too many specifics here, since our protection against this is part of the “secret sauce” behind Akismet, but I think the performance of Akismet speaks for itself. It is a huge target, being bundled with WordPress, adapted for numerous platforms, and having over 90,000 users already. (That is larger than many blog hosts.) Yet in spite of all that (and partly because of all that), Akismet has only become more effective with time, and it is now 33% closer to no missed spam or false positives than when it started. The system was designed from the ground up to prevent poisoning, and though there have been many attempts, none have adversely impacted the system yet.
Anyway, it looks like Dave is committed to continuing to develop Spam Karma, and I’m excited to see what he comes up with next. If you use his software, you should consider donating. Spam Karma + Akismet is still a killer combo, though due to SK’s license you can’t use it commercially. If you are a commercial user, an Akismet commercial license plus Bad Behavior thrown in for good measure will let you rest easy.