Pro Tip: Testing, testing

If you’re developing a new implementation for the Akismet API, or integrating an existing library with your own application, you will of course need to test it. Often we see developers get ahead of themselves, making a few trivial API calls with minimal values and drawing the wrong conclusions about Akismet’s accuracy. Here are a few tips on what and how to test, and an outline of what you should and should not expect.

Use a test API key

If you’re developing your own code, please contact us and ask about creating an API key for testing purposes. We like to keep in contact with developers so we can help make sure you get the most out of Akismet.

For automated testing, include the parameter is_test=1 in your tests. That will tell Akismet not to change its behaviour based on those API calls – they will have no training effect. That means your tests will be somewhat repeatable, in the sense that one test won’t influence subsequent calls. (Be aware however that Akismet is non‑deterministic, so you can expect to see results that change over time. See below for ways of forcing a specific response when you need a predictable test.)

There are no separate sandboxes or test servers. You needn’t worry about your tests having an effect on anyone else or on Akismet as a whole – we maintain careful isolation between API keys and users in order to make sure no one can adversely influence Akismet, accidentally or otherwise.

Test your API calls

Akismet works by examining all the available information combined. It is not enough to provide just the content of a message; you need to provide as many independent pieces of information as you can in each call. So before you can test Akismet’s accuracy, you need to make sure you’re sending complete and correct information.

To simulate a positive (spam) result, make a comment‑check API call with the comment_author set to viagra‑test‑123, and all other required fields populated with typical values. The Akismet API will always return a true response to a valid request with that value. If you receive anything else, something is wrong in your client, data, or communications.

To simulate a negative (not spam) result, make a comment‑check API call with the user_role set to administrator, and all other required fields populated with typical values. The Akismet API will always return a false response. Any other response indicates a data or communication problem.

Also, make sure your client will handle an unexpected response. Don’t assume that the comment‑check API will always return either true or false. An invalid request may result in an error response. Additional information will usually be available in HTTP headers in this case. And of course a connectivity problem may result in no response at all. It’s important not to misinterpret an invalid response as meaning spam or ham.

Test your data

Akismet is highly dependent on the quality and completeness of the data you provide. It’s important to provide as many parameters as possible, and to make sure they contain correct values. If you can’t populate a particular field because that information is unavailable or irrelevant, use an empty string – a missing value is better than an incorrect or made up one.

We recommend capturing a few of your API calls in order to make sure they really do contain the intended values. It’s quite common for a bug to cause the user_ip or user_agent value to be incorrect, for example – make sure they come from the remote browser that posted the comment, and not from your server.

Also make sure that the values your submit‑spam and submit‑ham API calls match your comment‑check API calls as closely as possible. In order to learn from its mistakes, Akismet needs to match your missed spam and false positive reports to the original comment‑check API call, made when the comment was first posted. It’s normal for less information to be available for submit‑spam/ham calls, since most comment systems and forums won’t store all metadata. But you should make sure that the values you do send match the originals. (A common bug is for clients to mistakenly send the moderator’s IP address or user agent instead of the comment poster’s when reporting a comment as spam).

Finally, try to send unmodified data if you can. Most applications will transform content with formatting and markup. It’s best to send Akismet the original content, prior to formatting, if you can.

Test with live comments

It’s important to test with a significant amount of real live data if you want to draw any conclusions about accuracy. We often hear from developers who have made a handful of API calls using imitation comments they’ve written themselves, and who aren’t seeing the results they expect. This is because Akismet works by comparing comments to genuine spam activity that is happening right now (and it does so based on more than just the content). An artificially constructed test spam comment probably won’t have much in common with real spam, so Akismet correctly returns a negative response.

The best way to measure Akismet’s accuracy is with a feed of live data from your production servers. Don’t act on its responses yet, just log the results to a file or store them in your metadata for analysis. Examine the results, or compare them with another filtering method, and decide if they are acceptable. Systematic errors usually indicate a data issue, so if you notice any oddities then please tell us – we can probably suggest ways of improving accuracy.