In the last year this site, and other sites using different wiki engines, have been plagued by robots planting lots of links on a random selection of pages. This has become a serious problem and needs a fix. But before that, I would like to thank all the people who have recently despammed this site manually!

On TaviPatches, Jay Sheth has provided a [link] to an anti-spam patch he has written for 'Tavi. This patch has been adopted into the main code of 'Tavi, and with release 0.26 it will be available as an option. That is, if you set the variable $UseCaptcha=1 in config.php, the preview window will add a box at the top using [captcha], and pages cannot be saved without entering the correct combination of characters.

Hopefully this will prevent this and other sites from being spammed. Note that the 'Tavi implementation does not include logging of spam attempts. This is simply because I didn't want to change the database tables. I do, however, challenge someone to make a patch that would allow for this logging, with a description of how to add the necessary tables and so on.


Thanks for implementing the anti-spam feature, Even, and for giving me credit in the source. Your implementation looks good. I will look at the code in more detail, try it some more, and then add feedback here. - JaySheth


Advanced anti-spam features

Some thoughts on the spam issue:

{IP blacklists are of very limited effectiveness, since it is very easy to change your IP. In fact, most ISPs use dynamic IPs, which change frequently. Blacklisting IPs has the side effect of potentially blocking legitimate users who happen to get an IP that was previously used by a spammer. The expiration time on IP blocks should therefore usually be short, unless there is a pattern of repeated attacks from the same IP. But consider, for instance, a large school with an internal network and a single public IP: one virus-infected computer blocks the whole school.}

--HermannSchwarting
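
To make the expiration idea above concrete, here is a minimal sketch in Python. It is not 'Tavi code ('Tavi is written in PHP), and the class name, method names, and default durations are all assumptions made for illustration. A first offence gets a short block, and the block only grows when the same IP keeps coming back.

```python
import time


class ExpiringBlacklist:
    """Hypothetical IP blacklist: short blocks that grow only on repeat offences."""

    def __init__(self, base_ttl=3600, max_ttl=7 * 24 * 3600):
        self.base_ttl = base_ttl  # seconds for a first offence (assumed value)
        self.max_ttl = max_ttl    # upper bound on any block (assumed value)
        self._strikes = {}        # ip -> number of reported spam attempts
        self._expires = {}        # ip -> unix time at which the block ends

    def report_spam(self, ip, now=None):
        """Record a spam attempt and return the block length in seconds."""
        now = time.time() if now is None else now
        strikes = self._strikes.get(ip, 0) + 1
        self._strikes[ip] = strikes
        # Double the block for each repeat offence, up to max_ttl.
        ttl = min(self.base_ttl * 2 ** (strikes - 1), self.max_ttl)
        self._expires[ip] = now + ttl
        return ttl

    def is_blocked(self, ip, now=None):
        """Return True while the block for this IP has not yet expired."""
        now = time.time() if now is None else now
        expires = self._expires.get(ip)
        if expires is None:
            return False
        if now >= expires:
            del self._expires[ip]  # block has lapsed; the IP may edit again
            return False
        return True
```

This does nothing for the shared-IP case (the school behind one public address); it only limits how long an innocent user is locked out.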

+1 from me on the above comments. Also, am I the only one who has problems reading the ASCII captcha? I would like to use an image captcha instead. A good example is http://drupal.org/project/captcha. -- FredrikJonsson

Cookies required

The current solution requires the user to accept cookies without telling them so. If you have cookies disabled, you are simply returned to the edit page when saving, as if you had entered a wrong captcha. The anti-spam system should work without cookies, or at least warn the user to accept them.

--HermannSchwarting

I'm thinking of adding some sort of infobox at the top of pages, both to address this issue and to make it possible to add other notes. The anti-spam system needs sessions, but those should be server-side unless your web server has changed its configuration, and in that case it usually includes the session ID in the links. The cookies are mainly used for storing preferences; if they are disabled, one cannot store personal preferences.
If the web server (think of mass hosting) does not allow PHP's transparent session IDs, or you don't like them, you could still embed the session ID as a hidden <input> on the edit and preview pages. --HermannSchwarting
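
A rough illustration of the hidden-field idea, written in Python rather than 'Tavi's PHP. The function name and the field name "sid" are made up for this sketch; the only point is that the session ID travels in the form body instead of a cookie, and that it is escaped before being written into the page.

```python
from html import escape


def hidden_session_field(session_id, field_name="sid"):
    """Build a hidden <input> carrying the session ID (field name is hypothetical)."""
    return '<input type="hidden" name="%s" value="%s">' % (
        escape(field_name, quote=True),
        escape(session_id, quote=True),  # never write the raw value into HTML
    )


# Example: the string to place inside the edit or preview <form>.
print(hidden_session_field("abc123"))
```

The server would then read the ID back from the posted form when the page is saved. Note that this only covers form submissions; plain links would still need the ID in the URL.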

Why are we spammed now?

Does anyone have an idea why we are still being spammed? Or how? How do they circumvent the captcha? Has the captcha idea proven to be useless? Do we need other measures?

Your text-based captcha won't stop a determined spammer. Because it is text-based, writing a program that can decipher it is quite a bit of work but not too difficult. Even image-based captchas are not immune: a bunch of computer science students at MIT had a contest and turned in some impressive results with image recognition programs. Also, if there is enough money in it, some spammers have resorted to hiring cheap labor to actually sit at keyboards and manually spam a site, though fortunately this is still pretty rare. Nonetheless, a text-based captcha is sufficient to stop most of the spammers. Some sites have found that a question-and-answer format, such as "What is two plus three?", is very effective. This is especially effective if the questions require knowledge of a particular subject.
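
The question-and-answer format could look roughly like the Python sketch below. The questions, the accepted answers, and the function names are all invented for illustration; this is not something 'Tavi ships. A real site would want many more questions, ideally tied to its own subject.

```python
import random

# Hypothetical question pool: (question text, set of accepted answers in lowercase).
QUESTIONS = [
    ("What is two plus three?", {"5", "five"}),
    ("What is the first letter of the word 'wiki'?", {"w"}),
]


def pick_question(rng=random):
    """Choose a question; return its index (to keep server-side) and its text."""
    index = rng.randrange(len(QUESTIONS))
    return index, QUESTIONS[index][0]


def check_answer(index, answer):
    """Accept the answer regardless of surrounding spaces or letter case."""
    return answer.strip().lower() in QUESTIONS[index][1]
```

The index should be remembered on the server (for example in the session) rather than sent back by the browser, otherwise a robot could always claim to be answering the one question it knows.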


MediaWiki seems to do this via IP banning and username revocation. While the measures above are good preventatives, there are ways around everything. The captcha is one of the harder things to get around, but it is also very tough for humans to get through. If accessibility is a concern, then captchas are just mean! For the most part the real problems are likely to be human. If we can ban IPs and usernames - perhaps even only from certain areas - we will do fine! - Frink