
In Search Of The Perfect CAPTCHA
  CAPTCHAs, or Completely Automated Public Turing Tests to Tell Computers and Humans Apart, exist to ensure that user input has not been generated by a computer. These peculiar puzzles are commonly used on the Web to protect registration and comment forms from spam. To be honest, I have mixed feelings about CAPTCHAs. They have annoyed me on many occasions, but I’ve also implemented them as quick fixes on websites.
This article follows the search for the perfect solution to the problem of increasing amounts of human-generated spam. We’ll look at how and why CAPTCHAs are used and their effect on usability in order to answer key questions: what is the perfect CAPTCHA, and are they even desirable?
The Incentive To Act Human
To understand the need for CAPTCHAs, we should understand spammers’ incentives for creating and using automated input systems. For the sake of this article, we’ll think of spam as any unwarranted interaction or input on a website, whether malicious or for the benefit of the spammer, that differs from the purpose of the website. Incentives to spam include:
Advertising on a massive scale;
Manipulating online voting systems;
Destabilizing a critical human equilibrium (i.e. creating an unfair advantage);
Vandalizing or destroying the integrity of a website;
Creating unnatural, unethical links to boost search engine rankings;
Accessing private information;
Spreading malicious code.
All of these incentives lead to profitable or otherwise gainful situations for spammers. Automating the process obviously allows for superhuman speed and efficiency.
Those who run websites know that this is a big business and a big problem. Akismet, the popular spam killer (commonly seen as a WordPress plug-in), catches over 18 million spam comments per day and has caught more than 20 billion in its history. Mollom, which provides a similar service, catches over half a million spam comments per day and estimates that more than 90% of all messages are spam.
No amount of asking nicely will stop the spammers, but their greed can be used against them; using automated systems to increase profit does have a weakness.
Enter the CAPTCHA
On one side of the coin is the spammer; on the other is the humble website owner, a pleasant sort, who experiences common problems:
Blogs and forums that sink under the weight of spam posts,
Accounts that are registered under false pretenses for unlawful purposes,
Bots that ruin the dynamics of a website,
A dive in the quality of content and the user experience.
Automated spam plagues website owners to no end, so CAPTCHAs are appealing and compelling… initially. Weighing the time needed to moderate and review user-generated content against the time needed to implement a CAPTCHA is what pushes most developers to add one.
CAPTCHAs are, in fact, used a lot. The reCAPTCHA project estimates that over 200 million reCAPTCHAs are completed daily and that it takes an average of 10 seconds to complete one. The Drupal CAPTCHA project logs close to 100,000 uses per week, and this covers only a fraction of websites (those that choose to report back).
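To put the reCAPTCHA figures in perspective, the short calculation below multiplies the two numbers quoted above (200 million completions per day, 10 seconds each). It assumes both estimates hold as stated; the result, about 2 billion seconds or roughly 555,000 hours, is human time spent every single day.

```python
# Figures quoted from the reCAPTCHA project's own estimates.
CAPTCHAS_PER_DAY = 200_000_000
SECONDS_PER_CAPTCHA = 10


def human_time_per_day(count: int, seconds_each: float) -> dict:
    """Convert a daily count of solved CAPTCHAs into total human time."""
    total_seconds = count * seconds_each
    hours = total_seconds / 3600
    return {
        "seconds": total_seconds,
        "hours": hours,
        "years": hours / 24 / 365,  # the same time expressed as calendar years
    }


if __name__ == "__main__":
    t = human_time_per_day(CAPTCHAS_PER_DAY, SECONDS_PER_CAPTCHA)
    # prints: 555,556 hours per day (about 63 years)
    print(f"{t['hours']:,.0f} hours per day (about {t['years']:.0f} years)")
```

In other words, every day users collectively spend the equivalent of more than six decades typing reCAPTCHA answers.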
CAPTCHAs tackle a problem head-on: they focus purely on stopping spammers. Genuine users are, for the most part, overlooked. That is to say, an assumption is made that the normal behavior of users is not affected.
That assumption doesn’t hold, and the issue of genuine usability is not new. The W3C released a report back in 2005 on the inaccessibility of CAPTCHAs, which suggested that some systems can be defeated with up to 90% accuracy. More recently (in 2009), Casey Henry looked at the effect of CAPTCHAs on conversion rates and suggested a possible conversion loss of around 3%:
“Given the fact that many clients count on conversions to make money, not receiving 3.2% of those conversions could put a dent in sales. Personally, I would rather sort through a few spam conversions instead of losing out on possible income.”
— Casey Henry, CAPTCHAs’ Effect on Conversion Rates
In 2010, a team from Stanford University released a report entitled “How Good Are Humans at Solving CAPTCHAs? A Large Scale Evaluation”, which evaluates CAPTCHAs on the Internet’s biggest websites. Unsurprisingly, the results weren’t favorable; the most astounding finding was that audio CAPTCHAs took an average of 28.4 seconds to complete. The study also highlighted worrisome issues for non-native English speakers.
Web developers such as Tim Kadlec have called for the death of CAPTCHAs, and Kadlec makes a strong argument against their use:
“Spam is not the user’s problem; it is the problem of the business that is providing the website. It is arrogant and lazy to try and push the problem onto a website’s visitors.”
— Tim Kadlec, Death To CAPTCHAs
Completing a CAPTCHA may seem like a trivial task, but studies such as the W3C’s have shown that it is far from trivial in practice. And as Kadlec points out later in his article, what about users with visual impairments, dyslexia and other special needs? Putting up an inaccessible wall doesn’t seem fair. Users are the ones who invest in websites and give them purpose.
The question is, are CAPTCHAs so unusable that they shouldn’t be used at all? Perhaps more importantly, does a usable CAPTCHA that cannot be cracked exist? If the answer is no, what is the real solution to online spam?
The World Of CAPTCHAs
The human brain is an amazing piece of work. Its ability to conceptualize, to find order in chaos and to adapt under extraordinary circumstances makes it highly useful, to say the least. At some tasks it easily outshines a computer; at others (mathematics, for example) it is laughably inferior.
Logic would dictate, therefore, that the most successful CAPTCHA would be:
A task that users excel at naturally but that computers can’t begin to comprehend,
A task that is incredibly quick for users to perform but arduous for computers,
A task that minimizes the need for additional user input,
A task that is relatively accessible to all users, even those with special needs (that is, the CAPTCHA should be no more demanding than general Web usage and the task the user is already performing).
One of the greatest advantages that humans have over machines is our ability to visually recognize patterns. The most popular CAPTCHA technique derives from this.
Web developers have explored many options: simple recognition tests, interactive tasks, games of Tic Tac Toe and equations that even mathematicians would have struggled with. We’ll explore the more sensible ideas being implemented online today.
Text Recognition
The most popular type of CAPTCHA currently used is text recognition (as seen with the reCAPTCHA project).
The reCAPTCHA project aims to stop spam and help digitize books.
reCAPTCHA was created at Carnegie Mellon University, home of the CAPTCHA pioneers who coined the term in 2000. Now run by Google, the project uses scanned text that optical character recognition (OCR) technology has failed to interpret. Because these words have already defeated OCR software, the resulting CAPTCHAs should, in theory, be unbreakable by computers, with the secondary benefit that human answers help to digitize books.
reCAPTCHA’s example of failed OCR scanning.
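reCAPTCHA is commonly described as showing two words at once: a control word whose reading is already known, and an unknown word that OCR could not read. The sketch below assumes that scheme. The user is judged only on the control word; their reading of the unknown word is stored as a vote, and once enough people agree, that word counts as transcribed. The class names, the normalization and the vote threshold are all hypothetical, not reCAPTCHA’s actual implementation.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


def normalize(word: str) -> str:
    """Hypothetical normalization: ignore case and surrounding whitespace."""
    return word.strip().lower()


@dataclass
class TwoWordChallenge:
    control_answer: str  # word whose correct reading is already known
    unknown_id: str      # ID of a scanned word that OCR could not read


@dataclass
class Digitizer:
    votes_needed: int = 3  # hypothetical agreement threshold
    votes: dict = field(default_factory=lambda: defaultdict(Counter))
    transcribed: dict = field(default_factory=dict)

    def submit(self, challenge: TwoWordChallenge, control_guess: str, unknown_guess: str) -> bool:
        """Pass or fail the user on the control word; if they pass, count their reading of the unknown word."""
        if normalize(control_guess) != normalize(challenge.control_answer):
            return False  # probably a bot: reject it and ignore its reading
        counts = self.votes[challenge.unknown_id]
        counts[normalize(unknown_guess)] += 1
        reading, n = counts.most_common(1)[0]
        if n >= self.votes_needed:
            self.transcribed[challenge.unknown_id] = reading  # enough humans agree
        return True
```

Note that a user can pass while misreading the unknown word; the agreement threshold is what keeps a single bad reading out of the digitized text.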
Concerns about accessibility and usability are often voiced with regard to this type of CAPTCHA. Completely illegible CAPTCHAs are common on the Web, and asking users to perform impossible tasks cannot be good for usability.
The reCAPTCHA project does make efforts to provide audio alternatives for visually impaired users, but many more text-recognition CAPTCHAs are being used without aids. As noted in the Stanford University study, audio CAPTCHAs take a long time to complete. The same study also highlighted an undesirable reliance on recognition of English-language words.
Another take on the basic text CAPTCHA was introduced in late 2010 by Solve Media, whose solution was to replace text with an advertisement and a related question, a move that many saw as too invasive.

Solve Media claims its CAPTCHAs can be solved more quickly than others. While we should be skeptical of marketing talk, there is some potential here for marginal improvement, given that many global brands transcend a single language.
While text-recognition CAPTCHAs have a few downsides (for example, spammers can use software that recognizes text embedded in the image and tries all possible combinations to “break” the anti-spam mechanism), they are undoubtedly recognizable: users know what they are and what is expected of them. This fact alone can go a long way in usability decisions.
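Trying every possible combination only works if the server lets a bot keep guessing against the same image. With 36 symbols (A to Z, 0 to 9) and six characters there are 36^6, roughly 2.2 billion, possible answers, so a common defense is to make each challenge single-use: any attempt, right or wrong, burns it. The sketch below assumes an in-memory store and a hypothetical five-minute lifetime; the image rendering step is left out.

```python
from __future__ import annotations

import secrets
import string
import time

ALPHABET = string.ascii_uppercase + string.digits  # 36 symbols
TTL_SECONDS = 300  # hypothetical five-minute lifetime per challenge


class ChallengeStore:
    """In-memory store of outstanding text CAPTCHAs (a sketch, not production code)."""

    def __init__(self) -> None:
        self._pending = {}  # token -> (expected text, time issued)

    def issue(self, length: int = 6) -> tuple[str, str]:
        """Create a challenge and return (token, text); `text` is what gets drawn as a distorted image."""
        text = "".join(secrets.choice(ALPHABET) for _ in range(length))
        token = secrets.token_urlsafe(16)
        self._pending[token] = (text, time.monotonic())
        return token, text

    def verify(self, token: str, answer: str) -> bool:
        """Check one answer. The challenge is discarded on the first attempt, right or wrong."""
        entry = self._pending.pop(token, None)
        if entry is None:
            return False  # unknown or already-used token
        text, issued_at = entry
        if time.monotonic() - issued_at > TTL_SECONDS:
            return False  # expired
        # Case-insensitive, constant-time comparison of the user's answer.
        return secrets.compare_digest(answer.strip().upper().encode(), text.encode())
```

This does not stop OCR software from reading the image; it only ensures that each guess costs the bot a fresh challenge.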
Logic Questions
Some have suggested that answering simple logic questions would be better than performing visual tasks, the idea being that the complexity of written language would be enough to confuse computers.
The TextCAPTCHA service has over 180 million questions in its database, including:
The 6th letter in “unrolled” is?
What is fifty-eight thousand, five hundred and seventy-four as digits?
Which of 3, twenty-nine, 70, 46 or 65 is the lowest?
These CAPTCHA questions are designed for the intelligence of a seven-year-old child. They are far more accessible than text and image recognition, and while this is a big advantage, it comes at a price. First, the time required to read and comprehend these questions will vary, because the questions are unusual and unfamiliar to users. Second, computers can still break these CAPTCHAs. Joel Vanhorn points to Wolfram Alpha as an intelligence source strong enough to crack them.
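To illustrate the second point, the sketch below answers the three sample questions with nothing more than regular expressions and a small number-word parser. It is a deliberately naive, hypothetical solver that only knows these three question templates; it says nothing about how TextCAPTCHA generates its questions, but it shows how little machinery each template takes to crack.

```python
import re

UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve thirteen "
    "fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}
SCALES = {"thousand": 1_000, "million": 1_000_000}

ORDINAL = re.compile(r"the (\d+)(?:st|nd|rd|th) letter in \W*(\w+)\W* is", re.I)
DIGITS = re.compile(r"what is (.+) as digits", re.I)
LOWEST = re.compile(r"which of (.+) is the lowest", re.I)


def words_to_int(phrase: str) -> int:
    """Parse English number words, e.g. 'fifty-eight thousand, five hundred and seventy-four'."""
    total = current = 0
    for word in re.findall(r"[a-z]+", phrase.lower()):
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current *= 100
        elif word in SCALES:
            total += current * SCALES[word]  # close off the thousands/millions group
            current = 0
        elif word != "and":
            raise ValueError(f"unknown number word: {word}")
    return total + current


def to_number(token: str) -> int:
    """Accept either digits ('70') or words ('twenty-nine')."""
    token = token.strip()
    return int(token) if token.isdigit() else words_to_int(token)


def solve(question: str) -> str:
    """Answer one of the three sample question templates."""
    if m := ORDINAL.search(question):
        return m.group(2)[int(m.group(1)) - 1]  # ordinals are 1-based
    if m := DIGITS.search(question):
        return str(words_to_int(m.group(1)))
    if m := LOWEST.search(question):
        options = re.split(r",|\bor\b", m.group(1))
        return str(min(to_number(o) for o in options if o.strip()))
    raise ValueError("unrecognized question")
```

For example, `solve("Which of 3, twenty-nine, 70, 46 or 65 is the lowest?")` returns `"3"`. A real database would need many more templates, but each one only has to be cracked once.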
With the likes of IBM’s Watson recently showcasing an eerily human-like ability to process language, such technology might become mainstream sooner than we think. Instead of worrying about logic questions becoming solvable by computers, we should use this technology to analyze user-submitted content and separate natural language from the computer-generated content typical of spam. Services like SBlam! are implementing this idea.
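As a rough illustration of content analysis, here is a tiny multinomial naive Bayes classifier, a classic technique for text-based spam filtering. It is not how SBlam! works (its internals aren’t described here), and the four training messages are made-up toy data; a usable filter would need a large, labelled corpus.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes over two labels, with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text: str, label: str) -> None:
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def log_score(self, text: str, label: str) -> float:
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        counts = self.word_counts[label]
        total = sum(counts.values())
        # log prior plus the sum of smoothed log likelihoods of each token
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in tokenize(text):
            score += math.log((counts[word] + 1) / (total + len(vocab)))
        return score

    def classify(self, text: str) -> str:
        return max(("spam", "ham"), key=lambda label: self.log_score(text, label))


TOY_DATA = [  # made-up examples for illustration only
    ("Cheap pills, buy now! Visit http://example.com for cheap pills", "spam"),
    ("Buy cheap watches now, best price, click here", "spam"),
    ("Thanks for the article, the section on audio CAPTCHAs was really useful", "ham"),
    ("I disagree with the point about logic questions, but nice write-up", "ham"),
]

if __name__ == "__main__":
    nb = NaiveBayes()
    for text, label in TOY_DATA:
        nb.train(text, label)
    print(nb.classify("buy cheap pills now"))  # prints: spam
```

The appeal of this approach is that it judges what was posted rather than asking the visitor to prove anything, so genuine users never see a challenge.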
Questions that are website-specific, such as “What is the name of this website?” and “What is the dominant color in the image a[…]
march 2011 by alexhansford
