CAPTCHA overview

To keep machines from automating access to email, web systems, databases, etc., it may be necessary to determine whether a machine or a human is making the request.

Determining whether a machine or a human is at the other end requires asking for something that is easy for a human to do but difficult for a machine to do.

One precaution is to require a decision that is not easy to automate, such as recognizing a sequence of letters and/or digits shown in an image.

To make the answer hard for a computer to guess, the images are changed often, sometimes at every page access. Computers cannot easily perform such recognition.
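As a minimal sketch of changing the challenge at each page access, the following Python generates a fresh random string per request and checks the visitor's answer. The names `ALPHABET`, `new_challenge`, and `check_response` are hypothetical, as is the choice to drop look-alike characters; rendering the string as a distorted image is not shown.

```python
import secrets
import string

# Assumed alphabet: uppercase letters and digits without the look-alikes O, 0, I, 1.
ALPHABET = "".join(
    c for c in string.ascii_uppercase + string.digits if c not in "O0I1"
)


def new_challenge(length=6):
    """Return a fresh random challenge string for one page access."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def check_response(expected, response):
    """Compare the answer to the expected text, ignoring case and outer spaces."""
    cleaned = response.strip().upper()
    # Compare as bytes so that non-ASCII input is rejected instead of raising.
    return secrets.compare_digest(expected.upper().encode(), cleaned.encode())


challenge = new_challenge()
print(challenge)                                    # differs on every run
print(check_response(challenge, challenge.lower()))
```

The server would keep `challenge` on its own side (for example in the session) and send only the rendered image to the visitor.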

Who would want an audio CAPTCHA?

What do you do if you are blind?

There are audio CAPTCHAs.

The CAPTCHA is a challenge-response test that helps ensure that the interaction comes from a human and not from an automated computer program. One thing that humans are good at, and that computers are not so good at (although they are getting better), is pattern recognition: sight, sound, etc. The goal is to generate a challenge that is easy for a human to recognize but difficult for a computer to recognize. As computers become more "intelligent" this is becoming more difficult, as any human who has failed to recognize a challenge can attest.

The reCAPTCHA project, at http://www.captcha.net/ (as of 1997), provides subscribing websites with digitized images of words from semi-automatically scanned library archive books that could not be recognized automatically. The human responses are then fed back into the system to save the manual labor of identifying the words. A "voting" process across several sites is used to determine the correctness of the responses.

Because it takes effort on the part of the attacker to break a CAPTCHA, there is some security through obscurity, but only until an attacker has the motivation to study and break the CAPTCHA system. Some systems switch to a CAPTCHA system after a certain amount of "automated" usage is detected.

Some approaches to breaking a CAPTCHA system include the following.

use OCR (Optical Character Recognition) techniques
use side channels, such as design or implementation defects (for example, a system that reuses the same images), to defeat the CAPTCHA
pass the CAPTCHA along to other humans to solve (e.g., the reCAPTCHA project)
use exhaustive enumeration (i.e., brute force), which can be computationally infeasible
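To give a rough sense of why exhaustive enumeration can be infeasible, the sketch below counts the possible answers for a text challenge and estimates the chance that blind guessing succeeds. It assumes a fresh, uniformly random challenge is issued for every attempt; the function names and the example figures (36 symbols, 6 characters, one million attempts) are illustrative assumptions, not values from any particular system.

```python
import math


def search_space(alphabet_size, length):
    """Number of distinct challenge strings of the given length."""
    return alphabet_size ** length


def success_probability(alphabet_size, length, attempts):
    """Chance that at least one blind guess is right.

    Assumes every attempt faces a new random challenge, so each guess
    succeeds independently with probability 1 / search_space.
    """
    n = search_space(alphabet_size, length)
    # 1 - (1 - 1/n) ** attempts, computed in a numerically stable way.
    return -math.expm1(attempts * math.log1p(-1.0 / n))


print(search_space(36, 6))                    # 2176782336
print(success_probability(36, 6, 1_000_000))  # a little under 0.0005
```

Under these assumptions, even a million automated guesses succeed less than once in two thousand runs, which is why attackers usually look to the other approaches in the list first.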

A free CAPTCHA service is available at http://captchas.net/ (as of 1997). There is a charge for removing the website address that otherwise appears in each CAPTCHA image.

The trend in CAPTCHA technology is that it is getting more and more difficult to create an image that is easy for a human to recognize but difficult for a computer to recognize. A starting approach to bypassing a CAPTCHA system for a given website is the following.

1. Investigate the HTML source in which the CAPTCHA image appears to determine the CAPTCHA system used by that website.
2. If the system is a common system, study aspects of that system from published material.
3. If the system has not previously been encountered, or if additional research is needed, automate access to many pages that each contain the CAPTCHA image, and save both the images and the interaction to/from the source web site for study.
4. Analyze the images and interaction to/from the source web site for a weakness.

The same methods can be used to analyze the CAPTCHA system of one's own website to look for weaknesses that might be exploited by others.
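As one example of such a self-audit, the sketch below checks a set of images saved from one's own site for verbatim reuse, the side-channel weakness mentioned earlier. It is only a starting point under stated assumptions: the function names and the directory layout are hypothetical, and comparing SHA-256 digests catches only byte-identical files, not images that were re-encoded or slightly altered.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def reused_images(images):
    """Group image names whose contents are byte-for-byte identical.

    `images` maps a name to the raw bytes of that image. Returns a list
    of name groups, one group per piece of content seen more than once.
    """
    groups = defaultdict(list)
    for name, data in images.items():
        groups[hashlib.sha256(data).hexdigest()].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def load_images(image_dir):
    """Read every file in a directory of saved CAPTCHA images (assumed layout)."""
    return {p.name: p.read_bytes() for p in Path(image_dir).iterdir() if p.is_file()}


# Small in-memory example: the first and third "images" are identical.
sample = {"a.png": b"\x89PNG-one", "b.png": b"\x89PNG-two", "c.png": b"\x89PNG-one"}
print(reused_images(sample))  # [['a.png', 'c.png']]
```

Any non-empty result suggests that the site serves a limited pool of images, which an attacker could solve once and then look up.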