Google ‘captchas’ ReCAPTCHA
ReCAPTCHA, the service that error-checks OCR’d text while preventing non-human access to websites, has just been bought by Google. (TeleRead has already covered ReCAPTCHA here and here.)
ReCAPTCHA has already been used for some pretty large digitization projects, such as the scanning of the entire New York Times archives. It makes sense that the company running the largest digitization project of all would recognize its usefulness for its own efforts. Given how error-ridden the plain-text versions of Google Books have been so far, something like this is badly needed.
Who knows? Next time you verify your humanity with a ReCAPTCHA, you might just be digitizing a tiny piece of a Google Book.

September 16th, 2009 at 6:19 pm
Interesting article here — about low-paid foreign workers hired to solve CAPTCHAs — which might explain Google’s interest in buying (and improving the security of) the CAPTCHA experience.
(I would have titled the article: The CAPTCHAN CRUNCH):
http://bits.blogs.nytimes.com/2008/03/13/breaking-google-captchas-for-3-a-day/
September 17th, 2009 at 1:32 pm
This is great news for Google Books, many of which have had horrible OCR applied and never corrected.
My only fear, then, is that reCaptcha will from now on be at the service of Google Books and Google Books alone, and that the Internet Archive and other sources that used to enjoy its fruits will be shut out.