This title is a quote from the Google blog relating, unsurprisingly, to Google’s purchase of the reCAPTCHA service. This article was prompted by the BBC’s programme Digital Planet (episode of 29/09/2009, podcast), which talks about Google’s efforts to digitise books.
The major criticism of the service is the inaccuracy of the metadata taken from the books. The problem is not the basic information such as the title, date or author, which can be provided by libraries and publishers; what is problematic is the data Google collects by indexing the content of the books’ pages.
The question, however, is whether the purchase of reCAPTCHA will solve (or at the very least start to solve) the problems Google faces in indexing the pages of the world’s books. The issue for Google is that no OCR software, however good, can recognise every character and word in every book, so there will always be errors that can only be picked up and corrected by the human eye, a hugely labour-intensive process. So how will reCAPTCHA solve this?
It can be said almost for certain that at some point you have come across one of these:

This image is referred to as a CAPTCHA and is used to try to stop automatic input (often from spammers), usually into website forms. The difference between a normal CAPTCHA and the reCAPTCHA service is this:
‘reCAPTCHA improves the process of digitizing books by sending words that cannot be read by computers to the Web in the form of CAPTCHAs for humans to decipher. More specifically, each word that cannot be read correctly by OCR is placed on an image and used as a CAPTCHA. This is possible because most OCR programs alert you when a word cannot be read correctly.’
Source: reCAPTCHA
http://recaptcha.net/learnmore.html
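To make the quoted idea concrete, here is a minimal sketch of the first step: picking out the words an OCR pass was unsure about so that they can be shown to humans. The `OcrWord` type, the confidence score and the 0.8 threshold are all hypothetical names and values chosen for illustration; the quote only says that OCR programs alert you when a word cannot be read, not how.

```python
from dataclasses import dataclass


@dataclass
class OcrWord:
    text: str          # the OCR engine's best guess at the word
    confidence: float  # hypothetical score between 0 and 1


def words_needing_humans(words, threshold=0.8):
    """Return the words whose OCR confidence falls below the threshold."""
    return [w for w in words if w.confidence < threshold]


# Made-up OCR output for one scanned line of text.
scanned_line = [
    OcrWord("The", 0.99),
    OcrWord("qnick", 0.41),
    OcrWord("brown", 0.97),
    OcrWord("f0x", 0.35),
]

# Each flagged word would become the image in a CAPTCHA.
challenges = words_needing_humans(scanned_line)
print([w.text for w in challenges])
```

In this made-up line only the two low-confidence guesses are handed over to people; everything the software read confidently is left alone.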
This means that you can actually feel good about solving these CAPTCHAs, as you’re helping to digitise books, and if you’re a webmaster you can sleep easy knowing that some of the best OCR software in the world cannot read the characters in the box, so it is unlikely that any spammer can build anything to read these yet further distorted words. However, there is still some doubt as to exactly how Google fits into this whole process.
Google offers the Google Books service, which seeks to scan and index books from all over the world, including some of the out-of-copyright books of great university libraries such as the Bodleian in Oxford and the library of Harvard University. So with the new acquisition of reCAPTCHA it would seem Google has solved its own problem of inaccurate metadata. By using the collective human eyes of all those who solve a reCAPTCHA to correct wrongly scanned words, a much more accurate index is created, and we all benefit from better searches of those books Google has already scanned. This means that through the use of reCAPTCHA even those tasks that seem most trivial can have some lasting and useful output.
For more information on reCAPTCHA (and to use it on your site for free) read the reCAPTCHA site, and for more information on the acquisition of reCAPTCHA by Google read the official Google blog.
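As a rough illustration of how many people’s answers could be turned into one corrected word, the sketch below takes a simple majority vote. The function name `resolve_word`, the minimum of three answers and the 60% agreement share are assumptions made up for this example; nothing here describes how reCAPTCHA or Google actually combine answers.

```python
from collections import Counter


def resolve_word(answers, min_votes=3, min_share=0.6):
    """Return the answer enough solvers agree on, or None if there is no consensus yet."""
    # Ignore blank answers and differences in case or surrounding spaces.
    normalised = [a.strip().lower() for a in answers if a.strip()]
    if len(normalised) < min_votes:
        return None  # too few answers to trust any of them
    best, count = Counter(normalised).most_common(1)[0]
    # Accept the most common answer only if a large enough share agrees.
    return best if count / len(normalised) >= min_share else None


# Four people typed what they saw for the same unreadable word.
print(resolve_word(["quick", "Quick ", "quick", "qulck"]))
```

A word that does not yet have enough agreeing answers simply stays unresolved and could be shown to more people.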

