From The Archives: reCAPTCHA and Spare Cycles

This morning, Google announced that it will be acquiring reCAPTCHA, a company devoted to putting the few seconds you spend solving CAPTCHAs (those funny puzzles you fill out on Ticketmaster and other sites to verify that you’re human) to good use. As announced, Google will integrate reCAPTCHA’s technology into its own spam and fraud countermeasures, and will use the human output of those puzzles to advance its Book Search and Newspaper Archive scanning efforts.

One of my first posts on Tropophilia profiled the founder of reCAPTCHA, Luis von Ahn, and his efforts to harness otherwise-wasted human effort. Given today’s announcement, I thought it made sense to repost it to put this acquisition, and the “spare cycles” philosophy behind it, into context.

Disclosure: I am an employee of Google.  I was not an employee at the time this post was originally published.  All views expressed in this post are mine alone, and do not necessarily reflect those of Google.

———————————

Spare Cycles: Distributing Computing Among Machines and Minds (published January 19, 2008)

A few weeks ago I read an article in The Economist about distributed computing, defined by Wikipedia as “a method of computer processing in which different parts of a program are run simultaneously on two or more computers that are communicating with each other over a network.” Basically what you do is download a program that, when you’re not around, uses your computer’s processor (which would otherwise be mostly idle) to crunch data sent to it from a central server. Your computer joins thousands of others crunching data at any one time, forming a giant networked supercomputer with each unit working on a different piece of the puzzle.
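To make the idea concrete, here is a minimal sketch in Python of the split, crunch, and combine pattern described above. It is an illustration only, not how any real project works: the function names are made up, the "puzzle" is a trivial sum of squares, and threads on one machine stand in for volunteers' computers on a network.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_work_units(data, unit_size):
    """Server side: cut a large dataset into independent chunks."""
    return [data[i:i + unit_size] for i in range(0, len(data), unit_size)]


def crunch(work_unit):
    """Volunteer side: stand-in for the real computation (a sum of squares)."""
    return sum(x * x for x in work_unit)


def run_project(data, unit_size=1000, volunteers=4):
    units = split_into_work_units(data, unit_size)
    # Each worker thread stands in for one volunteer's otherwise idle computer.
    with ThreadPoolExecutor(max_workers=volunteers) as pool:
        partial_results = list(pool.map(crunch, units))
    # Server side: combine the partial answers into the final result.
    return sum(partial_results)


total = run_project(list(range(10_000)))
print(total)
```

The important property is that each work unit can be crunched without knowing anything about the others, which is what lets thousands of unrelated computers share the job.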

What’s the puzzle? It can be anything, or at least anything that requires a whole lot of computer power to figure out. Some puzzles are humanitarian in nature; for example, the World Community Grid (sponsored by IBM) currently has projects tackling cancer, AIDS, and dengue fever research, as well as African climate change. Others are more geeky (or, should we say, scientific), like the SETI@home project, which is searching for extraterrestrial intelligence by analyzing radio telescope data.

So the bottom line is this: while one way to save the planet and contribute to science is through the donation of time and money, another way is through the donation of your computer’s processing power. Why let your computer idly sit while you’re at work or school all day — occasionally using a small processor burst to throw the next picture from your hard drive onto your screensaver, which no one but your dog is watching — when you can have it use its full capacity to solve some of the world’s toughest problems?

The buzzword for this phenomenon is “donating spare cycles.” Basically, a cycle is the process your computer goes through to retrieve a command from its memory and execute that command. It’s how your computer works and, in a way, it’s how our minds work too. A human cycle, then, would be the process our brain goes through to retrieve and process information from our memory. But do humans have spare cycles to donate? You bet.

Meet Luis von Ahn. I first read about him in this article in Wired magazine. You know those pictures of twisted letters and numbers that you have to enter to sign up for an e-mail or other online account? Those are called “CAPTCHAs”, and von Ahn helped invent them. CAPTCHA stands for “Completely Automated Public Turing test to tell Computers and Humans Apart.” While optical character recognition has advanced far enough to allow computers to “read” standard text, the technology is not (yet) capable of deciphering the contorted figures presented in CAPTCHAs. By entering the letters you see, you prove to the system that you are a human.

So we’ve established that CAPTCHAs are essentially good, because they protect the integrity of such services as Facebook, Gmail, and so on. But aren’t they annoying? Do you feel like you’ve just wasted 10 seconds of your life when you fill one out? Well, what if I told you that those 10 “wasted” seconds could, when correctly harnessed, actually be used to do some good?

Enter the reCAPTCHA, also invented by von Ahn. What is it? The website for reCAPTCHAs informs us that while it takes about 10 seconds for one person to fill out a CAPTCHA, “in aggregate these little puzzles consume more than 150,000 hours of work each day.” 150,000 hours of “wasted” time every single day. Imagine if those man-hours were put to use!
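For a sense of scale, the two figures quoted above (about 10 seconds per puzzle, 150,000 hours per day) imply a daily puzzle count. This is only back-of-the-envelope arithmetic on those two numbers, and the variable names are mine:

```python
SECONDS_PER_CAPTCHA = 10        # "about 10 seconds" per puzzle
HOURS_SPENT_PER_DAY = 150_000   # "more than 150,000 hours of work each day"

# Convert the daily hours to seconds, then divide by the time per puzzle.
captchas_per_day = HOURS_SPENT_PER_DAY * 3600 // SECONDS_PER_CAPTCHA
print(f"{captchas_per_day:,} CAPTCHAs solved per day")  # 54,000,000 CAPTCHAs solved per day
```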

reCAPTCHAs do just that by asking you to enter slightly distorted characters from two word images in a CAPTCHA. One image’s solution is known by the computer, and the other is not. This second image, which optical character recognition could not read, has been pulled from the book scanning project being undertaken by the Internet Archive, similar to the Google Books project. If you get the first word right, then the system assumes that your answer for the second word is probably correct too, and it checks that answer against those of several other users before sending it back into the database. In those otherwise “wasted” 10 seconds, you have more or less helped in the effort to render the world’s libraries digital. How ’bout that?
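Here is a rough sketch of that two-word logic in Python. It is my own illustration of the idea as described above, not reCAPTCHA's actual code: the class name, the method, and the agreement threshold of three users are all assumptions.

```python
from collections import Counter, defaultdict

REQUIRED_AGREEMENT = 3  # hypothetical: matching answers needed to accept a word


class RecaptchaSketch:
    def __init__(self):
        self.votes = defaultdict(Counter)  # unknown image id -> guesses so far
        self.accepted = {}                 # unknown image id -> agreed transcription

    def submit(self, control_answer, control_guess, unknown_id, unknown_guess):
        """Return True if the user passes as human."""
        if control_guess.strip().lower() != control_answer.lower():
            # The known word was wrong, so the other guess is not trusted.
            return False
        guess = unknown_guess.strip().lower()
        self.votes[unknown_id][guess] += 1
        # Accept the transcription only once enough users have agreed on it.
        if self.votes[unknown_id][guess] >= REQUIRED_AGREEMENT:
            self.accepted[unknown_id] = guess
        return True


service = RecaptchaSketch()
for _ in range(3):
    service.submit("morning", "morning", "scan-17", "upon")
print(service.accepted)  # {'scan-17': 'upon'}
```

The known word does the gatekeeping, while the unknown word only ever collects votes, so a single careless or malicious answer cannot put a wrong transcription into the database by itself.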

Von Ahn has several projects that are studying and implementing this idea of distributed human computation. The ESP Game randomly matches up two players who are presented with a single image. They are tasked with “tagging” that image with words that describe it. When the two players agree on a word, they get points based on the time it took them to enter the common word and then move on to the next image. After an image is used in several games, commonly tagged words are considered “taboo” and cannot be used as the agreement word between the two players. This forces players to be creative and find more descriptive or more contextual terms to describe the picture.
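The matching rule can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions: the function name is made up, the two players are modeled as taking turns typing, and the time-based scoring is left out.

```python
from itertools import zip_longest


def find_agreement(guesses_a, guesses_b, taboo):
    """Return the first non-taboo word both players have typed, or None.

    guesses_a and guesses_b are lists of words in the order they were typed.
    """
    taboo = {word.lower() for word in taboo}
    seen_a, seen_b = set(), set()
    # Interleave the two lists to mimic both players typing in turn.
    for a, b in zip_longest(guesses_a, guesses_b):
        for word, mine, theirs in ((a, seen_a, seen_b), (b, seen_b, seen_a)):
            if word is None:
                continue  # this player has run out of guesses
            word = word.lower()
            if word in taboo:
                continue  # taboo words can never be the agreement word
            mine.add(word)
            if word in theirs:
                return word
    return None


print(find_agreement(["dog", "puppy"], ["animal", "dog"], taboo=[]))               # dog
print(find_agreement(["dog", "leash"], ["dog", "park", "leash"], taboo=["dog"]))   # leash
```

The second call shows the taboo list at work: once "dog" is off the table, the players only score by landing on a less obvious word.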

So what’s the deal? Most image search engines have only two things to go on when scouring the web in response to your query: the title of the image, and the words surrounding the image on a webpage. By using the information gathered from The ESP Game, however, search engines now have human-generated and -verified terms that describe the subject of the image, the colors of the image, even the quality of the image. By playing a (fun and addictive) game, players help make the web more searchable.

Another example of distributed human computing is the Mechanical Turk, a project hosted by Amazon.com. The philosophy is similar: the site recruits users to complete tasks that computers simply cannot, each known as a “human intelligence task”, or HIT. Last summer, the Mechanical Turk was famously used to distribute discrete sections of satellite imagery of the Nevada desert to thousands of humans, who in turn clicked “yes” or “no” as to whether there was any sign of Steve Fossett’s downed aircraft. Fossett was not found and the search was ended, but as many as six previously unknown crash sites were discovered in the process. [9/16/09 update: The remains of Fossett and his aircraft have since been located.] Because this project is not in game form, users are enticed to stay through monetary compensation: mostly pennies per task, but that can add up in the end.

In a presentation to Google (video here), Luis von Ahn talks about The ESP Game and some of his other initiatives in human computation. If this stuff sparks your curiosity, it’s worth the 50-minute watch. But here’s the money quote, as paraphrased by Bob Wyman:

Ahn estimates that during 2003, 9 billion human-hours were consumed [...] just by people playing Solitaire on their computers. [...] To provide some scale to that number, Ahn shows that the Empire State Building in NYC took only 7 million human-hours to build (6.8 hours of Solitaire play) and the larger Panama Canal took only 20 million human-hours to build (less than a day of Solitaire.)
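Those comparisons are easy to check. Assuming the 9 billion hours were spread evenly across a 365-day year (my simplification, not something the talk states), the arithmetic works out like this:

```python
SOLITAIRE_HOURS_PER_YEAR = 9_000_000_000
HOURS_IN_A_YEAR = 365 * 24

# Human-hours of Solitaire played worldwide during each hour on the clock.
solitaire_hours_per_clock_hour = SOLITAIRE_HOURS_PER_YEAR / HOURS_IN_A_YEAR

empire_state = 7_000_000 / solitaire_hours_per_clock_hour
panama_canal = 20_000_000 / solitaire_hours_per_clock_hour

print(f"Empire State Building: {empire_state:.1f} hours of Solitaire")  # 6.8
print(f"Panama Canal: {panama_canal:.1f} hours of Solitaire")           # 19.5
```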

Idle human time is valuable. If there’s a way to harness it to do good in the world, then I’m all for it.

[9/16/09 update: Removed information about a Tropophilia-sponsored team on the World Community Grid.  This effort is no longer active.]

Images courtesy of Carnegie Mellon University, and Flickr users saschaaa and tmartin.


  • crucifi3d
    Although I hate CAPTCHAs, they do save us from a lot of spam. Google acquiring reCAPTCHA will further expand the ever-growing Google network.