Why Can’t Bots Check “I Am Not A Robot” Checkboxes?

Have you ever checked the box and wondered why a robot couldn’t do something so simple?

As it turns out, there is a difference between how humans move the mouse and how bots do it.

Google’s reCAPTCHA system uses this fact to predict whether you are a human, using a predictive model trained on sample mouse movements made by both humans and abusive bots.

As you move your mouse over the “I’m not a robot” widget towards the checkbox to click, every tiny movement is captured and sent to the predictive model.

The predictive model compares your mouse movements against that sample set of data and decides whether you are a human or a bot.
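
To get a feel for what “analyzing mouse movements” could mean, here is a minimal Python sketch. It is not Google’s model, which is not public. The function names, the two features (straightness and speed variance), and the cutoff values are all assumptions made up for illustration. It only shows that a perfectly straight, constant-speed path is easy to tell apart from a wobbly human one.

```python
import math


def path_features(points):
    """Summarize a mouse path given as (x, y, t_ms) samples.

    Hypothetical helper: needs at least two samples with increasing timestamps.
    """
    segments = list(zip(points, points[1:]))
    # Length of each hop between consecutive samples.
    hops = [math.dist(a[:2], b[:2]) for a, b in segments]
    path_len = sum(hops)
    # Straight-line distance from the first sample to the last.
    direct = math.dist(points[0][:2], points[-1][:2])
    # 1.0 means a perfectly straight path; a wandering path scores lower.
    straightness = direct / path_len if path_len else 1.0
    # Speed of each hop in pixels per millisecond.
    speeds = [hop / (b[2] - a[2]) for hop, (a, b) in zip(hops, segments)]
    mean = sum(speeds) / len(speeds)
    speed_variance = sum((s - mean) ** 2 for s in speeds) / len(speeds)
    return {"straightness": straightness, "speed_variance": speed_variance}


def looks_scripted(points, straight_cutoff=0.999, variance_cutoff=1e-6):
    """Flag paths that are both dead straight and constant speed.

    The cutoffs are illustrative guesses, not values from any real system.
    """
    f = path_features(points)
    return (f["straightness"] >= straight_cutoff
            and f["speed_variance"] <= variance_cutoff)


bot_path = [(0, 0, 0), (10, 0, 10), (20, 0, 20), (30, 0, 30)]
human_path = [(0, 0, 0), (4, 3, 10), (9, 1, 35), (20, 6, 50), (30, 0, 90)]
print(looks_scripted(bot_path), looks_scripted(human_path))
```

A real system would presumably learn from many more signals than two hand-picked numbers, but the idea of turning raw movement into features and scoring them is the same.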

It’s interesting to note that Google invented an entire virtual machine – essentially a simulated computer inside a computer – just to run that checkbox.

That virtual machine runs its own language, which Google encrypts twice.

This is no simple encryption. Normally, when you password-protect something, a single fixed key decodes it. Google’s invented language is decoded with a key that is changed by the process of reading the language, and the language also changes as it is read.
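
The idea of a key that changes as the data is read can be sketched with a toy cipher in Python. This is an illustration only: the update rule, the function names, and the one-byte key are invented for the example, it is not Google’s actual scheme, and it is not secure. It covers only the changing key, not the part where the language itself changes.

```python
def encode(plain: bytes, key: int) -> bytes:
    """Toy encoder: XOR each byte with a key that evolves as we go.

    `key` is a hypothetical one-byte starting key in the range 0..255.
    """
    out = bytearray()
    for b in plain:
        out.append(b ^ key)
        # The next key depends on the byte that was just processed.
        key = (key * 31 + b) % 256
    return bytes(out)


def decode(cipher: bytes, key: int) -> bytes:
    """Toy decoder: must recover each byte in order to learn the next key."""
    out = bytearray()
    for c in cipher:
        b = c ^ key
        out.append(b)
        # Same update rule as the encoder, driven by the decoded byte.
        key = (key * 31 + b) % 256
    return bytes(out)


message = b"I am not a robot"
blob = encode(message, key=42)
print(decode(blob, key=42) == message)
```

Because every step depends on the step before it, you cannot jump into the middle of the data and start decoding; you have to replay the whole thing from the start with the right key.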

Google combines that key with the web address you’re visiting, so you can’t use a CAPTCHA from one website to bypass another. It further combines that with “fingerprints” from your browser, catching microscopic variations in your computer and browser (such as how CSS rules are applied) that a bot would struggle to replicate.
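
One common way to tie a key to a particular site and browser is to mix them together with a keyed hash. The Python sketch below assumes that approach; the source does not say how Google actually combines these values, and the function name, the field names, and the choice of HMAC-SHA256 are all assumptions for illustration.

```python
import hashlib
import hmac


def derive_key(base_key: bytes, site_url: str, fingerprint: dict) -> bytes:
    """Derive a key bound to one site and one browser fingerprint.

    Hypothetical helper: `fingerprint` is a plain dict of made-up fields.
    """
    # Sort the fields so the same values always produce the same bytes.
    fp = "&".join(f"{name}={fingerprint[name]}" for name in sorted(fingerprint))
    message = f"{site_url}|{fp}".encode("utf-8")
    # HMAC-SHA256 mixes the base key with the site and fingerprint.
    return hmac.new(base_key, message, hashlib.sha256).digest()


browser = {"timezone": "UTC", "screen": "1920x1080"}
key_a = derive_key(b"base-secret", "https://a.example", browser)
key_b = derive_key(b"base-secret", "https://b.example", browser)
print(key_a != key_b)
```

Change the web address or any fingerprint field and the derived key changes completely, which is why an answer produced for one site would be useless on another.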

All of this makes it purposely difficult to understand what Google is even doing. In fact, you need to write special tools just to analyze what’s happening, and it turns out that people have done just that.

They’ve found that Google is recording and analyzing:

  • Your computer’s time zone and time
  • Your IP address and rough location
  • Your screen size and resolution
  • What browser you’re using
  • What plugins you’re using
  • How long the page took to display
  • How many key presses, mouse clicks, taps, and scrolls were made

And … some other stuff we don’t quite understand.

They then combine all of this data with what they already know about the person using the computer. That’s right: Google observes the behavior of billions of real people.

Exactly how they check all of this information is impossible to know from the outside, but we do know that to beat the CAPTCHA, a bot has to simulate a ridiculous amount of messy human behavior, much of it almost unknowable. Not to mention that the checks keep changing, and you can’t tell when.

And you thought you were just checking an innocent-looking little box, didn’t you?