The days of CAPTCHA systems seem to be numbered. A group of researchers from the UK and China has found a way to harness machine learning to crack the text-based CAPTCHA systems that once ruled the Web and can still be found on the Internet. Text-based CAPTCHAs (Completely Automated Public Turing test to tell Computers and Humans Apart) use a jumble of letters and numbers, along with occluding lines, to identify bots (non-human automated users). According to a report, the new method is highly efficient and far more accurate than previous CAPTCHA solvers.
The researchers, from Northwest University, Peking University, and Lancaster University, tested the new method on 33 CAPTCHA systems, including 11 used by some very popular websites, and achieved impressive results. The CAPTCHAs on China's Sohu website were the easiest to beat, with the solver reaching 92 percent accuracy, followed by eBay (86.6 percent), JD.com (86 percent), Wikipedia (78 percent), and Microsoft (69.6 percent). Google's text-based CAPTCHAs, on the other hand, were the most resilient, with the method managing just 3 percent accuracy, reports Naked Security by Sophos.
What is even more surprising is that the new CAPTCHA solver needs just 500 genuine CAPTCHAs to refine itself, compared to millions that solvers in the past have required. Further, the new solver can beat a text-based CAPTCHA within just 0.05 seconds, using the power of a humble desktop computer and GPU.
“We show for the first time that an adversary can quickly launch an attack on a new text-based captcha scheme with very low effort. This is scary because it means that this first security defence of many websites is no longer reliable,” said Dr. Zheng Wang, Senior Lecturer at Lancaster University's School of Computing and Communications and co-author of the research.
The academics used an artificial intelligence technique called a Generative Adversarial Network (GAN) to beat the text-based CAPTCHA systems. A GAN has two parts: a generative network that synthesises many examples of the target (text CAPTCHAs in this case) and a discriminative network that assesses that output against examples from the real world. The constant back-and-forth between the two networks allows both to get better at their tasks.
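To make the generator-versus-discriminator cycle concrete, here is a minimal toy sketch in Python. It is not the researchers' solver: instead of CAPTCHA images it works on single numbers, and every name and value in it (`train_toy_gan`, the target distribution centred on 4.0, the learning rate) is an assumption chosen purely for illustration. The generator is a one-line function that turns random noise into a "fake" number, and the discriminator is a logistic classifier that scores how real a number looks.

```python
import math
import random


def sigmoid(t):
    # Clamp the input so math.exp cannot overflow on extreme values.
    t = max(-60.0, min(60.0, t))
    return 1.0 / (1.0 + math.exp(-t))


def train_toy_gan(steps=2000, lr=0.01, seed=0):
    """Toy 1-D GAN (illustrative assumption, not the paper's model)."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator parameters: fake = a * z + b
    w, c = 0.1, 0.0  # discriminator parameters: D(x) = sigmoid(w * x + c)
    for _ in range(steps):
        real = rng.gauss(4.0, 0.5)  # one "real world" example
        z = rng.gauss(0.0, 1.0)     # random noise fed to the generator
        fake = a * z + b            # one synthesised example

        # Discriminator step: minimise -log D(real) - log(1 - D(fake)).
        d_real = sigmoid(w * real + c)
        d_fake = sigmoid(w * fake + c)
        grad_w = -(1.0 - d_real) * real + d_fake * fake
        grad_c = -(1.0 - d_real) + d_fake
        w -= lr * grad_w
        c -= lr * grad_c

        # Generator step: minimise -log D(fake), i.e. try to fool
        # the freshly updated discriminator.
        d_fake = sigmoid(w * fake + c)
        grad_fake = -(1.0 - d_fake) * w
        a -= lr * grad_fake * z
        b -= lr * grad_fake
    return a, b, w, c


if __name__ == "__main__":
    a, b, w, c = train_toy_gan()
    print("generator:", a, b, "discriminator:", w, c)
```

Each pass through the loop is one round of the contest: the discriminator is nudged to separate real from fake, then the generator is nudged to produce output the discriminator scores as real. GAN training is known to oscillate, so this sketch makes no promise about where the parameters settle.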
According to Naked Security, researchers have used GANs to take on image-based CAPTCHAs in the past, but this is the first time the technique has been used against text-based CAPTCHAs, and with great results at that.
As Dr. Wang noted, the implications of this research are alarming. Even though text-based CAPTCHAs are not as common as they used to be, this is a wake-up call for the websites that are still guarded by them.