In an official blog post, the company's Raghav Jeyaraman explained why fighting spam on a micro-blogging network like Twitter is a very different proposition from defending traditional systems such as email. He also detailed the challenges the team faced in building BotMaker and gave a simplified overview of the system's architecture.
Jeyaraman explained that because Twitter's wide-ranging developer APIs are designed to let third parties interact with the platform, spammers "know (almost) everything" about how the micro-blogging network functions, which makes spam much easier to create and countermeasures much harder to deploy.
And because content on Twitter is consumed and shared in real time, Jeyaraman said, countermeasures could not add significant latency to the user experience or to the platform as a whole.
With these challenges in mind, the team set out to build a system that would (a) prevent spam content from being created, (b) reduce the amount of time spam is visible on Twitter, and (c) reduce the reaction time to new spam attacks. At the same time, the system had to ensure that third-party developers could not bypass, fool or tamper with it, and it could not introduce too much latency.
To achieve this, the BotMaker anti-spam system was divided into three parts. The first, a low-latency sub-system called Scarecrow, checks for spam in real time in the 'write path' of Twitter's main operations (such as tweets, retweets, favourites, follows and messages). The second, Sniper, is a computationally intensive learning sub-system that processes Scarecrow's user and content event logs in 'near real time'.
The third part, BotMaker itself, is continuously fed information by Scarecrow and Sniper, and issues commands both in the write path (deny, challenge, accept) and to the actioner (suspend, reset password, delete message). Twitter also runs periodic jobs over all the data compiled by BotMaker and its sub-systems, ranging from routine checks to specific exercises run by the engineering department.
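To make the division of labour concrete, here is a minimal Python sketch of how the parts might fit together. It is not Twitter's code: the names (`Event`, `scarecrow_check`, `handle_write`, `sniper_scan`), the keyword and link-count rules, and the per-user tweet count that stands in for Sniper's learned models are all hypothetical. Only the write-path verdicts (deny, challenge, accept) and the actioner commands (suspend, reset password, delete message) come from the article.

```python
from dataclasses import dataclass
from enum import Enum


class WriteDecision(Enum):
    # Write-path verdicts named in the article.
    ACCEPT = "accept"
    CHALLENGE = "challenge"
    DENY = "deny"


class Action(Enum):
    # Actioner commands named in the article.
    SUSPEND = "suspend"
    RESET_PASSWORD = "reset_password"
    DELETE_MESSAGE = "delete_message"


@dataclass(frozen=True)
class Event:
    # Hypothetical event record: who performed which write, with what content.
    user_id: int
    kind: str  # e.g. "tweet", "retweet", "follow"
    text: str


def scarecrow_check(event: Event) -> WriteDecision:
    """Cheap inline check on a single write (toy rules, not Twitter's)."""
    if "buy followers" in event.text.lower():
        return WriteDecision.DENY
    if event.text.count("http") > 2:
        return WriteDecision.CHALLENGE
    return WriteDecision.ACCEPT


def handle_write(event: Event, log: list[Event]) -> WriteDecision:
    """Write path: record the event for later analysis, then return a verdict."""
    log.append(event)
    return scarecrow_check(event)


def sniper_scan(log: list[Event], max_tweets: int = 3) -> dict[int, list[Action]]:
    """Near-real-time pass over the accumulated event log.

    A simple per-user tweet count stands in for Sniper's learned models.
    """
    tweets_per_user: dict[int, int] = {}
    for e in log:
        if e.kind == "tweet":
            tweets_per_user[e.user_id] = tweets_per_user.get(e.user_id, 0) + 1
    return {uid: [Action.SUSPEND] for uid, n in tweets_per_user.items() if n > max_tweets}
```

In this sketch the write path only performs constant-time string checks and appends to a log, while the more expensive aggregation runs later over the accumulated events, roughly mirroring the split between a low-latency filter and higher-latency clean-up.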
In this way, Twitter says, BotMaker helps prevent spam from being created with a low-latency filter, cleans up spam that slips through with higher-latency processes, and adapts to catch spam more effectively using machine learning.
Stressing the importance of iterating quickly on rules and models in Twitter's evolving fight against spam, Jeyaraman said the BotMaker rule language and the data structures it operates on were designed to allow rapid development, testing and deployment of system-wide code changes, in addition to quick edits to BotMaker rules.
"Spam evolves constantly. Spammers respond to the system defenses and the cycle never stops. In order to be effective, we have to be able to collect data, and evaluate and deploy rules and models quickly."
Jeyaraman said this was achieved largely by making the BotMaker language type-safe, all data structures immutable and all functions pure, and by ensuring the runtime supported common functional programming idioms.
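The article does not show BotMaker's actual rule syntax, so the following is only a rough Python analogue of the properties Jeyaraman lists, written under that assumption. Rules are pure functions over a frozen (immutable) dataclass, verdicts are an enum so a static type checker such as mypy can reject invalid outcomes, and rules are combined with a higher-order function, a common functional idiom. All names, features and thresholds (`UserFeatures`, `too_many_tweets`, `new_and_anonymous`, `first_match`, `evaluate`) are hypothetical.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Callable, Optional


class Verdict(Enum):
    ACCEPT = "accept"
    CHALLENGE = "challenge"
    DENY = "deny"


@dataclass(frozen=True)
class UserFeatures:
    # Hypothetical immutable snapshot of facts a rule may inspect.
    account_age_days: int
    tweets_last_hour: int
    has_default_avatar: bool


# A rule is a pure function: it returns a verdict, or None if it does not fire.
Rule = Callable[[UserFeatures], Optional[Verdict]]


def too_many_tweets(limit: int) -> Rule:
    """Build a rule that challenges users posting more than `limit` tweets per hour."""
    def rule(f: UserFeatures) -> Optional[Verdict]:
        return Verdict.CHALLENGE if f.tweets_last_hour > limit else None
    return rule


def new_and_anonymous(f: UserFeatures) -> Optional[Verdict]:
    """Deny accounts younger than a day that still use the default avatar."""
    if f.account_age_days < 1 and f.has_default_avatar:
        return Verdict.DENY
    return None


def first_match(*rules: Rule) -> Rule:
    """Combine rules into one; the first rule that fires decides."""
    def combined(f: UserFeatures) -> Optional[Verdict]:
        for r in rules:
            verdict = r(f)
            if verdict is not None:
                return verdict
        return None
    return combined


def evaluate(policy: Rule, f: UserFeatures) -> Verdict:
    """Accept by default when no rule fires."""
    verdict = policy(f)
    return verdict if verdict is not None else Verdict.ACCEPT


policy = first_match(new_and_anonymous, too_many_tweets(limit=50))

fresh = UserFeatures(account_age_days=0, tweets_last_hour=10, has_default_avatar=True)
# replace() returns a modified copy; `fresh` itself is never mutated.
older = replace(fresh, account_age_days=30, tweets_last_hour=80)
print(evaluate(policy, fresh), evaluate(policy, older))
```

Because each rule has no side effects and never modifies its input, it can plausibly be unit-tested in isolation and replayed against logged data before deployment, which is one way such properties could support the rapid iteration Jeyaraman describes.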