Twitter on Wednesday detailed its recent efforts in the fight against
unsolicited content, and how the introduction of a new anti-spam system
called BotMaker is responsible for a 40 percent reduction in its key
spam metrics. The company did not, however, say when the system was
implemented.
In an official blog post, the company's Raghav Jeyaraman described why
fighting spam on a micro-blogging network like Twitter is a very
different proposition from defending traditional systems like email. He
also detailed the challenges the team faced when creating a solution
like BotMaker, and provided a simplified look at the overall
architecture of the system.
Jeyaraman explained that because of Twitter's wide-ranging developer
APIs, which are meant to allow third parties to interact with the
platform, spammers "know (almost) everything" about how the
micro-blogging network functions - making spam creation much easier and
deploying countermeasures much harder.
Similarly, because of the real-time nature of consuming and sharing content on
Twitter, Jeyaraman said the countermeasures could not add greatly to the
latency of the user's experience or the overall platform.
With these challenges in mind, the team worked to create a system that would
a) Prevent spam content from being created, b) Reduce the amount of time
spam is visible on Twitter, and c) Reduce the reaction time to new spam
attacks, all while trying to ensure third-party developers were not
able to bypass, fool or tamper with the system, and that it didn't
introduce too much latency.
To do this, the BotMaker anti-spam system was designed in three parts:
a low-latency sub-system (Scarecrow) checks in real time for spam in
the 'write path' of Twitter's main processes (such as tweets, retweets,
favourites, follows and messages), while a second,
computationally-intense machine learning sub-system (Sniper) checks in
'near real time' the user and content event logs produced by Scarecrow.
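The split between the two sub-systems can be pictured roughly as follows. This is a hypothetical Python sketch, not Twitter's actual code: the function names mirror the article, but the rules and scoring are invented for illustration.

```python
# Illustrative two-stage design: a fast synchronous check on the write path
# (Scarecrow) and a slower, model-based check over logged events (Sniper).

def scarecrow_check(event: dict) -> str:
    """Low-latency write-path check: cheap rules only, returns a verdict."""
    text = event.get("text", "")
    # Toy heuristics: deny obvious link floods, challenge borderline cases.
    if text.count("http") > 3:
        return "deny"
    if "free followers" in text.lower():
        return "challenge"
    return "accept"

def sniper_score(logged_event: dict) -> float:
    """Near-real-time check: stand-in for an ML score computed off the
    write path, over event logs rather than live requests."""
    text = logged_event.get("text", "")
    # Placeholder for a learned model; here just a toy weighted feature sum.
    return 0.2 * text.count("http") + 0.1 * text.count("@")
```

The key design point is that Scarecrow must return before the tweet or follow is written, so it is limited to cheap checks, while Sniper can afford heavier computation because it reads event logs asynchronously.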
The third system, BotMaker itself, is constantly fed information by
Scarecrow and Sniper, and issues commands both in the write path (deny,
challenge, accept) as well as to the actioner (suspend, reset password,
delete message). Twitter also runs periodic jobs on all the data
compiled by the BotMaker system (and sub-systems), from routine checks
to specific exercises.
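The two command channels described above - inline verdicts on the write path versus clean-up actions sent to the actioner - could be dispatched along these lines. The command names come from the article; the dispatch logic itself is an invented sketch.

```python
# Hypothetical dispatch between BotMaker's two command channels:
# write-path verdicts are applied synchronously, actioner commands
# clean up content that has already been written.

WRITE_PATH_VERDICTS = {"deny", "challenge", "accept"}
ACTIONER_COMMANDS = {"suspend", "reset_password", "delete_message"}

def dispatch(command: str, event_id: str) -> str:
    if command in WRITE_PATH_VERDICTS:
        # Applied before the tweet/retweet/follow/message is persisted.
        return f"write-path:{command}:{event_id}"
    if command in ACTIONER_COMMANDS:
        # Applied asynchronously to spam that slipped past the write path.
        return f"actioner:{command}:{event_id}"
    raise ValueError(f"unknown command: {command}")
```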
In this fashion, Twitter says BotMaker
helps prevent the creation of spam with a low-latency filter, while
cleaning up spam that slips by with high-latency processes, and adapting to
better catch spam with machine learning.
Stressing the importance
of being able to quickly iterate and refine its rules and models in
Twitter's evolving fight against spam, Jeyaraman said the BotMaker rule
language, and the data structures it operated upon, were designed in a
fashion to allow for rapid development, testing and deployment of
system-wide code changes, apart from quick edits to BotMaker rules.
"Spam evolves constantly. Spammers respond to the system defenses and
the cycle never stops. In order to be effective, we have to be able to
collect data, and evaluate and deploy rules and models quickly," he
wrote.
He said a big part of how that was achieved was by making the BotMaker
language type-safe, all data structures immutable and all functions
pure, as well as ensuring the runtime supported common functional
programming constructs.
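A minimal sketch of what immutable data and pure rule functions look like in practice - in Python as a stand-in, since the article does not show BotMaker's actual rule language. Because a pure rule's output depends only on its input, new rules can be tested against recorded events and hot-swapped quickly, matching the fast-iteration goal Jeyaraman describes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen=True makes instances immutable
class Event:
    user_id: str
    text: str

def too_many_links(event: Event) -> bool:
    """Pure rule: no side effects, no shared mutable state."""
    return event.text.count("http") > 3

def evaluate(event: Event, rules) -> str:
    # Deploying a new rule is just adding another pure function to the list.
    return "deny" if any(rule(event) for rule in rules) else "accept"
```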