Facebook is training its AI using the hashtags you add to your Instagram photos. At the F8 developer conference, Facebook revealed how it is using Instagram images and hashtags to train its object recognition models. The social media giant said it used hundreds of graphics processors to organise 3.5 billion images and 17,000 hashtags, and created machine learning models that can beat top-of-the-line industry benchmarks.
The crux of this approach, Facebook says, is using existing, public, user-supplied hashtags as labels instead of manually categorising each picture. By training its computer vision system on a 1 billion-image version of the data set, Facebook claims to have achieved a record-high score of 85.4 percent accuracy on ImageNet, a widely used image recognition benchmark.
"We rely almost entirely on hand-curated, human-labelled data sets. If a person hasn't spent the time to label something specific in an image, even the most advanced computer vision systems won't be able to identify it," said Mike Schroepfer, Facebook's chief technology officer.
The difficulty with using hashtags as labels is determining which of them are relevant to the content of the images. A person may add a tag for several reasons, and not all tags necessarily indicate objects in the image. To filter such data out, Facebook created its own system that prioritises relevant hashtags, which it calls a 'large-scale hashtag prediction model.'
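The article does not describe how Facebook's filtering works. Purely as an illustration of the general idea, the sketch below turns user-supplied hashtags into training labels and discards tags that are too rare to be useful. The function names, the `min_count` threshold, the equal-weight targets and the sample posts are all assumptions made for this example, not Facebook's method.

```python
from collections import Counter


def normalise(tag):
    """Lowercase a hashtag and strip the leading '#' (illustrative rule)."""
    return tag.lower().lstrip("#")


def build_vocab(posts, min_count=2):
    """Keep hashtags that appear on at least `min_count` posts.

    `min_count` is a hypothetical threshold standing in for whatever
    relevance filter a real system would use.
    """
    counts = Counter(
        tag for tags in posts for tag in {normalise(t) for t in tags}
    )
    return sorted(tag for tag, n in counts.items() if n >= min_count)


def to_target(tags, vocab):
    """Build a soft multi-label target: equal weight on each kept hashtag."""
    present = {normalise(t) for t in tags}
    kept = [t for t in vocab if t in present]
    if not kept:
        return [0.0] * len(vocab)
    weight = 1.0 / len(kept)
    return [weight if t in present else 0.0 for t in vocab]


# Made-up posts: each entry is the list of hashtags on one image.
posts = [
    ["#Dog", "#tbt"],
    ["#dog", "#beach"],
    ["#Beach", "#love"],
]

vocab = build_vocab(posts)
targets = [to_target(tags, vocab) for tags in posts]
```

Here the one-off tags `tbt` and `love` are dropped, and each image gets a label vector over the remaining vocabulary that a classifier could be trained against.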
In the future, Facebook is also planning to find other ways to use hashtags as labels for computer vision. Those could include using AI to understand video footage or to change how an image is ranked in Facebook feeds. Also, hashtags could help systems to recognise when an image falls under not only a general category but also a more specific subcategory.
Notably, Facebook is only using object-based data right now, and may not be trying to draw inferences about user behaviour from the images. Technically, Facebook has achieved an impressive feat by organising such huge amounts of data and turning them into a useful software tool. However, the approach might raise some eyebrows, especially amid the crisis that unfolded in recent weeks over the sharing of personal user data.