Artificial intelligence, or AI as it's widely known, might be one of the hottest buzzwords right now, but the core technology behind it isn't new, especially for technology giants like Google and Microsoft that have been working in this area for years. As recent advancements have started to affect technology that consumers use every day, Microsoft is now leveraging the skills of its engineers and researchers to make AI a companion to humans in ways that impact their lives. The company detailed some of the recent developments built by its Indian engineering and research teams at a three-hour-long meeting in Bengaluru earlier this week.
One idea associated with AI is that its rise will lead to machines replacing humans on a large scale. Microsoft, by contrast, believes AI will take new shapes to uplift mankind. "We don't think about computers as competing with humans and are replacing humans," Sriram Rajamani, Managing Director, Microsoft Research Lab - India, said as he kicked off the meeting. "We think about them as amplifying human abilities."
Rajamani highlighted that AI is technology that can perceive, learn, and reason. "If you think about the state of AI today, the perception we have come quite far," Rajamani said. "Today, machine perception is actually in some cases even better than human perception." He also noted that in the areas of cognition and reasoning, the capabilities of humans and machines complement each other. "Machines are able to do good reasoning in narrow domains," he said. "We have a computer that can index billions of documents and search through them. Our Web search works like that. But we don't have a computer that has a common sense even matches a two-year-old [human]. That's the complementarity."
Kalika Bali, a researcher at Microsoft Research India Lab, emphasised that the present model of AI needs pragmatics to give it social sense and social intelligence. Citing an analysis by McKinsey Global Institute, Bali underlined that while AI is getting quite good at surpassing various human capabilities, it has yet to make a start on the social and emotional capabilities of humans. "Pragmatics is important for AI," the researcher said. "What do we do in pragmatics is we tell others about our communicative intent, whether we are being polite or sarcastic or focusing on a certain topic or not focusing on a certain topic. All this in various ways is important for our digital assistants, chatbots, and other AI developments."
The key areas where Microsoft is applying its AI developments include vision, language, speech, search, and knowledge. These are notably the areas that make computers valuable to us. The software maker cites its Seeing AI app as an example: it combines AI and computer vision technologies to describe people, text, objects, colours, and even currency notes in real time for visually impaired users. The app is available for download on iOS and can detect Indian currency and narrate the denomination to users. It also supports other currencies, including the euro, US dollar, Canadian dollar, and British pound.
Alok Lall, Partner Technology Lead, Microsoft India, highlighted that in addition to the Seeing AI app, Microsoft offers its Computer Vision API to give developers features such as the ability to understand and analyse images, text, documents, and audio/video content, recognise celebrities, and perform OCR (optical character recognition). "It helps you drive auto-tagging and can now offer tagging even in multiple languages," said Lall. The latest version of PowerPoint provides a sample of how the Computer Vision API works. When you put two images in a slide, the system automatically offers a set of recommendations on how the images should be placed.
Further, Lall talked about the Emotions API that enables advanced features such as face and emotion detection, face verification, similar face searching, face grouping, and face identification. The executive also mentioned a Machine-assisted Content Moderator that is positioned as a helpful AI development to classify and identify content, including text and images, for any particular audience.
Exemplifying the developments in the vision area, Microsoft has the Intelligent Kiosk app for Windows 10 devices that uses Microsoft Cognitive Services to analyse human faces and predict the age, gender, and emotions of people featured in still photographs. Developers can use the app to create and train a model to offer face identification against any pre-determined set of faces. Moreover, expanding the vision-focused developments from prototyping to real-life models, Lall revealed that private insurer Bajaj Allianz General Insurance recently deployed an automated video interviewing solution from Microsoft partner Talview to hire from a pool of untapped talent in multiple cities across India. The Microsoft Azure-powered solution can be accessed by candidates from any smartphone and can work with low-speed data connections.
Apart from enabling image detection via its Computer Vision API, Microsoft has an advanced Custom Vision service. This API lets users train the AI as per their preferences. The company also offers object detection. "It's no longer just about image detection," said Lall, adding, "We also have now improved custom vision to do object detection. That's the fascinating part as once you've trained your model and once you've evaluated it based on the sample size that you put in, this model could be then deployed on the edge, which means I don't have to worry about a large stream of data that comes in from my edge device."
The Custom Vision service that comes from the Microsoft Cognitive Services family offers active learning and lets developers customise their own computer vision models.
Alongside its focus on vision, Microsoft has a number of developments on the speech front. Puneet Garg, Principal Program Manager Lead - AI&R, Microsoft India, demonstrated a recently developed feature that uses the power of AI and Deep Neural Networks (DNN) to offer real-time translation in Hindi, Bengali, and Tamil. This feature, which is powered by Microsoft Speech Recognition for Indian English, is designed to offer real-time translation to users while surfing the Web using Microsoft Edge or while using Microsoft Office 365 products like Word, Excel, PowerPoint, Outlook, and Skype. Garg highlighted that while English is a universally spoken language, there are differences in the way people speak it, and customisations are therefore needed to build a translation model for a particular region. "When we worked on Speech Recognition for Indian English, we made sure that we collect data for India and train the models accordingly," he revealed.
The customisations helped Microsoft provide accurate translation experiences for conversations between Indian users. Also, there is the AI-powered Microsoft Translator app that enables translation from text, speech, and images and supports Hindi, amongst a list of international languages. The app also works with the real-time translation feature to bring diverse speakers under one roof.
Apart from translation, Microsoft is using AI to enable Hindi speech recognition. Garg demonstrated this model through the Bing Android app that offers search results and weather information by parsing queries in Hindi. The model also recognises unique Indian names. "When we work on English data and use that as the base, we're able to get a lot of words and generally spoken nouns well," the manager said. "But to add support for very Indian names and Indian entities, you have to work additional to get appropriate training data so that your recognition quality can be better."
Garg highlighted that enabling Hindi translation and speech recognition isn't as easy as enabling the same features for English due to the diversity of the Hindi-speaking audience, code-mixing, and a large number of phonemes. However, Microsoft was ultimately able to bring translation to Hindi users by gathering diverse data. "To make sure we're able to build a robust model that tomorrow anyone in India who speaks Hindi can use, we made sure we have data representation from all across India - not just only from one aspect of India," said Garg.
Prashant Choudhari, Principal Program Manager, AI&R Microsoft India, presented at the meeting the basic structure that the company offers to help developers get started with AI. Three key solutions - Microsoft Knowledge Cloud, Cognitive Services, and Microsoft Bot Framework - let developers bring new AI-based solutions to the market. Further, for developers looking to launch a bot, Choudhari stated that the Microsoft Bot Distribution Channels offer a modular approach to publishing a bot that has the potential to reach a billion users through Bing, Cortana, Skype, and more. Developers are free to choose their own publishing channel or bot framework. "The way the entire stack is set up is that each section is not tightly coupled with each other. What that means is you can choose to develop your bot on Bot Framework, deploy it on AWS (Amazon Web Services), if you wish, and then light it up on Facebook Messenger," Garg said.
In addition to helping developers bring new AI experiences to market, Microsoft is natively using AI within Office 365 and Bing. Garg showcased how signals from the Microsoft Graph enable tailored search results across Office content, while the Bing Knowledge Graph brings curated outlines for any topic via QuickStarter in Sway and PowerPoint. Specifically for groups and organisations, Microsoft has a MyAnalytics add-on that provides insights to help understand how users spend time at work across meetings, email, the time they designate as "focus time", and after-hours work. Going forward, the add-on will be able to deliver insights to teams and Office 365 Groups to help them uncover hidden inefficiencies and align their priorities better.
Disclosure: Microsoft sponsored the correspondent's flights for the meeting in Bengaluru.