Cognitive Services, Clippy? AI's silent infiltration of Microsoft's Office stack
Facial and voice recognition – just beware translations
In just over a year, Microsoft has launched and expanded a set of APIs called Cognitive Services, which handle everything from face and emotion recognition to a Language Understanding Intelligent Service (yes, LUIS for short) and a Custom Decision Service.
At the Cognitive Services homepage you can get your hands on documentation, SDKs and a free trial, and find your way to the Stack Overflow community.
For developers, these APIs can add gesture recognition to an app, or deliver a logistics routing plan that takes travel times into account. The real estate industry could build a rating system for location desirability, based on factors such as proximity to public transport and restaurants. These are just a few examples of where the data already exists and we're training the machines to be smart enough to use it in context.
That's for the rest of us, but what has Microsoft itself been up to? It has released some features for specific segments, most notably education, while others have appeared with relatively little fanfare.
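To make the real estate example concrete, here is a minimal sketch of a weighted desirability score. It is not a Cognitive Services API and not anything Microsoft ships: the factor names, the weights, the 2km distance cut-off and the 20-restaurant cap are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

# Hypothetical factor weights; a real rating system would tune or learn these.
WEIGHTS = {"transit": 0.6, "restaurants": 0.4}


@dataclass
class Location:
    name: str
    metres_to_transit: float     # distance to the nearest stop or station
    restaurants_within_1km: int  # count of nearby restaurants


def proximity_score(distance_m: float, max_m: float = 2000.0) -> float:
    """1.0 when right next to the amenity, falling linearly to 0.0 at max_m."""
    return max(0.0, 1.0 - distance_m / max_m)


def density_score(count: int, cap: int = 20) -> float:
    """Fraction of an assumed 'plenty' threshold, capped at 1.0."""
    return min(count, cap) / cap


def desirability(loc: Location) -> float:
    """Weighted sum of factor scores, scaled to a 0-10 rating."""
    score = (WEIGHTS["transit"] * proximity_score(loc.metres_to_transit)
             + WEIGHTS["restaurants"] * density_score(loc.restaurants_within_1km))
    return round(10 * score, 1)


if __name__ == "__main__":
    for place in (Location("Flat A", 200, 15), Location("Flat B", 1500, 2)):
        print(place.name, desirability(place))
```

A linear weighted sum keeps the rating explainable: each factor's contribution is visible. The "smart" part the article alludes to would come from learning the weights from real data, which this sketch does not attempt.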
Intelligent language processing
Among this group is Dictate, a Microsoft Garage project that provides real-time speech-to-text inside Outlook, Word and PowerPoint. Garage came out of Microsoft's Office labs in 2009 and has since launched a range of cross-platform apps, incubated by Microsoft teams, for Android and iOS as well as Windows Phone. I used Dictate to write some of this article, but be warned: it doesn't have a perfect hit rate. It's fun to play with, but don't roll it out to your enterprise just yet, because Microsoft Garage projects are completely experimental.
If you do install Dictate on your own machine, Microsoft's Cortana personal digital assistant would appreciate the practice: it's the same language processor behind the scenes, and the more natural language examples we throw at it, the smarter it should become.
Closed captions for the masses
Also hailing from Microsoft's Garage is Presentation Translator. Using Microsoft Translator's live feature, it recognises speech in 10 supported languages and, as you speak, displays slightly delayed subtitles directly at the bottom of your PowerPoint presentation.
In addition, it can translate those subtitles into more than 60 supported languages, so your audience can follow along in a tongue other than your own. No doubt the concept is cool, but Presentation Translator had, er, fun recognising some English words, so I'd be worried about what it was actually saying in French or Russian. I advise trying it on a friendly audience before putting it anywhere near a prospective international customer.
It’s Cortana’s Office
On Windows 10 and with a business-grade Office 365 account, Cortana gets a little more like a personal assistant. Once you connect your Office 365 account, Cortana will tell you about upcoming meetings, including who in your organisation you're meeting with and which relevant documents they've been working on. She'll even tell you if traffic is heavy or your flight is delayed.
This kind of service demonstrates how the machines can make connections between different data sources, including sources outside your organisation such as traffic data. When surfacing related documents, Office 365 makes sure you only see information you have permission to access, so you don't accidentally find out that the boss has been working on redundancies.xls. Be warned, though: if you use your voice to summon Cortana, she sends your data to Microsoft's cloud-based speech recognition service to understand what you're saying.
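The permission check described above is often called security trimming: results are filtered against each item's access list before anyone sees them. Here is a minimal sketch of the idea, assuming a simple per-document set of allowed users. It is not how Office 365 actually implements this; the `Document` type and `related_documents` function are hypothetical names for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    title: str
    author: str
    # Hypothetical access list: users allowed to open this document.
    allowed: set = field(default_factory=set)


def related_documents(viewer: str, attendees: set, docs: list) -> list:
    """Return documents written by meeting attendees that the viewer may open.

    Trimming happens before results are returned, so a document the viewer
    cannot access is never surfaced, not even its title.
    """
    return [d for d in docs if d.author in attendees and viewer in d.allowed]


if __name__ == "__main__":
    docs = [
        Document("Q3 plan.pptx", "alice", {"alice", "you"}),
        Document("redundancies.xls", "boss", {"boss"}),
    ]
    for doc in related_documents("you", {"alice", "boss"}, docs):
        print(doc.title)
```

Filtering on the server side, before results are assembled, matters here: hiding a title in the interface after the fact would still leak it to anyone inspecting the response.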
Maybe we’re just not used to talking to our PCs, but Cortana on Windows 10 feels more like a gimmick right now than an indispensable part of our business day.
Video killed the radio star
The Microsoft Translator Speech API also makes an appearance in the latest Office 365 product, Microsoft Stream. It starts with automatically transcribed closed captions, fully searchable and time-stamped. Then the Video API kicks in, using face detection to build a timeline of where each person appears in the video. All with zero effort on your part. The catch? You'll need an Office 365 E5 plan (yes, the most expensive one) for these advanced features, or you'll have to buy a Microsoft Stream Plan 2 add-on for your Kiosk, E1 or E3 tenant. By default, those plans support viewing and uploading videos to Microsoft Stream, but not the cool Cognitive Services stuff.
There's an enhanced level of intelligence behind two features in Office 2016. In Word and Excel, Smart Lookup takes a highlighted piece of text and presents you with synonyms, definitions, Wikipedia entries and web searches. Unsurprisingly, the web searches are performed with Bing. So, yeah.
The intelligence in the searching is that Smart Lookup takes into account the words around the ones you've highlighted, analysing what else you've typed to clarify the meaning of your selection. Help, meanwhile, has been replaced with the "Tell me what you want to do" pane. Again, natural language query analysis is in play, for better understanding and better search results.
Breaking down the written word
This one comes with a feel-good factor, but if you’re not in the education space, you’ve likely missed the OneNote Learning Tools set. The Immersive Reader feature can highlight words as it reads aloud, optimise spacing for better readability, display syllable breaks and highlight parts of speech like verbs, nouns and adjectives – all from your own text.
There is a ton of intelligence working in the background to deliver this experience. After educators saw impressive results in improving children's literacy with these tools, Microsoft extended them to Word Online, OneNote Online, Outlook.com, Outlook on the Web and the OneNote Windows 10 app, as well as OneNote Desktop, even without an education plan.
Microsoft has spent more than two years slipping its Cognitive Services into its stack. The question is: are they ready for the masses? ®