Deus Ex Machina: Voice Assistants Roundup

Anatolii Iakimets
Mar 29, 2018
Depositphotos / AlienCat

Siri was introduced by Apple back in 2011, and voice assistants have been steadily growing in popularity ever since, with over 46% of Americans using one in 2017. But despite this high level of trial and long history, voice assistants’ capabilities are still very limited today.

Most of the tasks people give voice assistants are very simple: running online searches, looking up product information, asking for directions, playing music…

People use voice assistants either because they enable hands-free use of their devices or just for fun. It is not surprising, then, that the top reason for not using a voice assistant is that people are “just not interested.” In other words, the added value voice assistants provide today is limited.

Figure 1: share of respondents by answer [1]

We expect voice assistants to execute specific tasks consistently (Closed Domain) and to handle long, open-ended conversations (Open Domain) the same way a person would. While voice assistants can deal with the simple tasks listed above, several challenges limit broader application:

  • Dialog Management. Handling context over long conversations is still challenging. Most voice assistants and bots deal very well with transactional exchanges but struggle with longer conversations in which a person jumps back and forth between topics mentioned earlier.
  • Lack of skills. While a voice assistant can read out the latest news or emails, it is not smart enough to prioritize them and brief you only on the most important topics.
  • Personality. Voice assistants need to give semantically consistent answers to similar inputs; for example, the questions “Where are you from?” and “Where do you reside?” should produce consistent answers (see the sketch after this list). This may sound simple, but in reality it is not an easy task.
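
To make the personality point concrete, here is a minimal, hypothetical sketch (not any vendor’s actual API): paraphrased questions are normalized to one canonical intent, so both phrasings above get the same persona answer. Production assistants use learned intent classifiers rather than a hand-written lookup, but the idea is the same.

```python
# Hypothetical sketch: map paraphrased questions to one canonical intent
# so the assistant's "personality" stays consistent across phrasings.

CANONICAL_INTENTS = {
    "where are you from": "origin",
    "where do you reside": "origin",
    "what's your hometown": "origin",
    "how old are you": "age",
}

PERSONA_ANSWERS = {
    "origin": "I live in the cloud, so I'm always close by.",
    "age": "I was introduced in 2011 — you do the math!",
}

def answer(utterance: str) -> str:
    """Return the persona answer for the intent behind the utterance."""
    key = utterance.lower().strip(" ?!.")
    intent = CANONICAL_INTENTS.get(key)
    return PERSONA_ANSWERS.get(intent, "I'm not sure how to answer that yet.")

print(answer("Where are you from?"))   # -> same answer...
print(answer("Where do you reside?"))  # -> ...for both phrasings
```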

These challenges exist because human-like cognitive abilities cannot be replicated easily, due to the sheer amount of computing power required.

Simulating 1 second of 1% of the brain’s activity took 40 minutes on the 10-petaflop K computer with 82,944 processors (roughly the processing power of ~50,000 iPhone X devices) [2]
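
A quick back-of-envelope check of what that figure implies, assuming (my assumption, not the article’s) that the K computer sustained its full 10 petaflops for the whole run:

```python
# Back-of-envelope check of the brain-simulation figure quoted above.
# Assumptions (not from the article): 1 petaflop = 1e15 FLOP/s and the
# K computer sustained ~10 PFLOP/s for the entire 40-minute run.

K_COMPUTER_FLOPS = 10e15      # ~10 petaflops sustained
RUN_TIME_S = 40 * 60          # 40 minutes, in seconds
SIMULATED_FRACTION = 0.01     # 1% of the brain
SIMULATED_SECONDS = 1         # 1 second of brain activity

total_flop = K_COMPUTER_FLOPS * RUN_TIME_S                       # ~2.4e19 FLOP
realtime_whole_brain = total_flop / SIMULATED_SECONDS / SIMULATED_FRACTION

print(f"Compute spent on the run: {total_flop:.1e} FLOP")
print(f"Implied whole-brain real-time rate: {realtime_whole_brain:.1e} FLOP/s")
# -> about 2.4e21 FLOP/s, i.e. on the order of 240,000 K computers in parallel
```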

Advancements in machine learning and consistent investment from technology companies are paving the way for voice assistant evolution. Amazon has built an extensive developer community, which has resulted in over 25,000 skills available for Alexa. Google’s AI has been reading nearly 3,000 romance novels in order to improve its conversational search abilities. Overall, voice assistants have achieved 95% word-recognition accuracy (the same level as humans). With over 70 million voice-first devices expected to ship annually by 2022, the future looks bright.

By 2020, over 50% of all searches are expected to be voice searches [3]

As the technology evolves, not only consumers but also various vertical industries will be able to benefit from advanced voice use cases:

  • Healthcare. Patients will be able to check their diet, schedule appointments, call for help and ask any treatment-related questions.
  • Automotive. People already use voice assistants to make calls, play music and ask for directions without taking their hands off the steering wheel. In the future, drivers will be able to adjust climate control, start the car remotely and even schedule a maintenance visit.
  • Entertainment. In museums, visitors will be able to ask any question about a specific exhibit, be it a dinosaur, a medieval sword or a Vincent van Gogh painting.
  • Education. Students will listen to personalized lectures at their own pace, ask questions and even take real-time, voice-assistant-led exams.
  • Customer service. Customers will be able to start resolving cases without spending half an hour on hold waiting for a free call-center operator.

Ok Google, who do you want to become?
