Our homes and our lives are being infiltrated by voice assistants. Apple has Siri, Amazon has Alexa, Microsoft has Cortana, Samsung has Bixby and Google has… Google. This technology has been advertised heavily in 2017, with most major tech companies pushing hard for their assistant to seize the majority share of this emerging market.

According to a recent report, an estimated 33 million voice-first devices (such as the Amazon Echo and Google Home) will have been purchased in the USA by the end of the year [1].

But what are people using these voice assistants for? A survey of 2000 users indicated that the most popular voice commands include playing music, setting or changing alarms, and checking the weather [2].

Interestingly, the uptake of third-party applications and commands is shockingly low. Statistics suggest that when a third-party app for a personal assistant such as Alexa acquires a new user, there is only a 3% chance that the user will still be using the application a week later [1].

Whilst these devices are advertised and sold as products that will support and improve many aspects of our day-to-day lives, their actual usage is limited to a handful of specific use cases, such as playing music.

Whilst it is inevitable that these devices will improve with time, it is interesting to consider what is behind their underwhelming use so far.

A mismatch between expectations and actual capabilities

Research into the interactions between users and personal voice assistants suggests that their underwhelming use is due to a fundamental mismatch between what users expect the capabilities of voice assistants to be and what the features and functionality of a voice assistant actually are [3].

This gap between user expectations and reality is widened further by the fact that popular voice assistants such as Alexa and Siri have human-sounding voices. This goes against the research, which suggests that it’s better for a personal voice assistant to have a robotic-sounding voice, as this reduces how capable a user expects the system to be [4]. An impressive, human-like voice might be great for tempting customers into purchasing these devices, but it may be causing disappointment once users realise the technology isn’t as sophisticated as they had first hoped.

Flexibility vs Usability

An unusual problem emerges for voice assistants as they become more sophisticated. As they accept a more flexible range of commands and their grasp of language improves, their usability actually drops rapidly [3].

Voice assistants have become more sophisticated, but they are still a long way behind natural levels of human conversation.

This makes it difficult for a user to accurately judge what the system will be capable of understanding. Voice assistants may have advanced from a strict and rigid set of voice commands to a more flexible and intelligent system, but this can actually make the interactions between the user and the system more strained. As flexibility improves, it becomes increasingly difficult for a user to know what they can and cannot say.

How do we improve the user experience?

Improving the usability of voice assistants is a fascinating future challenge, and conducting user research with these devices is unusual because there is no graphical user interface.

As the underlying technology is a long way off being able to replicate the same level of communication as you would get from two humans having a conversation, work is required to help users understand the limitations of the technology.

An interesting argument posits that it is actually a flawed approach to attempt to create personal voice assistants that on the surface appear to be capable of engaging in natural, human-like conversation [3].

Instead, we need to look at existing circumstances and situations where humans adjust their communication strategies and simplify their way of speaking. For example, when speaking to someone with a limited understanding of your native language, you naturally change your speed and your pronunciation to help them understand you. Similarly, when talking to a young child, you simplify the way you speak so that they can easily understand you [3].

Perhaps we can therefore improve the interactions between users and personal voice assistants by adjusting the ways in which these technologies are framed. If the limitations of the technology are made more apparent, users will be more likely to understand what they can and cannot ask, and may be less likely to experience frustration when the system can’t replicate full human-like levels of conversation.

User Testing with Personal Voice Assistants

At Userfy we have an Amazon Echo installed in our user testing lab and we’re expanding our skillset to include user testing with this type of technology. As companies naturally become more curious about how they can leverage voice-activated skills and applications, it’s important that user research remains an integral part of this process. Understanding how users interact with personal voice assistants is a very new area and we’re excited to see what the next couple of years bring for voice-first devices.

Meet the author:
Sam, Co-Founder and Director of Research at Userfy (https://twitter.com/SamHoward_?lang=en)



[1] http://voicelabs.co/2017/01/15/the-2017-voice-report/

[2] https://www.highervisibility.com/resource/research/how-popular-is-voice-search/

[3] Moore, R. K. (2017). Is spoken language all-or-nothing? Implications for future speech-based human-machine interaction. In Dialogues with Social Robots (pp. 281-291). Springer Singapore.

[4] Balentine, B. (2007). It’s Better to Be a Good Machine Than a Bad Person: Speech Recognition and Other Exotic User Interfaces at the Twilight of the Jetsonian Age. Annapolis: ICMI Press.