The Development of Voice AI, From IBM’s Shoebox to Now
Voice AI has seen incredible growth over the last few years.
“Voice” has many definitions and interpretations. At its origin, it is the sound humans make to speak and communicate with one another.
For the purposes of this article, it means how people search for information by speaking to digital devices such as mobile phones or voice assistants. It also covers carrying out tasks through voice commands, such as making a purchase or adding an item to a to-do list.
The Introduction and Development of Voice AI
Voice AI has been around for decades.
IBM was among the first to introduce such technology, in 1961.
The IBM Shoebox was one of the earliest digital speech recognition tools and could recognize 16 spoken words, including the digits 0 through 9.
It was primitive, but it was a start and it laid the foundation for the technology.
Fast forward a few decades: in 1990, Dragon launched Dragon Dictate, the first-ever consumer speech recognition product, for an extraordinary price of $9,000. Seven years later, it released an improved version called Dragon NaturallySpeaking, which could recognize continuous speech at around 100 words per minute. Dragon’s technology would later be acquired by Nuance Communications.
In 1996, Microsoft introduced Clippy (who remembers the annoying paperclip?), which showed how natural language in text could be tracked and interpreted to provide guidance and suggestions. Through Microsoft’s Speech Recognition Engine, Clippy also accepted speech input and could be used to help find answers to problems, making it one of the earlier examples of a voice assistant.
Although Clippy was discontinued, it taught one of the most important lessons in the field: a virtual assistant should come forward only when called, rather than annoy you constantly.
Voice assistants have followed that principle ever since.
Fast forward even further and you come across Siri in 2011, the first voice assistant of the modern era.
Google Now and Microsoft’s Cortana soon followed, as did Amazon’s Echo in 2014.
What’s amazing is that all of this major development has happened within the last decade, as the technology has matured and companies have invested heavily in it.
On the search side of things, Google introduced voice search in 2011. At the time it was more of a gimmick, even though it was cited as ‘the future of web searching’.
Chrome users could speak into their microphone, and speech-to-text functionality turned their words into a Google Search query.
It was, however, still a beta version.
In 2014, the race to bring smart speaker systems to market began.
That race is still the primary driver of the voice AI market, as the big tech companies battle to capture as much market share as possible.
The State of Voice At Present
Remember the days when you had to repeat words to get the output right?
Fast forward to today and the technology has improved massively since the 60s.
Google’s speech recognition accuracy has now risen to 95%, and the speech recognition system from China’s iFlytek has reached an accuracy rate close to 98%. What’s even more amazing is that these systems can also translate directly from English to Mandarin, Korean, and Japanese.
For these reasons, smart speaker systems have become more viable and more widely adopted.
This is almost 40% YoY growth since 2018.
This growth can also be attributed to lower production costs: partnerships with Chinese manufacturers have resulted in cheaper devices.
With huge opportunities outside the US, the smart speaker market is set to grow even further.
The use of voice assistants has also increased. Over 65% of 25–49-year-olds speak to their voice-enabled devices at least once per day. Although this age group contains the most active voice searchers, it is actually the 18–24 demographic that has been credited with helping drive the early adoption of voice technology.
But what do people use voice for?
Voice has many applications, and a large share of them are searches, from finding restaurants to basic fact-finding.
This presents a massive opportunity for content creators as well as businesses to optimize their web presence to be voice-ready.
The other large market falls within e-commerce.
It’s estimated that voice commerce sales will amount to almost $40 billion by 2022. Consumers are using voice not only to search for products but also to purchase them.
Once the right integrations are set up, a simple command lets you buy household items like nappies, tissues and more.
Voice search is also making shopping easier for repeat customers by saving their purchase history and enabling reorders. For instance, you might simply tell your digital assistant to reorder those nappies or tissues.
That convenience makes the product even stickier.
The Future: Privacy and Data Concerns
Voice AI is heading towards a bright future, but there are some concerns.
Recent reports revealed that voice recordings captured by Apple’s Siri, Amazon’s Alexa and Google’s Assistant were being reviewed for training and improvement purposes.
This still meant that potentially personal recordings were being heard, which raised a lot of privacy concerns (with so many of these companies already under pressure around privacy and data!).
In theory, customers have been able to remove what they said to their digital assistants after the fact, but there had been no option to block recordings from the get-go.
Although all three have agreed to change their policies to allow users to opt out, the episode draws attention to the privacy concerns around voice.
As the world becomes more voice-controlled, these concerns will surface more frequently around voice AI technologies.
Voice AI has come a long way.
It opens an endless realm of opportunities, from helping those with disabilities all the way to assisting people with mundane tasks.
With its growth, however, come some concerns for the future.