
Transcript

What will voice AI look like in the future? | Drew Ross | TEDxBoston

All right, now before I begin, I want everyone to stop for a second and close your eyes. Think back to the first words you ever spoke. Now, if you're anything like me, you don't remember them, because you were a baby. Regardless, that moment wasn't just adorable; it marked the beginning of something extraordinary: the ability to communicate. From the moment we speak our first words, voice is the way we interact with the world around us. Before we learn to write, before we even know what symbols are, we speak. Voice is the purest bridge between one human mind and another. It's the highest-bandwidth form of communication we humans have.

Artificial intelligence, too, has begun to speak. It's already uttered its first words, yet it's still in its infancy. The voice AI of yesterday, like Siri and Alexa, for example, was our first experience of talking to machines, but those systems were built for a different era: an era of rigid commands, keyword recognition, and scripted responses. We've all had a moment talking to one of these assistants where it doesn't quite understand our question, gives us a complete nonsense answer, or, worse, goes, "I'm sorry, I didn't quite get that." These systems recognize words, but they don't truly grasp meaning, intent, or emotion. These were our first steps in AI voice interaction, but now we're stepping into a new era, one where AI doesn't just hear words but listens to them, understands them, and engages with them.

In recent years, we've witnessed an explosive advance in text-based intelligence. Large language models have shown remarkable capabilities in understanding and generating human language. At the same time, Transformer- and diffusion-based models have revolutionized the way we listen to AI, giving it a more humanlike voice than ever before. Today we can hold real conversations with AI. You can go to chat.com or open up OpenAI's Realtime API and talk to it about anything, from baking recipes to business ideas, and even use it for emotional support. It's really a testament to how far speech technology has come in such a short time.

Yet conversational AI still has plenty of struggles. For one thing, it mostly lacks conversational intelligence. Understanding emotion, recognizing intent, and smoothly taking turns in dialogue are things we humans take for granted; they're second nature to us, but AI struggles with them. Another is latency. With all the heavy computation being done behind the scenes to make these machines talk, they can often be slow to respond, which breaks up the quality and immersion of talking to them. And finally, reliable agency. Sure, you can talk to an AI today on your computer, even on your phone, but that doesn't really do much for the world if it's just speech and it's not taking any real-world action.

But here's the good news: these problems are being solved as we speak. New advances in audio and emotional embeddings enable AI voices to capture subtle cues in pitch and rhythm, adapting to a speaker's mood in real time. Text-to-speech systems are being trained on synthetic speech data, so they can improve their emotional range at scale without the need for a bunch of human recordings. To tie it all together, engineers are pouring hours and hours into developing low-latency compute and robust integrations that enable AI to take real-world action in real time.

So that's what's going on today, but what about tomorrow? Voice AI won't just be something you talk to; it'll be something that anticipates needs and seamlessly executes tasks, fundamentally transforming how we work and how we live. Imagine administrative voice assistants that can do anything from scheduling meetings to taking notes and even making some basic choices, all from you just telling them what to do: no typing, no clicking. Or how about AI teachers who can adapt their tone and pacing to every individual student's needs, and can even distill all the knowledge of the internet into a personalized plan for every single student? What about voice-based programmers that let you develop entire products without ever having to lift a finger and touch a keyboard?

In some industries, this technology is already here. Take, for instance, the $300 billion call center market. Every single day, millions of people across the US and around the world are picking up their phones and talking to AI. These agents will not only hold a conversation with you, but after the conversation is over, they'll go to the back end and autonomously take actions for the company. Odds are you've already spoken to an AI agent without even knowing it. And as this technology matures, we're going to see it crop up in every major industry.

But that's just the start. In the future, voice AI will be more than just virtual assistants. It will become the fundamental way we interface with all technology, like a personal Jarvis that's always there to help. Gone will be the days of keyboards, touchscreens, and clunky interfaces; interacting with powerful technology will be as natural as having a conversation. For those who keep the world running, like surgeons, first responders, and industrial workers, it could mean hands-free access to critical information and voice-controlled systems, enabling safer, faster, and more precise decision-making in real time. It will mean breaking down language barriers in all human-technology interaction, allowing people to command intelligent systems naturally in their native tongue. And for people with disabilities, voice-first UIs will open doors that were previously closed and enable unprecedented independence.

So why does this matter to you and to me? Well, voice-first interfaces promise a human way of interacting with technology. Instead of adapting our behavior to machines, navigating screens and tapping keyboards, we can adapt machines' behavior to us and have them speak like we do in everyday life. Let's focus our energy on voice, the human way of interaction. As leaders, we can push for great, reliable, and ethical voice AI. As innovators, we can create new solutions that harness voice intelligence. And as builders, we can engineer robust platforms that bring voice-first experiences to life at scale.

AI has already spoken its first words. Now it's up to us to nurture it from infancy into a truly impactful partner. The future of human-computer interaction will not be seen; it will be heard. Will you be part of the conversation?

[Applause]