Alexa, Google Assistant, and the Rise of Natural Language Processing

Just days into the new year, Forrester Research analyst J.P. Gownder predicted that 2018 would be “the year of AI and conversational interfaces.” 

He was right. Although CES has come and gone, the annual show tends to set the tone for the year — and this year, the battle of the AI voice assistants took center stage, with Google coming out swinging at Amazon’s hold on the market. Alexa-enabled devices were everywhere (voice-enabled flushing, anyone?), but Google Assistant was impossible to ignore, both in the debut of its smart display partnerships and in the “Hey Google” branding plastered all over Las Vegas.

As the battle heats up over which virtual assistant should sit in your living room (and soon your car, and even your sunglasses), we wanted to take a closer look at the AI that powers these particular conversational interfaces — and just why it holds so much possibility for the future of smart home technology.

“Alexa, what is Natural Language Processing?”

The core technology behind voice assistants is Natural Language Processing (NLP). Ask Alexa directly, as we did, and she’ll tell you that NLP is “the field of computer science, artificial intelligence, computational linguistics, big data, and data science concerned with the interactions between computers and human languages.”

That’s a lot to take in, but luckily we have access to some in-house expertise on the subject here at Lighthouse. Let's break it down: NLP is a field of study that encompasses a lot of different moving parts, which all culminate in the ten or so seconds it takes to ask Alexa a question and receive an answer. You can think of it as a process with roughly three stages: listening, understanding, and responding.

Listening: This is what happens as you ask the question, after signaling Alexa to pay attention by calling her name. Speech recognition software is how computers transcribe your spoken words into English (or other-language) text, which can then be sent on for further processing.
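To make that concrete, here's a minimal sketch of the listening stage using the open-source SpeechRecognition library for Python. This is illustrative only (Alexa's actual pipeline is proprietary and far more sophisticated):

```python
import speech_recognition as sr  # pip install SpeechRecognition (plus PyAudio for mic access)

recognizer = sr.Recognizer()

# Capture audio from the default microphone -- the "pay attention" moment.
with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source)  # calibrate for background noise
    audio = recognizer.listen(source)

# Transcribe the captured speech into text for further processing.
try:
    text = recognizer.recognize_google(audio)  # sends the audio to Google's free web API
    print("You said:", text)
except sr.UnknownValueError:
    print("Sorry, I didn't catch that.")
```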

Understanding: This is the processing part, where the text is analyzed for meaning so it can be turned into data. Natural Language Understanding (NLU) is considered the hardest subfield of NLP because of all the varying and imprecise ways people speak, and how meanings change with context. It entails teaching computers to understand semantics with techniques like part-of-speech tagging and intent classification — how words make up phrases that convey ideas and meaning. 
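As a small taste of one NLU building block, here's part-of-speech tagging with the open-source spaCy library, again a toy sketch rather than anything a commercial assistant actually ships:

```python
import spacy  # pip install spacy, then: python -m spacy download en_core_web_sm

nlp = spacy.load("en_core_web_sm")
doc = nlp("Set a timer for ten minutes")

# Part-of-speech tagging: label each word's grammatical role,
# a first step toward working out what the phrase means.
for token in doc:
    print(token.text, token.pos_)
# e.g. Set/VERB  a/DET  timer/NOUN  for/ADP  ten/NUM  minutes/NOUN
```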

Responding: Think of this as the inverse of NLU, where data is translated back into text. Two key functions stand out: deciding which data is most relevant to your query, and putting that answer into language humans can understand. Once the natural-language response is generated, speech synthesis technology turns the text back into speech. This is how Alexa responds to you, in that now-familiar voice.
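Here's a minimal sketch of the responding stage, assuming the answer data is already in hand and using the open-source pyttsx3 library for speech synthesis. Real assistants generate far more natural speech, but the shape is the same:

```python
import pyttsx3  # pip install pyttsx3 (offline text-to-speech)

# Pretend the "understanding" stage already resolved the query to this data.
weather = {"condition": "sunny", "high": 72}

# Natural language generation, in miniature: turn data back into a sentence.
response = f"Today will be {weather['condition']} with a high of {weather['high']} degrees."

# Speech synthesis: turn the generated text back into audio.
engine = pyttsx3.init()
engine.say(response)
engine.runAndWait()
```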

Artificial intelligence is central to NLP. More specifically, machine learning is an application of AI that teaches computers to learn and improve over time without being explicitly programmed, but rather from data and experience. It touches all subfields of NLP in different ways, but essentially the computer is trained to learn from all its “conversations” and adapt accordingly, which allows for increasingly natural interactions. 
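For a feel of how that training works, here's a toy intent classifier built with scikit-learn. Real assistants train neural models on enormous volumes of conversation data, but the principle of learning from labeled examples rather than hand-written rules is the same:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data: utterances labeled with the intent they express.
utterances = [
    "what's the weather like today",
    "will it rain tomorrow",
    "play some jazz",
    "put on my workout playlist",
    "set an alarm for 7 am",
    "wake me up at six thirty",
]
intents = ["weather", "weather", "music", "music", "alarm", "alarm"]

# The model learns word patterns from the examples; feed it more
# "conversations" and its guesses improve, with no new rules written.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(utterances, intents)

print(model.predict(["will it snow tomorrow"]))  # ['weather']
```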

Why NLP is a game-changer for smart technology

Natural Language Processing is not a new concept: People have been trying to crack the code of computer-human conversations since the 1950s, and facets of NLP have been in use long before AI voice assistants. However, it’s only recent developments in speech-to-text technology, coupled with machine learning techniques that have greatly advanced NLU, that have catapulted it to the forefront — and straight into our homes.

It’s clear that voice-activated NLP has already completely changed how we interact with our smart devices. And anyone who has asked Alexa (or Google, or Siri, or even poor Cortana) a question inherently understands the immense value of this functionality. Those benefits are the same reasons we decided to incorporate NLP into our own product here at Lighthouse:

It’s seamless (and harnesses a lot of power).

Consider our example with Alexa above. There is so much going on — entire fields of computer science — in just that one interaction. But for the user, everything happens under the surface, and in seconds. The experience is seamless, and therefore effortless.

What’s more, voice-controlled NLP acts as a natural gateway to other complex technologies while preserving that same easy experience. For example, at Lighthouse, we use 3D sensing and computer vision techniques to understand and categorize movement, so the camera can tell adults, children, and pets apart. Our facial recognition learns to identify frequent visitors over time.

All very useful features, but our NLP is really what ties it all together. We landed on NLP as the best interface for searching video, because it means you can find specific footage without having to fiddle with toggles and scrubbers. But the true beauty is how seamlessly it allows you to access the other aspects of our AI technology, like facial recognition and computer vision, just by asking if Anne came by today, or requesting to be notified if the kids aren’t home by 4pm. You don’t even have to think about it, which is what makes it so powerful.
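To gesture at how an ask like that might map onto structured search, here's a hypothetical sketch. The names and rules below are invented for illustration; this is not our actual implementation, which relies on trained NLU models rather than hand-written patterns:

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class VideoQuery:
    """Hypothetical structured query a camera's search backend might accept."""
    subject: Optional[str] = None    # a recognized face, or a category like "kids" or "pets"
    timeframe: Optional[str] = None  # e.g. "today", "yesterday"

def parse_query(text: str) -> VideoQuery:
    """Toy rule-based parser standing in for a real NLU pipeline."""
    text = text.lower()
    query = VideoQuery()
    if match := re.search(r"\b(today|yesterday|this week)\b", text):
        query.timeframe = match.group(1)
    if match := re.search(r"\bdid (\w+)\b", text):
        query.subject = match.group(1)
    return query

print(parse_query("Did Anne come by today?"))
# VideoQuery(subject='anne', timeframe='today')
```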

It’s intuitive (which is the key to good tech).

Though we’ve established that there is nothing simple about NLP on the technology side, on the human side it’s as simple as asking. Unlike any other type of technology interface, there is virtually no learning curve. You ask; your device answers. Even if the computer doesn’t understand you, you’re prompted to rephrase in a natural way. Above all else, interacting with NLP-enabled tech is intuitive.

This is important because it widens the scope of users, from people who are mobility-impaired to kids to the elderly, all of whom might be unfamiliar with or have difficulty using other interfaces. It’s also important because it shortens the time between getting a device and truly getting started. And that’s a hallmark of good technology — not only capability, but accessibility and usability. 

It’s delightful (and that’s when the magic happens).

Let’s be real: A big part of the appeal with voice assistants is the entertainment factor. There’s a certain novelty to asking Alexa silly questions or telling her to close the pod bay doors. By flipping the process and putting the burden on computers to communicate with us, instead of the other way around, NLP-enabled tech is a delight to use. This opens the door for fun, which in turn sparks creativity and innovation.

For us at Lighthouse, it inspired us to take the functionality of our device beyond a standard (and let’s face it, boring) security camera. Being able to differentiate between adults, kids, and pets has serious security benefits. But being able to check in on your kids or pets just by asking brings an element of fun, and lets you interact more with your household even while you’re away. For us, it changes the scope of possibility for what a smart home camera could be.

For all these reasons, Natural Language Processing, with voice assistants as its proxy, has already redefined how we interact with technology, in the home and otherwise. Although momentum has been building for a while, Google and Amazon’s looming battle royale for market share crystallizes how important NLP-enabled technology is about to become. Which is all to say: The year of AI and conversational devices, indeed. And it’s not even February.
 
