Puretalk.ai

A guide to Puretalk.ai's RUTH NLU Model


Last updated October 01, 2020

Puretalk.ai's RUTH (Really Understanding the Humans) comes equipped with the infrastructure retailers need to personalize their products and services for their target customers. Our artificially intelligent system, RUTH, consists of five separate layers: a speech-to-text (STT) model, a conversational model, natural language processing (NLP), a voice synthesizer, and the connectivity piece. These elements work hand in hand to process user utterances and generate the desired output. The STT model comprises text decoding, speech recognition, and task management. In a nutshell, the speech-to-text model processes the audio of a user's utterance and converts it into strings of text. The text decoding element analyzes what the user is saying and produces an assignment, which the task management component then processes to generate a response for the conversational model.
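

Below is a minimal sketch of how these layers might hand data to one another, with the STT model's text decoding producing an assignment that the task management component turns into material for the conversational model. All class names, method names, and the toy intent rule are illustrative assumptions, not Puretalk.ai's actual API.

    # All names below are illustrative assumptions, not Puretalk.ai's actual API.
    from dataclasses import dataclass


    @dataclass
    class Task:
        """An assignment produced by text decoding for the task manager."""
        text: str
        intent: str


    class SpeechToText:
        """Stands in for the STT model: audio of a user utterance -> text."""
        def transcribe(self, audio: bytes) -> str:
            # Placeholder: a real model would run acoustic and language decoding here.
            return audio.decode("utf-8", errors="ignore")


    class TextDecoder:
        """Analyzes what is being said and produces an assignment (Task)."""
        def decode(self, transcript: str) -> Task:
            intent = "search" if "find" in transcript.lower() else "chat"
            return Task(text=transcript, intent=intent)


    class TaskManager:
        """Processes the Task and generates a response for the conversational model."""
        def handle(self, task: Task) -> str:
            if task.intent == "search":
                return f"Searching for: {task.text}"
            return f"Let's talk about: {task.text}"


    stt = SpeechToText()
    transcript = stt.transcribe(b"find pink ribbons")
    task = TextDecoder().decode(transcript)
    print(TaskManager().handle(task))  # -> Searching for: find pink ribbons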


A cluster of components takes the task generated by the STT model and works to fully grasp user utterances: barge-in, active listening, turn taking, context processing, and our custom retrocausality algorithm. With the barge-in component, RUTH can process an external voice that adds something relevant to the conversation currently in motion, as well as interruptions such as "uhm," "hold on," or "wait a minute." For instance, say a conversation is in motion between User A and User B, and User B asks the system for help finding pink ribbons. If User C interrupts and asks for pearl ribbons instead, the system will change course and produce search results for pearl ribbons, because the word "ribbon" is a common factor between the two utterances.
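

The following toy sketch illustrates the barge-in behaviour described above: an interrupting utterance replaces the active request when it shares a keyword (here, "ribbons") with the conversation in motion, while a pure filler interruption leaves the current request in place. The function names and the simple word-overlap heuristic are assumptions made for this example, not RUTH's actual algorithm.

    # The word-overlap heuristic and function names are assumptions for this example.
    FILLER_INTERRUPTIONS = {"uhm", "hold on", "wait a minute"}


    def tokens(utterance: str) -> set:
        return set(utterance.lower().replace(",", " ").split())


    def barge_in(active_request: str, interruption: str) -> str:
        """Return the request the system should act on after an interruption."""
        cleaned = interruption.lower().strip(" .!?")
        if cleaned in FILLER_INTERRUPTIONS:
            # Pure filler: pause, but keep the current request.
            return active_request
        if tokens(active_request) & tokens(interruption):
            # The interruption is relevant to the conversation in motion,
            # so the system changes course.
            return interruption
        return active_request


    print(barge_in("find pink ribbons", "no, pearl ribbons please"))
    # -> "no, pearl ribbons please" (shared word: "ribbons")
    print(barge_in("find pink ribbons", "hold on"))
    # -> "find pink ribbons" (filler interruption, request unchanged)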


Additionally, multiple people may be engaged in conversation in the same room, producing murmur or background noise, while the focus of the conversation remains between two people, User A and User B. RUTH actively listens to volume and pitch to identify the dominant speakers, Users A and B, and once context is established after a moment it ignores all background voices, a process made possible by active listening and turn taking. Furthermore, through the context processing component, one of its most unique features, RUTH can understand common jargon used to refer to items or places. Say User B asks for the best bars in the "Big Apple." RUTH understands that the "Big Apple" is in fact New York City and enables our retrocausality algorithm to facilitate a response that serves User B's original intent: finding bars in New York City. The conversational model works in tandem with the NLP component within RUTH, which comprises intent classification, sentiment analysis, multi-speaker differentiation, and the Puretalk codex. Intent must be fully understood for a request to be fully processed, which is key to managing the flow of a conversation.
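

The short sketch below illustrates two of the ideas above in simplified form: selecting the dominant speakers by loudness, and resolving common jargon such as "the Big Apple" to the place the user actually means. The loudness values, data shapes, and jargon table are assumptions made for illustration only, not RUTH's implementation.

    # Hypothetical per-speaker loudness levels (e.g., averaged RMS energy).
    speaker_levels = {
        "User A": 0.82,
        "User B": 0.78,
        "background voice 1": 0.21,
        "background voice 2": 0.15,
    }

    # Keep only the two loudest voices; everyone else is treated as background.
    dominant = sorted(speaker_levels, key=speaker_levels.get, reverse=True)[:2]
    print(dominant)  # -> ['User A', 'User B']

    # Context processing: a small jargon table mapping informal names to places.
    JARGON = {"the big apple": "New York City", "the windy city": "Chicago"}


    def resolve_jargon(utterance: str) -> str:
        resolved = utterance.lower()
        for phrase, meaning in JARGON.items():
            resolved = resolved.replace(phrase, meaning)
        return resolved


    print(resolve_jargon("What are the best bars in the Big Apple?"))
    # -> "what are the best bars in New York City?"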


RUTH's speech-to-text model, text-to-speech (TTS) model, and wake word all reside within the voice synthesizer component, which is mainly responsible for identifying when the system is being addressed. Phrases like "RUTH" or "hey RUTH" are wake words that alert the system to listen for commands; once a voice is recognized, utterances are translated through the STT and TTS models respectively. Speech-to-text is the interdisciplinary subfield of computational linguistics that develops methodologies and technologies enabling computers to recognize and translate spoken language into text, while text-to-speech is a form of speech synthesis that converts text into spoken voice output. Together, our STT and TTS models have a symbiotic relationship in processing user-generated voice to compute the desired output.
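

The sketch below shows the kind of wake-word gate this paragraph describes: audio is only acted on once the transcript begins with "RUTH" or "hey RUTH," after which the reply is handed to the TTS model for spoken output. The function names and the string stand-ins for audio are assumptions made for the example, not the actual models.

    # Function names and string stand-ins for audio are assumptions for this sketch.
    from typing import Optional

    WAKE_WORDS = ("hey ruth", "ruth")


    def speech_to_text(audio: str) -> str:
        # Placeholder for the STT model: spoken language -> text.
        return audio


    def text_to_speech(text: str) -> str:
        # Placeholder for the TTS model: text -> spoken voice output.
        return f"<spoken> {text}"


    def handle_utterance(audio: str) -> Optional[str]:
        transcript = speech_to_text(audio).strip()
        lowered = transcript.lower()
        if not lowered.startswith(WAKE_WORDS):
            return None  # The system is not being addressed; stay silent.
        # Strip the wake word, then answer through the TTS model.
        for wake in WAKE_WORDS:
            if lowered.startswith(wake):
                command = transcript[len(wake):].strip(" ,")
                break
        return text_to_speech(f"Working on: {command}")


    print(handle_utterance("Hey RUTH, find pink ribbons"))
    # -> <spoken> Working on: find pink ribbons
    print(handle_utterance("just talking amongst ourselves"))
    # -> None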
