Are you familiar with the quote from Stephen Hawking about human speech?
“For millions of years mankind lived just like the animals. Then something happened which unleashed the power of our imagination. We learned to talk.”
These wise words are also the prelude in one of my favorite “Pink Floyd” songs, “Keep Talking”.
The quote couldn’t be more accurate, and we have long dreamed of extending our conversations to machines. I guess I’m not surprising you with the news that this day has finally come, or rather that it came quite a few months – arguably even years – ago. Yes, we can talk to machines!
The wonderful advances in Artificial Intelligence (AI), and more specifically in Natural Language Understanding (NLU) and Natural Language Processing (NLP), have made the technology mature enough to support our ambition.
In this article, the goal is simply to share my disappointment with many chatbot implementations. I’m not expecting a pitch-perfect tone, complex conversations or the ability to accurately talk about the meaning of life. The natural language technology is still in its infancy, and we need to learn to crawl before we start walking (not to mention running). The focus of this article is on a very small frustration: the time delay – or lack thereof – between sending a question and receiving a response.
You are more than welcome to challenge my authority on this subject, and feel free to completely disagree. I’ve gained this insight by designing, building and maintaining a chatbot for youngsters: K’Ching. This implementation has received over 150.000 questions to date, with conversations lasting up to 20 minutes.
The evolution to the voice interface, and beyond…
The user interface has moved from the “Command Line Interface” (or “CLI”), to the “Graphical User Interface” (or “GUI”) onto the “Natural User Interface” (or “NUI”).
The Natural User Interface (NUI) has been around for quite some time, but only really kicked into mainstream when we started to use the touchscreen on our smartphones. As a user, it has become very natural to manipulate our electronic devices using single- or multi-touch gestures.
The ability “to touch” is really only the first natural interface. The ability “to speak” is a second natural interface and offers a glimpse of the upcoming innovation cycle: a transition from a Mobile-First to an AI-First world. I personally believe that in five years, any technology will be useless if you can’t have a conversation with it.
Note: In another blog post, I’ve described the imminent (r)evolution of the user interface, where I explain that the next frontier of the Natural User Interface will be “to see”, and ultimately – as man merges more and more with technology – “to think”.
The latency requirement for chat conversations with machines…
In order to truly speak to electronic devices, we are faced with several distinct challenges: the ability to understand language, and then the capability to translate speech into text and vice versa.
This article is focused on a particular issue with “chatbots”: computer programs that can conduct a conversation using natural language. This technology uses a textual or “chat” interface for a dialog between man and machine.
A chatbot is typically exposed via a “conversational interface”. The implementations range from integrations in websites and mobile apps to popular messaging platforms like Messenger, Telegram and soon probably even WhatsApp.
The most important business benefits are an improved – more natural or more familiar – user experience, and an increase in productivity. The (early) top use cases include customer advice and recommendations, customer service and support, or even conversational commerce.
In chatbots, the technical term “latency” refers to the time delay between receiving a question and providing an answer. You have probably experienced chatbots where you ask a simple question, and within a few milliseconds you receive a multi-sentence reply. This feels like receiving an instant out-of-office message. The reply arrives too quickly, and this makes your chat feel too much like you’re talking to a machine.
It’s important that a chatbot implementation respects a typing delay when talking to a human being. The average person types between 38 and 40 words per minute, or between 190 and 200 characters per minute. This slight delay makes (or breaks) the correct experience when talking to a chatbot.
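As a rough sketch of the arithmetic behind these figures, a typing speed in characters per minute converts directly into a delay per character. The function name is an assumption made for illustration, not part of the K’Ching implementation.

```python
def seconds_per_character(chars_per_minute: float) -> float:
    """Convert a typing speed (characters per minute) into seconds per character."""
    if chars_per_minute <= 0:
        raise ValueError("typing speed must be positive")
    return 60.0 / chars_per_minute


# The average human typing speeds quoted above.
slow_typist = seconds_per_character(190)  # roughly 0.32 seconds per character
fast_typist = seconds_per_character(200)  # 0.3 seconds per character
```

At 200 characters per minute a person needs about 0,3 seconds per character, which is the human baseline the rest of this article compares against.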
In addition, in a chat interface we typically answer one sentence at a time. The goal should not be to fool your users (and make them believe they are talking to a human being), but to respect the “expected” experience when using a chat interface.
Step 1. We receive a question – or a command – from a youngster: “Tell me a joke!”
Step 2. We mark a first time-stamp (for example “07:08:29,00”) and immediately start a typing indicator. The youngster immediately gets the impression that the chatbot has started typing an answer.
Step 3. The question is now being processed; more precisely we first correct any spelling mistakes (using Microsoft Azure Bing Spell Check API), and then send the corrected question off to our Natural Language Processing engine (using IBM Watson Conversation).
Update: The latest version of IBM Watson Conversation includes a “Fuzzy Matching” feature; improving the ability of the service to recognize user input terms with syntax that is similar to the entity (but without requiring an exact match). As a consequence, we are considering removing the Microsoft Azure Bing Spell Check API service.
Step 4. We mark a second time-stamp (for example “07:08:30,10”) as soon as we receive the answer from the IBM Watson Conversation service: “A robot walks into a bar. “What can I get you?” the bartender asks. “I need something to loosen up,” the robot replies. So the bartender serves him a screwdriver.”
Step 5. As this reply consists of multiple phrases, we now chop the reply into distinct sentences. If you analyse the chat conversations of youngsters, you will see that they tend to answer one sentence at a time (and not, like many older people do, use a chat medium like an email service).
- A robot walks into a bar.
- “What can I get you?” the bartender asks.
- “I need something to loosen up,” the robot replies.
- So the bartender serves him a screwdriver.
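The chopping step can be sketched as follows. This is a minimal illustration that assumes a simple regular expression is good enough; the function name is hypothetical, and a production bot would need to handle abbreviations and sentences that end inside a closing quote.

```python
import re

# Split on whitespace that directly follows ".", "!" or "?".
# A "?" followed by a closing quote (as in the joke) does not trigger a split.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_sentences(reply: str) -> list[str]:
    """Chop a multi-sentence reply into individual chat messages."""
    return [part for part in _SENTENCE_BOUNDARY.split(reply.strip()) if part]


joke = (
    "A robot walks into a bar. “What can I get you?” the bartender asks. "
    "“I need something to loosen up,” the robot replies. "
    "So the bartender serves him a screwdriver."
)
sentences = split_sentences(joke)
```

For the joke above this yields the four sentences listed, each of which is then sent as its own chat message.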
The first sentence of the reply “A robot walks into a bar.” is 25 characters long. For a human typing 200 characters per minute, it would take 7,5 seconds (25 characters * 0,3 seconds) to create this reply. This is obviously a bit slow. We allow our bot to type at 0,1 seconds per character, or 2,5 seconds for this first sentence. As this is our first sentence, we still have to subtract the delay in receiving the reply: 1,1 seconds (07:08:30,10 – 07:08:29,00). That leaves 1,4 seconds of additional waiting time.
We reply with the first sentence (after 1,1 seconds + 1,4 seconds), and then again show the typing indicator.
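The calculation for the first sentence can be sketched like this. The function name and the calendar date in the time-stamps are placeholders chosen for illustration; only the times of day and the 0,1 seconds per character come from the example above.

```python
from datetime import datetime

SECONDS_PER_CHAR = 0.1  # bot typing speed used in this article


def first_sentence_delay(sentence: str,
                         question_received: datetime,
                         answer_received: datetime) -> float:
    """Seconds still to wait before sending the first sentence.

    The time already spent on spell checking and language processing
    counts as typing time, so it is subtracted from the simulated delay.
    """
    typing_time = len(sentence) * SECONDS_PER_CHAR
    elapsed = (answer_received - question_received).total_seconds()
    return max(0.0, typing_time - elapsed)  # never wait a negative amount


# Time-stamps from the example (the date itself is arbitrary).
t1 = datetime(2017, 1, 1, 7, 8, 29)          # 07:08:29,00
t2 = datetime(2017, 1, 1, 7, 8, 30, 100000)  # 07:08:30,10
remaining = first_sentence_delay("A robot walks into a bar.", t1, t2)
```

Clamping at zero covers the case where the back-end services take longer than the simulated typing time; the sentence is then sent immediately.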
Step 6. We delay the reply of the second sentence by 4,1 seconds (41 characters * 0,1 seconds), and then again show the typing indicator.
Step X. We repeat this last step for each sentence. After sending the last sentence, we obviously don’t display the typing indicator anymore.
Addendum: If a reply sentence is longer than 50 characters, we have learned to limit the maximum delay to 5 seconds. It’s our intention to make our user feel as if the machine is typing an answer, and not the goal to make this a waiting game.
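Putting the steps together, the whole schedule for a reply might look like the sketch below. The names are hypothetical, the sentence splitting is a simple regular expression, and applying the 5-second cap before subtracting the processing time is one possible reading of the rule above. Sending messages and toggling the typing indicator depend on the messaging platform and are left out.

```python
import re

SECONDS_PER_CHAR = 0.1  # bot typing speed
MAX_DELAY = 5.0         # cap, reached by any sentence longer than 50 characters


def reply_schedule(reply: str, processing_seconds: float = 0.0):
    """Return a list of (delay_in_seconds, sentence) pairs for a reply.

    processing_seconds is the time between receiving the question and
    receiving the answer; it is deducted from the first delay only.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", reply.strip()) if s]
    schedule = []
    for index, sentence in enumerate(sentences):
        delay = min(len(sentence) * SECONDS_PER_CHAR, MAX_DELAY)
        if index == 0:
            delay = max(0.0, delay - processing_seconds)
        schedule.append((delay, sentence))
    return schedule


joke = (
    "A robot walks into a bar. “What can I get you?” the bartender asks. "
    "“I need something to loosen up,” the robot replies. "
    "So the bartender serves him a screwdriver."
)
schedule = reply_schedule(joke, processing_seconds=1.1)
```

A sending loop would then wait for each delay with the typing indicator on, send the sentence, and switch the indicator off after the last one. In this example the third sentence is 51 characters long, so its delay is capped at 5 seconds.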
Enjoy chatbots in the slow lane…
This will probably be one of the only projects without the requirement for a quick and snappy service. In technical terms, you are making this a “high latency” implementation on purpose. You should however notice that this trick makes a big difference; it provides your users a more engaging experience when chatting with your chatbot.
I mentioned earlier that this article is focused on solutions that use a textual or “chat” interface. A virtual assistant that uses an auditory interface – like Amazon Echo with Alexa, Google Home or Siri – doesn’t have this issue. In speech, the spoken words themselves automatically cause the delay.
“Speech has allowed the communication of ideas, enabling human beings to work together to build the impossible. Mankind’s greatest achievements have come about by talking, and its greatest failures by not talking.”
And that’s exactly why I’m writing these words, and sharing this experience with you.