Wise words for brands from Project Voice

Mon, Jun 8, 2020

It's less what voice sounds like, and more what it delivers.

I attended a superb presentation on the value of voice assistance to brands this past January at Bradley Metrock’s top-shelf Project Voice event in Chattanooga.

The speaker: Tobias Dengel, CEO of WillowTree, Inc., the mobile/digital agency out of Charlottesville, Virginia, with a client list that includes General Electric, PepsiCo, Johnson & Johnson, Time Warner, Turner Broadcasting, Wyndham Hotel Group, 21st Century Fox, and Zappos.

One of the best conference break-out sessions I’ve attended in years.

Let me share with you some highlights from his talk at Project Voice—with a bit of editing and added opinion.

How do we know that voice assistance is not a tech-driven fad?

Will voice assistance—like the iPhone—change consumer behavior in ways we can’t yet envision?

Or, will voice be yet another tech-driven chunk-o-hype? Like 3D television—the darling of a CES not so long ago, but nowhere to be found in today’s US homes. Or, like AR-VR, which for years has forecast a nation in headsets.

In support of the former, Dengel pointed to data showing voice assistance to be the fastest-adopted consumer technology in history.

To the latter, he noted that, despite numerous predictions from voice enthusiasts, voice assistance has not yet significantly changed the world of brand-to-consumer connection.

But Dengel believes it will. And in a big way. But only if brands (and the voice development community brands hire) understand the key questions of consumer value creation.

The big issue for brands and voice is user trust.

There is, Dengel said, a trough of distrust that every new consumer technology must slog through.

It’s made up of two parts. The first is cognitive (or rational) trust that asks, “Will this do what it promises?”

The second is an equal part of emotional trust that rises or falls according to the actual usage experience. Is it simple to turn on, to engage with, to get something valuable? Or, is it frustrating in its complexity? Or, is it so advanced, so beyond the norm, that it is a bit otherworldly, even downright freaky?

For voice, Dengel’s experience with top-tier brands suggests that emotional trust is today’s big issue.

This suggests a significant downside for voice assistance that is ever more humanlike, more predictive, and more independent. For most users, at least in the near term, it will simply be more freaky. Even scary.

And it will raise questions of always-on listening, intrusion, and data use.

To build consumer trust (and thus, adoption), brands must focus less on technological development and much more on providing users with fast and accurate answers.

Will brands (and developers) pursue an ever-more human experience? Or, will brands (and developers) focus first on solving human problems?

Yes, the futurists and engineers may dream of a personal, AI-enabled digital twin, a Jeevesian voice-enabled virtual assistant. Or even of a human-like representation of a brand.

But that’s not why consumers are turning to voice.

Dengel emphasized that today’s and tomorrow’s consumer voice value proposition is all about utility and efficiency: accurate, informative and extremely rapid responses to voice-based queries.

In short: answer the questions. Accurately. Rapidly. And, in a brand-right tone.

Ignore at your peril the critical numbers of voice usage.

Dengel underlined—as he did at the Voice@CES show the week prior—that voice developers and voice-centric brands must understand the basic math of voice.

Here are the numbers:

Broadly stated, users can speak at roughly 130 words per minute. Users can type, on average, about 40 words per minute. There’s a 3X advantage to voice. Which means voice is the efficient means for input.

On the other side of the interaction, however, there’s a different reality. We can listen at roughly the same speed we can talk—about 130 words per minute. However, we can read and understand at roughly 250 words per minute. There’s a 2X advantage to the visual presentation of content. Which means a screen is the efficient means for output.

Which means, think multi-modal. Think voice-enabled smartphone applications. Think of visually-enabled smart speakers.
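
For anyone who wants to check the arithmetic behind those two ratios, here is a minimal Python sketch. It uses only the rough words-per-minute figures quoted above; the constant and function names are my own, purely for illustration.

```python
# Rough rates quoted in the talk, in words per minute.
# These are broad approximations, not measurements.
SPEAK_WPM = 130
TYPE_WPM = 40
LISTEN_WPM = 130
READ_WPM = 250


def advantage(faster_wpm: float, slower_wpm: float) -> float:
    """Return how many times faster one rate is than the other."""
    return faster_wpm / slower_wpm


# Input side: speaking versus typing.
input_advantage = advantage(SPEAK_WPM, TYPE_WPM)

# Output side: reading versus listening.
output_advantage = advantage(READ_WPM, LISTEN_WPM)

print(f"Voice input advantage: {input_advantage:.2f}x")
print(f"Screen output advantage: {output_advantage:.2f}x")
```

Run as written, it prints 3.25x for input and 1.92x for output, which is where the rounded 3X and 2X come from.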

Thanks, Tobias. Great brain food for creators of value in voice.