by Dot Cannon
For the Speak2Me, showtime had come.
It was Saturday, October first. Day 3 of the Audio Engineering Society’s 141st International Convention was in full swing at the L.A. Convention Center. AES had started off its first-ever “Product Development Super Saturday” at 9 that morning.
The goal of this unique workshop: to design a new, killer audio product, named “Speak2Me”, from the ground up, in one day. Designed as a competitor to the Amazon Echo, Speak2Me would serve as an example, to illustrate the steps of the product-development process.
That morning, Super Saturday organizer Scott Leslie and a “dream team” of experts had led the audience through sessions on product management, user experience, acoustic and industrial design, and sound processing.
Leslie, who is Chief Architect and Innovation Strategist at PD Squared, said the concept for “Super Saturday” had started at AES’ 2015 convention in New York. As the show was ending, longtime audio engineer and AES contributor Steve Hutt approached him.
“He said, ‘Hey, Scott, why don’t you design a speaker in one day?’” Leslie recalled. “As I walked out, I thought he was half-joking. But over the next few weeks, I thought that would be brilliant.”
And of course, Leslie aimed high. He envisioned Speak2Me as an upscale version of the top speaker product in the market: the Amazon Echo.
Now, “Super Saturday” was in full swing. As attendees returned from lunch, the development team gathered around a table.
They’d set out their 3D-printed prototype shapes where the audience could get a close-up look.
A mini “trade show” was about to take place–where everyone would see how this Speak2Me prototype worked.
“Hello, Speak2Me,” said VoiceBox program manager Andrew Bleeker, starting the demonstration. The speaker responded with a “ting” on registering his voice.
“Play Bela Fleck.”
Speak2Me responded with “Flight of the Cosmic Hippo”.
“Hello, Speak2Me. Can you play some Michael Jackson?”
Strains of “Billie Jean” filled the room.
Circling around the speaker, Bleeker interspersed music commands with personal-assistant questions.
“Hello, Speak2Me. Is it raining in Seattle?”
Speak2Me interrupted the music. “Yes. Conditions in Seattle are…”
After a weather report, the music resumed.
So did Bleeker’s commands.
“How about Boston?…Play jazz…What’s the stock price for Amazon?…Play Nine Inch Nails…What’s the population of Shanghai?…Play Rihanna.”
Speak2Me was unique in that it actually spoke back to Bleeker. As Scott Leslie explained in an email following the conference, Speak2Me was designed with a beam-forming microphone array, which allows it to sense where the user is from his or her voice. A beam-forming transducer array then allows Speak2Me to actually speak back!
“This was…why I named it Speak2Me. (It’s) something no other product can do, and a testament to the talent and dedication of our team,” Leslie commented in his email.
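How a microphone array can "sense where the user is" may be easier to picture with a small example. The sketch below is not Speak2Me's actual design, which the team did not detail. It assumes a hypothetical two-microphone array and uses a common textbook approach: find the time delay between the two signals by cross-correlation, then convert that delay to a bearing. The function names, the microphone spacing and the speed-of-sound constant are all illustrative assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value for room-temperature air (assumption)


def best_lag(a, b, max_lag):
    """Return the shift of b relative to a, in samples, that maximizes their correlation."""
    def corr(lag):
        return sum(a[i] * b[i + lag] for i in range(len(a)) if 0 <= i + lag < len(b))
    return max(range(-max_lag, max_lag + 1), key=corr)


def arrival_angle(mic_a, mic_b, sample_rate, spacing_m):
    """Estimate the talker's bearing in degrees (0 = straight ahead).

    A positive angle means the sound reached microphone A first.
    """
    # The delay can never exceed the time sound needs to travel between the mics.
    max_lag = math.ceil(spacing_m / SPEED_OF_SOUND * sample_rate)
    delay_s = best_lag(mic_a, mic_b, max_lag) / sample_rate
    # Far-field geometry: delay = spacing * sin(angle) / speed of sound.
    ratio = max(-1.0, min(1.0, delay_s * SPEED_OF_SOUND / spacing_m))
    return math.degrees(math.asin(ratio))


# Hypothetical test signal: the same short pulse arrives two samples later at mic B.
pulse_a = [0.0] * 20 + [1.0, 2.0, 1.0] + [0.0] * 20
pulse_b = [0.0] * 22 + [1.0, 2.0, 1.0] + [0.0] * 18
angle = arrival_angle(pulse_a, pulse_b, sample_rate=48000, spacing_m=0.1)
```

A real array would typically use more microphones and more robust delay estimation, but the principle is the same: small arrival-time differences reveal direction.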
Admittedly, the demo wasn’t perfect. On a few occasions, Bleeker had to repeat his “Hello, Speak2Me” several times before the speaker engaged. A couple of times, Speak2Me also replied, “I didn’t understand”. But the overall performance was still enough to wow the audience–especially during an unscripted moment.
“Hello, Speak2Me. Call Andrew Bleeker. Request from the audience, I thought I might as well go for it.”
Bleeker’s phone jingled, and he stepped away. “Hello? Oh, the demo’s going great, thanks for asking,” he said, to laughter and applause.
Minimum Phase LLC founder and Chief Engineer Mark Trainer then explained how he’d created Speak2Me’s audio system.
“We started off with defining the box size,” he explained. “The idea is to make (Speak2Me sound better than Amazon Echo) as much as possible. So we expanded the box volume…(and) improved (Speak2Me’s) bass.”
Trainer played an Amazon Echo unit, then the Speak2Me, for sound-quality comparison. “So some of the trade-offs we’re doing (with Speak2Me),…the first thing we choose is the size of volume,” he said. “The goal here (with Speak2Me) is just to play louder, at the expense of slightly larger, and to have more bass.”
In addition, he said, Speak2Me was geared towards an omnidirectional presence. “So if you were to walk around, you’d get the same experience, no matter where you are around the product. (Our goal on Speak2Me, is to extend the bass, by half an octave.)”
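For readers wondering what "half an octave" means in numbers: an octave is a doubling or halving of frequency, so moving down half an octave divides a frequency by the square root of two. The snippet below works that through. The 100 Hz starting point is a made-up figure for illustration, not a measurement of either product.

```python
def shift_by_octaves(freq_hz, octaves):
    """Return freq_hz moved by a number of octaves (negative values go lower)."""
    return freq_hz * 2.0 ** octaves


# Hypothetical starting point: a speaker whose bass rolls off at 100 Hz.
baseline_cutoff_hz = 100.0
extended_cutoff_hz = shift_by_octaves(baseline_cutoff_hz, -0.5)
print(f"{extended_cutoff_hz:.1f} Hz")  # prints "70.7 Hz"
```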
Leslie said that Speak2Me differed from Alexa in the way users engage it through speech.
“What we’re using from VoiceBox, our partner here, is natural language, to talk to it like you would normally have a conversation,” Leslie said. “I attended an Alexa developer workshop a couple of weeks ago, and Alexa doesn’t do that. You have to write every possible version of a question.”
“Team, did we do it?” Leslie asked, at the end of the demo. Cheers greeted his question.
The effort behind “artificial intelligence”
After the “trade show” was over, the group was able to examine Speak2Me’s natural voice interface more in depth. Dan Carter, of Kirkland-based voice-application innovators VoiceBox, led the afternoon’s first session.
Carter, who is VoiceBox’s Vice-President of Engineering in their Home Products division, discussed the challenges involved in creating natural-language understanding.
“(A few years ago,) you’d create a set of rules and say, ‘these are the words that can be understood by my engine’. Now, they’re pretty much all statistical based, and so they train it on various models. So (there are three main elements): a pronunciation lexicon…that says, ‘these sounds make this word.’ (There are) five or six or ten different ways people may say a word. And then there’s the acoustic model, and the language model.”
The acoustic model, he said, was based on the user’s accent, the acoustic environment of the room, and the audio path of the device itself.
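The three elements Carter listed can be illustrated with a toy example. This is a deliberately tiny sketch, not VoiceBox's engine: the acoustic model is skipped entirely (the code assumes it has already turned audio into phoneme symbols), and the lexicon entries, phoneme labels and probabilities are all invented for illustration.

```python
# Pronunciation lexicon: each word lists the phoneme sequences people may use for it.
LEXICON = {
    "play": [("P", "L", "EY")],
    "jazz": [("JH", "AE", "Z")],
    "rain": [("R", "EY", "N")],
    "reign": [("R", "EY", "N")],
    # Two ways people may say the same word.
    "tomato": [("T", "AH", "M", "EY", "T", "OW"), ("T", "AH", "M", "AA", "T", "OW")],
}

# Language model: made-up probabilities that a word follows the previous word.
BIGRAMS = {
    ("will", "rain"): 0.05,
    ("will", "reign"): 0.001,
}
UNSEEN = 1e-6  # small fallback probability for word pairs not in the table


def candidates(phonemes):
    """Return every word whose lexicon entry contains this phoneme sequence."""
    return sorted(w for w, prons in LEXICON.items() if tuple(phonemes) in prons)


def pick_word(previous_word, phonemes):
    """Choose among sound-alike words using the language model."""
    options = candidates(phonemes)
    if not options:
        return None
    return max(options, key=lambda w: BIGRAMS.get((previous_word, w), UNSEEN))


# "rain" and "reign" sound identical; context decides between them.
choice = pick_word("will", ["R", "EY", "N"])
```

Here the lexicon alone cannot separate "rain" from "reign", so the language model breaks the tie based on the preceding word. In a statistical recognizer of the kind Carter described, these scores are learned from recorded and transcribed speech rather than written by hand.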
The way a device is initially trained, Carter explained, is by getting a recording, with a transcription.
“And we get thousands of hours of those, and we run it into a model in the current state of the art.”
The downside, he said, is that this process is time-consuming.
“I have a house my team runs at (University of Washington) with four different rooms set up…and we basically come in there and read scripts…We move around to make different acoustic patterns. (We) get as much variance as we can with different people, with different accents, speaking to the devices at different distances.”
People were also speaking to the devices from different rooms with different acoustic properties, he added. “And typically, to build a speech (recognition platform) from scratch, you want about two thousand hours to start with.”
“Does that have to be repeated for each language?” Leslie asked.
“Yes, unfortunately,” Carter replied.
Testing for “real-world” use
Product validation and testing were the topics of the next session, featuring test-equipment company Avermetrics’ Vice-President of Sales and Marketing Jonathan Novick and founder Paul Messick.
Novick started off the session with a definition–and a differentiation. “What is validation?” he began. “(It’s) basically asking, does the design work? Is it doing the things that we want it to do?”
But, he said, product verification was more specific. “Does this particular unit work? Does it follow the design, is it doing what it’s supposed to do?”
“The first thing we really need to do is validate the feature set,” Novick continued. “…Is this right for the market?…Is the benefit noticeable to the consumer (since every feature you add usually has a cost involved)?”
“And this overlaps a bit with the development phase,” Messick said, “because obviously you’re not going to wait until you’re ready to go into production. Do these features actually add any value?”
After a prototype had been created, Novick said, an important pre-production consideration was implementation. “You know you want these features. Now you want to know how to implement them correctly, in each unit. There are design constraints. Does it meet (all of them)?”
“…And then you want to know, is it robust? I have this unit here, and it’s going into a home environment. I’m going to have to knock this thing over, a few times, because there may be kids in the house, or a Labrador retriever with a big, powerful tail. Anyone who’s had a Labrador knows that problem.”
Numerous other considerations came up, besides the obvious audio quality and intelligibility.
“We have to validate the non-audio features as well,” Novick said. “First and foremost is the user experience: do they like the product? If that doesn’t pass, it’s a no-go…Just the construction and fit of things. Does it fit together, does it rattle?”
“Too often (designers say), ‘We’ll let the factory fix that,’” Leslie commented.
Novick said testing environments were important. “You do need to use both (laboratory and ‘real-world’ testing) as part of the validation,” he said. “(Real-world testing) is great because it exposes products to unforeseen situations. You have this idea of what the use model is, and as soon as you put it into the end user’s hands, all that can change. ‘What, they’re using it in the bathroom, it’s getting wet?’”
Linking to the supply chain
For the day’s final session, team member Mike Klasco, president of Menlo Scientific Ltd., had originally been scheduled for a presentation on “Sourcing and Supply Chain”.
However, Klasco, who had been working on the project from its start, became ill several days prior to “Super Saturday”.
Using the slides he had supplied, Novick and Messick stepped in to talk about manufacturing once the product was verified.
“What I have to say applies pretty much to any product that you’re going to be making…in Asia or even in the U.S.,” Novick said. “When you’re looking for a CM (contract manufacturer), you’re looking for a few things to decide, is this the one you want to pick.”
One of the most important considerations, he said, was how much experience a particular manufacturer had with the specific type of product you’re making.
“If they’re making Bluetooth headsets and you’re making live sound speakers, it’s probably not a good match.”
Quantity, he added, was something else to keep in mind. “If you’re intending to make (ten thousand a month of something), that might be a different CM than is making things (in thousand-a-month batches). You have to realize, they’ll almost always say, ‘oh yes, we can do that.’ You need to make sure that they actually can.”
Messick told the audience that the least expensive manufacturer wasn’t necessarily the best.
“A big mistake you see with people who are trying to move products to Asia, is to…say, ‘Let’s go find somebody that can make it for a nickel,’” he explained. “You will have a very long road,…because you will essentially reset your design and get to start over.
“Working with a CM,…my experience has always been, you send them some drawings, they’ll make you a sample, you look at this and say, ‘what were you thinking?’” Messick continued. “And then you’ll send back notes, and they’ll make another sample…and then after a few times around, it will be, ‘this is our product.’”
Trustworthiness, he said, was essential when working with a contract manufacturer.
“It does happen, that you have a design that you’re having made overseas, and you start to see it someplace else,” he commented. “There are some things that you’ll want to do…I know someone who used to make tubes, who would never have any one (manufacturer) have all the parts. So some of it was made in one factory, some in another, and they were shipped separately to a third, and so on.”
“But the more important way to handle the trustworthiness is to pick a (manufacturer) that either you’ve worked with over time, or that (has worked with people you know well),” Messick commented. “The most important thing in working with a CM is that they’re all based on relationships…Over time you learn to trust them.”
AES’ first “Product Development Super Saturday” was winding down. Scott Leslie and his “Dream Team” had provided a lot of useful information for makers and startups about the product-design process, through the hypothetical “killer audio product”, Speak2Me.
At the close of the day, Leslie called his “Dream Team” up to gather onstage, during his closing remarks.
“When I called these guys, in May and June,…and said, ‘here’s my idea, what do you think?’, the most amazing thing happened,” he commented. “They all said ‘yes’. And I thought, boy, I’m in trouble now…They’re all just rock stars in what they do. Great experience, we have people that have done so many different kinds of things, including not just in audio, and in all kinds of audio, and that’s what’s important here.”
In a phone conversation shortly after the 2016 AES Convention ended, Leslie commented on how much he’d enjoyed having the audience “wrap around” Super Saturday.
“If you want to have a successful, out-of-the-box session, make sure your audience is there to support it,” he explained. “I didn’t want them to (hear the idea and) say, ‘You can’t do that, I’ve been an engineer for years and years (and it’s not possible).'”
So, after Speak2Me’s premiere in AES’ “Product Development Super Saturday”, what’s next?
Leslie didn’t have an answer for that one–yet. But we’re almost certain to see a very imaginative Version 2.0.