Alexa, Cortana, and Siri aren’t novelties anymore. They’re our terrifyingly convenient future.
Adam Rifkin stashed this in Chatbots
Stashed in: Microsoft, Awesome, Siri, Turing, Artificial Intelligence, Echo
Alexa and software agents like it will be the prisms through which we interact with the online world.
In the beginning, computers spoke only computer language, and a human seeking to interact with one was compelled to do the same. First came punch cards, then typed commands such as run, print, and dir. The 1980s brought the mouse click and the graphical user interface to the masses; the 2000s, touch screens; the 2010s, gesture control and voice. It has all been leading, gradually and imperceptibly, to a world in which we no longer have to speak computer language, because computers will speak human language—not perfectly, but well enough to get by.
We aren’t there yet. But we’re closer than most people realize. And the implications—many of them exciting, some of them ominous—will be tremendous.
Like card catalogs and AOL-style portals before it, Web search will begin to fade from prominence, and with it the dominance of browsers and search engines. Mobile apps as we know them—icons on a home screen that you tap to open—will start to do the same. In their place will rise an array of virtual assistants, bots, and software agents that act more and more like people: not only answering our queries, but acting as our proxies, accomplishing tasks for us, and asking questions of us in return.
...
This is already beginning to happen—and it isn’t just Siri or Alexa. As of April, all five of the world’s dominant technology companies are vying to be the Google of the conversation age. Whoever wins has a chance to get to know us more intimately than any company or machine has before—and to exert even more influence over our choices, purchases, and reading habits than they already do.
So say goodbye to Web browsers and mobile home screens as our default portals to the Internet. And say hello to the new wave of intelligent assistants, virtual agents, and software bots that are rising to take their place.
No, really, say “hello” to them. Apple’s Siri, Google’s mobile search app, Amazon’s Alexa, Microsoft’s Cortana, and Facebook’s M, to name just five of the most notable, are diverse in their approaches, capabilities, and underlying technologies. But, with one exception, they’ve all been programmed to respond to basic salutations in one way or another, and it’s a good way to start to get a sense of their respective mannerisms. You might even be tempted to say they have different personalities.
You might notice that most of these virtual assistants have female-sounding names and voices.
Facebook M doesn’t have a voice—it’s text-only—but it was initially rumored to be called Moneypenny, a reference to a secretary from the James Bond franchise. And even Google’s voice is female by default. This is, to some extent, a reflection of societal sexism. But these bots’ apparent embrace of gender also highlights their aspiration to be anthropomorphized: They want—that is, the engineers that build them want—to interact with you like a person, not a machine. It seems to be working: Already people tend to refer to Siri, Alexa, and Cortana as “she,” not “it.”
That Silicon Valley’s largest tech companies have effectively humanized their software in this way, with little fanfare and scant resistance, represents a coup of sorts. Once we perceive a virtual assistant as human, or at least humanoid, it becomes an entity with which we can establish humanlike relations. We can like it, banter with it, even turn to it for companionship when we’re lonely. When it errs or betrays us, we can get angry with it and, ultimately, forgive it. What’s most important, from the perspective of the companies behind this technology, is that we trust it.
If Siri gave us a glimpse of what is possible, it also inadvertently taught us about what wasn’t yet.
Siri wasn’t the first digital voice assistant when Apple introduced it in 2011, and it may not have been the best. But it was the first to show us what might be possible: a computer that you talk to like a person, that talks back, and that attempts to do what you ask of it without requiring any further action on your part. Adam Cheyer, co-founder of the startup that built Siri and sold it to Apple in 2010, has said he initially conceived of it not as a search engine, but as a “do engine.”
If Siri gave us a glimpse of what is possible, it also inadvertently taught us about what wasn’t yet. At first, it often struggled to understand you, especially if you spoke into your iPhone with an accent, and it routinely blundered attempts to carry out your will. Its quick-witted rejoinders to select queries (“Siri, talk dirty to me”) raised expectations for its intelligence that were promptly dashed once you asked it something it hadn’t been hard-coded to answer. Its store of knowledge proved trivial compared with the vast information readily available via Google search. Siri was as much an inspiration as a disappointment.
Five years later, Siri has gotten smarter, if perhaps less so than one might have hoped. More importantly, the technology underlying it has drastically improved, fueled by a boom in the computer science subfield of machine learning. That has led to sharp improvements in speech recognition and natural language understanding, two separate but related technologies that are crucial to voice assistants.
If a revolution in technology has made intelligent virtual assistants possible, what has made them inevitable is a revolution in our relationship to technology. Computers began as tools of business and research, designed to automate tasks such as math and information retrieval. Today they’re tools of personal communication, connecting us not only to information but to one another. They’re also beginning to connect us to all the other technologies in our lives: Your smartphone can turn on your lights, start your car, activate your home security system, and withdraw money from your bank. As computers have grown deeply personal, our relationship with them has changed. And yet the way they interact with us hasn’t quite caught up.
“It’s always been sort of appalling to me that you now have a supercomputer in your pocket, yet you have to learn to use it,” says Alan Packer, head of language technology at Facebook. “It seems actually like a failure on the part of our industry that software is hard to use.”
Packer is one of the people trying to change that. As a software developer at Microsoft, he helped to build Cortana. After it launched, he found his skills in heavy demand, especially among the two tech giants that hadn’t yet developed voice assistants of their own. One Thursday morning in December 2014, Packer was on the verge of accepting a top job at Amazon—“You would not be surprised at which team I was about to join,” he says—when Facebook called and offered to fly him to Menlo Park, California, for an interview the next day. He had an inkling of what Amazon was working on, but he had no idea why Facebook might be interested in someone with his skill set.
As it turned out, Facebook wanted Packer for much the same purpose that Microsoft and Amazon did: to help it build software that could make sense of what its users were saying and generate intelligent responses. Facebook may not have a device like the Echo or an operating system like Windows, but its own platforms are full of billions of people communicating with one another every day. If Facebook can better understand what they’re saying, it can further hone its News Feed and advertising algorithms, among other applications. More creatively, Facebook has begun to use language understanding to build artificial intelligence into its Messenger app. Now, if you’re messaging with a friend and mention sharing an Uber, a software agent within Messenger can jump in and order it for you while you continue your conversation.
In short, Packer says, Facebook is working on language understanding because Facebook is a technology company—and that’s where technology is headed. As if to underscore that point, Packer’s former employer this year headlined its annual developer conference by announcing plans to turn Cortana into a portal for conversational bots and integrate it into Skype, Outlook, and other popular applications. Microsoft CEO Satya Nadella predicted that bots will be the Internet’s next major platform, overtaking mobile apps the same way they eclipsed desktop computing.
Siri may not have been very practical, but people immediately grasped what it was.
With Amazon’s Echo, the second major tech gadget to put a voice interface front and center, it was the other way around. The company surprised the industry and baffled the public when it released a device in November 2014 that looked and acted like a speaker—except that it didn’t connect to anything except a power outlet, and the only buttons were for power and mute. You control the Echo solely by voice, and if you ask it questions, it talks back. It was like Amazon had decided to put Siri in a black cylinder and sell it for $179. Except Alexa, the virtual intelligence software that powers the Echo, was far more limited than Siri in its capabilities. Who, reviewers wondered, would buy such a bizarre novelty gadget?
That question has faded as Amazon has gradually upgraded and refined the Alexa software, and the five-star Amazon reviews have since poured in. In the New York Times, Farhad Manjoo recently followed up his tepid initial review with an all-out rave: The Echo “brims with profound possibility,” he wrote. Amazon has not disclosed sales figures, but the Echo ranks as the third-best-selling gadget in its electronics section. Alexa may not be as versatile as Siri—yet—but it turned out to have a distinct advantage: a sense of purpose, and of its own limitations. Whereas Apple implicitly invites iPhone users to ask Siri anything, Amazon ships the Echo with a little cheat sheet of basic queries that it knows how to respond to: “Alexa, what’s the weather?” “Alexa, set a timer for 45 minutes.” “Alexa, what’s in the news?”
The cheat sheet’s effect is to lower expectations to a level that even a relatively simplistic artificial intelligence can plausibly meet on a regular basis. That’s by design, says Greg Hart, Amazon’s vice president in charge of Echo and Alexa. Building a voice assistant that can respond to every possible query is “a really hard problem,” he says. “People can get really turned off if they have an experience that’s subpar or frustrating.” So the company began by picking specific tasks that Alexa could handle with aplomb and communicating those clearly to customers.
At launch, the Echo had just 12 core capabilities. That list has grown steadily as the company has augmented Alexa’s intelligence and added integrations with new services, such as Google Calendar, Yelp reviews, Pandora streaming radio, and even Domino’s delivery. The Echo is also becoming a hub for connected home appliances: “ ‘Alexa, turn on the living room lights’ never fails to delight people,” Hart says.
When you ask Alexa a question it can’t answer or say something it can’t quite understand, it fesses up: “Sorry, I don’t know the answer to that question.” That makes it all the more charming when you test its knowledge or capabilities and it surprises you by replying confidently and correctly. “Alexa, what’s a kinkajou?” I asked on a whim one evening, glancing up from my laptop while reading a news story about an elderly Florida woman who woke up one day with a kinkajou on her chest. Alexa didn’t hesitate: “A kinkajou is a rainforest mammal of the family Procyonidae … ” Alexa then proceeded to list a number of other Procyonidae to which the kinkajou is closely related. “Alexa, that’s enough,” I said after a few moments, genuinely impressed. “Thank you,” I added.
“You’re welcome,” Alexa replied, and I thought for a moment that she—it—sounded pleased.
As delightful as it can seem, the Echo’s magic comes with some unusual downsides. In order to respond every time you say “Alexa,” it has to be listening for the word at all times. Amazon says it only stores the commands that you say after you’ve said the word Alexa and discards the rest. Even so, the enormous amount of processing required to listen for a wake word 24/7 is reflected in the Echo’s biggest limitation: It only works when it’s plugged into a power outlet. (Amazon’s newest smart speakers, the Echo Dot and the Tap, are more mobile, but one sacrifices the speaker and the other the ability to respond at any time.)
Even if you trust Amazon to rigorously protect and delete all of your personal conversations from its servers—as it promises it will if you ask it to—Alexa’s anthropomorphic characteristics make it hard to shake the occasional sense that it’s eavesdropping on you, Big Brother–style. I was alone in my kitchen one day, unabashedly belting out the Fats Domino song “Blueberry Hill” as I did the dishes, when it struck me that I wasn’t alone after all. Alexa was listening—not judging, surely, but listening all the same. Sheepishly, I stopped singing.
The notion that the Echo is “creepy” or “spying on us” might be the most common criticism of the device so far. But there’s a more fundamental problem. It’s one that is likely to haunt voice assistants, and those who rely on them, as the technology evolves and bores it way more deeply into our lives.
The problem is that conversational interfaces don’t lend themselves to the sort of open flow of information we’ve become accustomed to in the Google era. By necessity they limit our choices—because their function is to make choices on our behalf.
A world of conversational machines is one in which we treat software like humans, letting them deeper into our lives and confiding in them more than ever.
But if we won’t see a true full-service A.I. in our lifetime, we might yet witness the rise of a system that can approximate some of its capabilities—comprising not a single, humanlike Her, but a million tiny hims carrying out small, discrete tasks handily. In January, the Verge’s Casey Newton made a compelling argument that our technological future will be filled not with websites, apps, or even voice assistants, but with conversational messaging bots. Like voice assistants, these bots rely on natural language understanding to carry on conversations with us. But they will do so via the medium that has come to dominate online interpersonal interaction, especially among the young people who are the heaviest users of mobile devices: text messaging. For example, Newton points to “Lunch Bot,” a relatively simple agent that lived in the wildly popular workplace chat program Slack and existed for a single, highly specialized purpose: to recommend the best place for employees to order their lunch from on a given day. It soon grew into a venture-backed company called Howdy.
I have a bot in my own life that serves a similarly specialized yet important role. While researching this story, I ran across a company called X.ai whose mission is to build the ultimate virtual scheduling assistant. It’s called Amy Ingram, and if its initials don’t tip you off, you might interact with it several times before realizing it’s not a person. (Unlike some other intelligent assistant companies, X.ai gives you the option to choose a male name for your assistant instead: Mine is Andrew Ingram.) Though it’s backed by some impressive natural language tech, X.ai’s bot does not attempt to be a know-it-all or do-it-all; it doesn’t tell jokes, and you wouldn’t want to date him. It asks for access to just one thing—your calendar. And it communicates solely by email. Just cc it on any thread in which you’re trying to schedule a meeting or appointment, and it will automatically step in and take over the back-and-forth involved in nailing down a time and place. Once it has agreed on a time with whomever you’re meeting—or, perhaps, with his or her own assistant, whether human or virtual—it will put all the relevant details on your calendar. Have your A.I. cc my A.I.
11:15 PM Oct 26 2016