2019-03-20 13:03:30

ZHEJIANG, East China — We all know the feeling. You receive a call from an unknown number, answer it, and bring your phone to your ear. On the other end, a recorded voice spews out an advertisement. You sigh and irritably hang up.

Like many of us, 35-year-old Yang was used to sales calls like that. But last year, he started receiving a different kind of call that left him more unnerved than annoyed. The voices on the other end of the line sounded like recordings, but they could respond flexibly to his questions in fluent Chinese.

Having worked in artificial intelligence for several years, Yang — who refused to give his full name, citing a potential conflict of interest — says he soon realized that these calls were being made by AI-driven voice bots designed to recognize, interpret, and respond to human speech in natural ways. Chinese voice bots have developed rapidly since 2016, around a year after the State Council — the country’s Cabinet — made AI a key part of “Made in China 2025,” its flagship plan to transition from a labor-intensive manufacturing economy to a high-tech, more service- and consumption-oriented one. But as Yang received more and more calls from uncannily human-sounding bots — sometimes a dozen a day — he began to wonder who was behind them, how they knew his phone number, and how they seemingly knew so much about him. “Some of them even knew my full name. The more I spoke to them, the more calls I got,” he says. “It’s disgusting.”

As Yang’s experience testifies, in China, smart voice technology — the “brains” behind voice bots — has already reached the point at which people struggle to distinguish it from human speakers. Developers frequently claim that bots can replace human call-center personnel, dramatically reducing costs for telemarketing companies and plugging gaps in China’s shrinking labor market. But experts question the technology’s maturity and ownership, while lawyers claim that the current applications of voice bots infringe China’s privacy laws.

In the last few years, Chinese smart voice technologies — particularly speech recognition and generation — have seen remarkably high levels of investment. While the global smart-voice industry grew at a rate of 30 percent between 2016 and 2017, in China the rate was about 70 percent, according to a November 2018 report from China News. Last year, the Chinese smart-voice market was worth an estimated 15.9 billion yuan ($2.3 billion); by comparison, the global market was worth $6.2 billion in 2017, according to Zion Market Research.

In China, many tech companies develop voice bots as small but lucrative additions to their broader tech portfolios. Some, however, have made bots a core part of their business models. One such company is Silicon Intelligence — a Nanjing-based voice bot developer that reportedly made 100 million yuan in gross revenue last year. The company’s flagship bot, Guiyu — which literally means “silicon tongue” — has versions in Mandarin, English, and Japanese. Silicon Intelligence mainly sells Guiyu’s technology to sales companies in packages costing an average of 10,000 yuan, according to the company. The sales companies, in turn, use Guiyu to interface with consumers — including by cold-calling them. To date, the company claims to have sold to 11,000 clients both in China and overseas — but its ambitions are even bigger.

“We’re basically creating a telephone version of Siri,” says Sima Huapeng, Silicon Intelligence’s founder, referring to Apple’s voice-activated virtual assistant. The compact 37-year-old meets Sixth Tone in a glass-walled communal workspace near Shanghai’s financial district and immediately starts waxing lyrical about the businessman who inspired him: Steve Jobs, the co-founder of Apple. Like Jobs, Sima started several software businesses in his early 20s, one of them while still in college. He founded Silicon Intelligence in 2017, “to inherit Jobs’ revolution in smart voice technology,” he says.

Moment Open/VCG

When asked about the technology behind Guiyu, Sima shoots out a string of technical jargon. First, when a conversation begins, Guiyu recognizes what a customer says and writes it out as text — this is called “automatic speech recognition,” or ASR. Then the bot comprehends the text by referring to a vast database of phrases and sentences — a process called “natural language understanding” — and crafts its response. Finally, in a mechanism known as “text to speech,” Guiyu converts its written response into a vocal utterance.
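The three stages Sima describes can be pictured as a simple chain of functions. The sketch below is purely illustrative and is not Guiyu's code: the function names, the tiny phrase table, and the use of text bytes in place of real audio are all assumptions made for the sake of a runnable example.

```python
# Illustrative three-stage voice-bot turn: ASR -> NLU -> TTS.
# Everything here is hypothetical; real systems use trained speech models.

# Stand-in for the "vast database of phrases and sentences" (assumed content).
PHRASE_DB = {
    "who is this": "This is an automated sales assistant.",
    "not interested": "Understood. Thank you for your time.",
}
FALLBACK = "Sorry, could you repeat that?"


def recognize_speech(audio: bytes) -> str:
    """ASR stand-in: treat the 'audio' as UTF-8 bytes of what was said."""
    return audio.decode("utf-8")


def understand(text: str) -> str:
    """NLU stand-in: normalize the transcript and look up a scripted reply."""
    key = text.lower().strip(" ?!.")
    return PHRASE_DB.get(key, FALLBACK)


def synthesize(reply: str) -> bytes:
    """TTS stand-in: encode the reply text as bytes in place of audio."""
    return reply.encode("utf-8")


def handle_turn(audio: bytes) -> bytes:
    """Run one conversational turn through all three stages in order."""
    return synthesize(understand(recognize_speech(audio)))
```

In this toy version, any utterance missing from the phrase table falls through to a generic fallback reply, which is one plausible way a bot could stumble on a question it was never prepared for.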

Sima insists that Guiyu’s communicative ability is already approaching a human level. “We’re aiming at passing the Turing test here,” he adds, referring to the test invented by British mathematician Alan Turing that determines a machine’s ability to exhibit intelligent behavior indistinguishable from that of a human. Although the flaws in the Turing test are well-documented, its back-and-forth interactions that mimic person-to-person conversation are still generally perceived as a stern test of a machine’s ability to communicate with humans. Sima claims that most customers fail to identify Guiyu as a robot within 10 rounds of a question-and-answer-style conversation.

Other voice bot manufacturers make similarly bold claims. Wan Xi Intelligence, a Hangzhou-based company whose logo is a blue robot with an antenna and two ball-shaped eyes, states on its website that its flagship bot, Biling, can recognize more than 98 percent of Mandarin speech. By comparison, EdgeSpeechNets — an English-language bot co-developed by researchers at Canada’s University of Waterloo and tech startup DarwinAI whose voice-recognition technologies are widely regarded as world-leading — claimed to have a speech recognition rate of 97 percent last year.

Claims like those made by Wan Xi have led Chinese experts to accuse bot manufacturers of false advertising. Li Haoze, a program manager at Hithink Flush Information Network Co. Ltd., a stock-trading website that is also developing its own smart voice service, says that speech recognition rates at that level are impossible, at least for now. “I’ve never heard of a speech recognition rate higher than 95 percent. Normally for Mandarin, 85 percent is pretty good,” he says. “With Cantonese, you’re looking at about 50 percent recognition.”

Many Chinese voice bots still fail to convince customers, too. Several recipients of automated calls tell Sixth Tone that the bots often fail to respond to ostensibly simple questions about, say, today’s date. Others say that bots struggle with nonstandard utterances, such as dialectal or accented speech. “If I say a sentence in nonstandard Mandarin, it can’t understand me,” complains Tan Jiuding, a 21-year-old college student from central China’s Hunan province who spoke to a bot in his local accent.

E+/VCG

Experts have also questioned whether smaller bot developers actually own the technology they sell. Silicon Intelligence, for example, claims to own all the technology behind Guiyu, but observers argue that it’s virtually impossible for a company of just 500 staff members to develop the complex ASR technology the bot employs. “Top ASR developers command salaries of at least 1 million yuan a year, and [Silicon Intelligence] would need a lot of them,” says Chen, the pseudonym of a sales contractor at one of Hangzhou’s biggest voice-bot companies who asked for anonymity, since his company has not authorized him to speak with media. “Companies like them can’t afford that sort of outlay.”

The high costs of smart voice technology mean that larger tech companies like Baidu, iFlytek, and Alibaba tend to be the most competitive in China. Smaller companies, meanwhile, likely purchase basic speech-recognition services from tech giants and repackage them as part of their final products, Chen says. iFlytek, which is headquartered in eastern China’s Anhui province, estimates that by the end of last year, 920,000 developers had bought access to its speech recognition services, producing 4.7 billion daily interactions. The company does not measure the proportion of interactions produced during cold calls.

Other bot companies, like Silicon Intelligence, claim to use deep learning, AI-powered technology that provides the bot with access to a vast database of human utterances and allows the technology to self-learn how to use them. But Chen says that this, too, is a gimmick: Because deep-learning technology is still expensive to produce, bot companies draw up a large, but more limited, flowchart of all the possible customer responses to what the bot says, design tailored answers for each, record humans saying them, and feed this information to the bots with instructions on how to respond.
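The flowchart approach Chen describes might look something like the following sketch. It is a hypothetical illustration, not any company's actual system: the node names, keyword sets, and clip filenames are invented, and the simple word matching stands in for whatever recognition a real product uses.

```python
import re

# Hypothetical call script: each node plays a pre-recorded clip, then picks
# the next node from keywords found in the customer's reply.
YES = {"yes", "sure", "okay"}
NO = {"no", "busy", "stop"}

FLOW = {
    "greeting": {"clip": "greeting.wav",
                 "branches": [(YES, "pitch"), (NO, "goodbye")],
                 "default": "clarify"},
    "clarify": {"clip": "clarify.wav",
                "branches": [(YES, "pitch"), (NO, "goodbye")],
                "default": "goodbye"},
    "pitch": {"clip": "pitch.wav",
              "branches": [(YES, "handoff"), (NO, "goodbye")],
              "default": "goodbye"},
    "handoff": {"clip": "handoff.wav", "branches": [], "default": None},
    "goodbye": {"clip": "goodbye.wav", "branches": [], "default": None},
}


def next_node(current: str, customer_text: str) -> str:
    """Follow the first branch whose keywords appear as whole words."""
    words = set(re.findall(r"[a-z']+", customer_text.lower()))
    for keywords, target in FLOW[current]["branches"]:
        if words & keywords:
            return target
    return FLOW[current]["default"]


def run_call(customer_replies: list[str]) -> list[str]:
    """Return the sequence of clips played over one scripted call."""
    node = "greeting"
    played = [FLOW[node]["clip"]]
    for reply in customer_replies:
        node = next_node(node, reply)
        played.append(FLOW[node]["clip"])
        if not FLOW[node]["branches"]:  # terminal node: the call ends
            break
    return played
```

A bot built this way can only travel along paths its designers drew in advance. Any reply the script did not anticipate lands on a default branch, and nothing is learned from the exchange.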

Xing Guang, founder of Wan Xi Intelligence, admits to using a similar system. “It means the bot works [quite well], but it can’t think or judge things on its own terms,” he says.

Sima’s faith in the transformative power of voice bots is unwavering. Guiyu is around “25 times more efficient” than a human call-center operator, he claims. “It can make 1,000 calls each day, which is five times as many as a human can make, and the cost is only one-fifth of hiring a human laborer,” he adds, on the assumption that the average Chinese call-center employee earns around 50,000 yuan a year and a bot usually costs around 10,000 yuan.
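Sima's "25 times" figure appears to come from multiplying the two ratios he cites. The short calculation below reproduces it under his stated assumptions. The human call volume of 200 a day is not stated in the interview and is derived here from "five times," and the bot's 10,000 yuan price is treated as comparable to a yearly salary, as Sima's comparison implies.

```python
from fractions import Fraction

# Figures quoted by Sima.
BOT_CALLS_PER_DAY = 1_000
CALL_RATIO = 5          # bot makes five times as many calls as a human
HUMAN_COST = 50_000     # yuan, assumed average annual call-center salary
BOT_COST = 10_000       # yuan, assumed comparable cost of a bot

# Derived, not quoted: 1,000 calls / 5 = 200 calls per day for a human.
human_calls_per_day = Fraction(BOT_CALLS_PER_DAY, CALL_RATIO)

# Calls delivered per yuan spent, for each option.
bot_calls_per_yuan = Fraction(BOT_CALLS_PER_DAY, BOT_COST)
human_calls_per_yuan = human_calls_per_day / HUMAN_COST

# Five times the calls at one-fifth the cost: 5 * 5 = 25.
efficiency_ratio = bot_calls_per_yuan / human_calls_per_yuan
print(efficiency_ratio)
```

The result depends entirely on those inputs. It says nothing about how many calls lead to a sale, only how many are placed per yuan.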

Silicon Intelligence’s issue with human personnel runs deeper than their perceived inefficiency, though. The company’s official website also characterizes them as irrational, emotionally unstable, and prone to making mistakes that could cause sales companies to lose customers. A bot, on the other hand, doesn’t need training, never complains about the work, and carries out its job efficiently and dispassionately, the company says. “Our bots have probably made billions of calls by now,” Sima says with a smile.

An incoming cold call. IC

Once bot developers sell their technology to third-party sales companies, they rarely check to ensure it’s used legally and ethically. That’s a problem, because as cheap smart-voice technology floods the market, companies are increasingly using bots to bombard consumers with annoying, unsolicited calls. In December last year alone, China’s Ministry of Industry and Information Technology received 86,000 public complaints about “harassment calls” — the Chinese term for aggressive cold calling — three times the number over the same period in 2017. Several people tell Sixth Tone that, in recent months, they have received large numbers of automated calls selling everything from insurance to sex toys. Yang Shuchen, a 31-year-old who teaches construction and engineering and is not related to Yang the AI-industry insider, says he even receives calls from voice bots selling — of all things — other voice bots.

To combat the growing number of consumer complaints and better regulate China’s telemarketing industry, the Ministry of Industry and Information Technology issued a directive in November 2018 compelling internet companies to cooperate with authorities in a bid to reduce cold calling. However, Chinese law still lacks an official definition of “harassment calls,” meaning that the boundaries of acceptable versus unacceptable forms of telemarketing remain vague.

While the government rushes to control the industry, some Chinese lawyers are voicing concerns that the way third-party telemarketers use current voice-bot technology violates the country’s privacy laws. Most bots are designed to record every phone call they make for reference purposes, yet customers are seldom informed of the practice. Indeed, few are even aware that the caller is a bot in the first place. In January, Guangzhou-based media outlet Southern Weekly reported that many of the companies that buy voice-bot services combine them with web-scraping tools that illegally gather phone numbers and other personal data without the owners’ consent. Some even illegally purchase contact details from insurance or real estate companies, then feed the bots as many phone numbers as possible along with the associated personal information. On March 15, World Consumer Rights Day, China Central Television (CCTV) ran an exposé on 11 voice-bot manufacturers that sold their clients either customer phone-number databases as part of the package or random phone-number generators capable of evading call blockers. Several companies in the CCTV report contested the claims, saying that they were unaware of such practices.

Meng Jie, a lawyer specializing in privacy protection and internet security, says that telemarketing companies that use voice bots for such practices violate both Article 41 of China’s Cyber Security Law, which requires companies to obtain users’ consent at the start of the call, and the principle of minimal data collection. China’s 2017 national standards on personal information security, the Personal Information Security Specification, require companies that acquire personal data to use it only for purposes directly related to and necessary for their business functions. “If a bot company uses the personal data they collected for any other purposes beyond those declared at the beginning, they are also infringing on consumers’ right to be informed under the Consumer Law,” Meng says.

Visitors examine a robot during a popular science product expo in Shanghai, Aug. 27, 2018. Xing Yun/VCG

Sima says the blame for China’s cold-calling epidemic lies with the sales companies, not developers. Because voice bots can process a much greater volume of calls than humans, he argues, people are more likely to be called multiple times by different bots. In addition, he says, many companies outsource telemarketing services to third-party contractors that sometimes operate fraudulently, which leaves little control over how the technology is used. “Many of these companies just call numbers from 000 to 999,” he says. However, Sima agrees that the industry needs further regulation.

iFlytek says it began self-regulating its voice bot sales last November and planned to terminate relationships with all clients who used the bots for cold calling. Xing says that Wan Xi is moving in a similar direction. “We don’t sell to peer-to-peer lenders, suspicious health-product providers, and cosmetic-surgery companies,” Xing says. “We also limit the bots’ working hours from 9 a.m. to 8 p.m.” When asked about the impact of such limitations on Wan Xi’s income, Xing admits that the company has not turned a profit in the past year.

But for now, rules designed to combat cold calling are patchily enforced, and consumers are bearing the brunt. “If I say I want to buy stocks, the next day I get tons of calls selling me stocks,” says Yang, the exasperated AI expert. “If I then say I want to flee the country, they call me back offering emigration services.”

Editor: Matthew Walsh.

(Header image: Shi Yangkun and Ding Yining/Sixth Tone)