How COVID-19 Exposed the Little Troubles With China’s Big Data Push

China has spent a decade investing in big data. But despite some notable successes, the coronavirus showed the industry still isn’t ready for prime time.

By Li Chunsheng

May 11, 2020 #technology #Coronavirus #privacy

Over the past decade, China has invested heavily in the emerging field of big data. Massive data centers have been built in remote interior provinces like Guizhou and Gansu, and the national government has outlined plans for a national patient information platform. Officials view these investments as part of a broader program to improve social management, modernize governance, and build a smarter society by leveraging the vast quantities of information generated by our digital society. By 2018, the market for such services was worth an estimated 600 billion yuan ($93 billion).

It should come as no surprise that China would try to get the most out of these investments in its battle against COVID-19. The most prominent example has probably been the nation’s various “health code” initiatives, in which local governments partnered with tech giants Alibaba or Tencent to assign residents a risk rating based on their personal information, cellphone geolocation data, and up-to-the-minute tracking information on confirmed cases and clusters, among other factors.

But big data has also been used by some local governments to provide residents confined to their homes with online services, as well as to allocate and monitor medical supplies and protective gear. Following the initial outbreak, many neighborhoods and medical institutions turned to cloud computing platforms to collect and record information about the source, quantity, and kinds of goods at their disposal, for example.

Given the high cost of big data infrastructure, however, it’s worth asking how much the above programs really contributed to the country’s epidemic prevention efforts.

The early returns seem underwhelming. Wuhan, the epicenter of the COVID-19 outbreak in China, had previously positioned itself on the cutting edge of the big data industry. But in the early days of its fight against the coronavirus, the city still found itself reliant on human labor — neighborhood workers, volunteers, and other personnel — to allocate supplies, screen residents, and track patients.

One obstacle to actually using big data in meaningful ways is the continued existence of “information islands,” large agglomerations of data not shared across broader networks. Information collection is costly and time-consuming, and there are few incentives for local governments to share their data with each other — or even for their various departments to do so.

These islands hamper coordination. For instance, health codes still aren’t recognized across regional lines. Wuhan residents might have a green code issued by their local health authorities, indicating they’re healthy and able to travel freely. But if they journey to other areas, they must go through the verification process again.

Technical problems and errors have also played a role. For big data to be used effectively, it must be accurate, and its accuracy hinges on the scope and scale of the data collected. Missing or unavailable data can significantly reduce the accuracy of algorithms and models. For example, some people who had not been to Hubei in a long time were nevertheless flagged by color code algorithms as having been there in the previous 14 days.

Inconsistencies in technical and platform standards also limit big data’s utility while increasing the workload of grassroots-level operatives. During the height of China’s epidemic, prevention workers had to collect data and report the same information to a multitude of platforms and databases.

And then there is the problem of leaks. For big data-reliant tracking and screening programs to function, they need to collect people’s location information, health status, recent activity, and other information. Yet many poorly designed platforms are prone to massive data breaches. After patient and personal data leaked on messaging platform WeChat, people who had once lived in Hubei became the target of abuse and discrimination, even if they hadn’t been back in years.

The truth is, for all big data’s seeming accuracy, it offers nothing more than a probability estimate and is subject to errors or bias, just like anything else. If the past few months are any indication, getting the most out of the technology will require significant troubleshooting.

First, China needs a national-level system of laws, rules, and safeguards for big data and related technologies. This would embed big data into the national governance system while ensuring ample, detailed system safeguards for its use in specific situations or areas.

Second, big data and associated technologies are advancing all the time, but governments and officials often dawdle when it comes to learning, accepting, and operating new technology. Many local government departments sought to provide digital services to constituents over the past few months, but there are still plenty whose websites are all but defunct. To ensure their data resources can be used fully and effectively, governments need to better train their workers and increase the number of those with big data expertise.

Finally, the effective use of big data requires high-quality input. We need to reform information systems, break down information barriers between local governments and their agencies, and build standardized information platforms. China’s public security apparatus has access to population and mobility data, while health officials have access to citizens’ medical information. If they had shared that data in a timely fashion, the government could have accurately identified the movements of suspected virus carriers and carried out prevention measures in real time.

China has invested heavily in big data, but the COVID-19 pandemic shows us how far we still have to go if those investments are to pay off. Rather than be discouraged by big data’s underwhelming performance during the outbreak, we should focus on fixing the problems it has exposed.

Translator: Katherine Tse; editors: Lu Hua and Kilian O’Donnell; portrait artist: Wang Zhenhao.

(Header image: A big data urban management interface on display in Hangzhou, Zhejiang province, April 29, 2020. Long Wei/People Visual)