    VOICES & OPINION

    If AI’s Native Language Is English, Where Does That Leave China?

    Because most high-quality texts are written in English, AI models perform best in that language. Can languages like Chinese ever catch up?

    A Chinese business using artificial intelligence to improve its operations, speed up its research, or simply cut costs has a wide variety of AI models to choose from. There are plenty of domestic options, including DeepSeek, Qwen, and Kimi.

    But many, especially those headquartered abroad, still opt to use the industry-leading offerings out of the U.S., such as OpenAI’s ChatGPT or Google’s Gemini, to tap their global knowledge foundations or coding capabilities. These can be used in a wide variety of languages — English, of course, but also Chinese and many others.

    This is one of the reasons many at Silicon Valley AI companies claim their AI products are helping improve productivity around the world. But, my research shows, there is an important caveat. AI tools likely have a native tongue, and those who don’t speak it are being left behind in ways that are subtle and consequential.

    At the time of my experiment, in early 2024, ChatGPT-4 was the world’s most popular large language model (LLM), with close to half of the global market share and customers in more than 100 countries. I tasked it with generating business recommendations in English, Chinese, and Arabic, and then asked 480 people to use those AI-generated recommendations to write professional emails themselves.

    There were four categories of recommendations: marketing (e.g., design a marketing campaign for a new tea brand), customer service (resolve disputes in customer returns at an electronics store), human resources (boost employee morale after stagnant sales), and R&D (give suggestions for developing a new shock-absorbent bicycle helmet).

    The results were stark: AI-generated content in Arabic and Chinese was consistently rated as significantly less complete and less relevant than the English versions. The differences aren’t just a matter of poor grammar or shorter output. They also manifest in “actionability” — how easily a piece of AI-generated advice can provide concrete next steps toward an applicable solution — as well as creativity — the likelihood that AI models generate out-of-the-box ideas. When workers in Arabic and Chinese settings used ChatGPT-4, their work output became significantly less actionable and less creative compared to that of their English-speaking counterparts.

    Another notable finding appeared when I looked at the R&D tasks. While ChatGPT-4 was decent at helping users write marketing and customer service emails in all languages, its quality and the quality of participants’ AI-assisted emails in Arabic and Chinese were especially abysmal when the tasks became technical, such as resolving issues with new product designs or scientific discovery. For example, when asked to develop ideas for a new shock-absorbent bicycle helmet in English, the model offered detailed information on the latest technologies and provided several feasible next steps, whereas in Chinese and Arabic, it produced vague information about the technological landscape and did not offer feasible options for the fictitious company to look into.

    These results point to a productivity gap: for similar tasks, if an AI tool creates actionable and creative plans for English speakers and mediocre ones for speakers of other languages, the former might derive a competitive advantage. This gap is likely to be amplified for tasks involving technical knowledge and scientific discovery.

    The reasons are likely rooted in the existing language inequalities of our world. LLMs are mostly trained on massive datasets of texts gathered from across the internet. Online, 55% of all domains use English as their default language, a discrepancy in part due to the internet having been developed in the U.S.

    This gap is amplified in the training of LLMs because the vast majority of the most essential training data — research papers, technical manuals — has adopted English as its lingua franca. Case in point: up to 90% of the world’s scientific papers were written in English. Even models developed by Chinese companies are often heavily trained on English content to help them perform better on global benchmarks.

    Many non-English languages are what researchers call low-resource (e.g., Swahili) or medium-resource (e.g., Vietnamese) languages. This means there might not be enough diverse, high-quality digital texts — such as scientific papers, business case studies, or legal documents — available in those languages to train AI models on.

    It is important to highlight the finding that the penalties in creativity and actionability are especially severe for more technical, R&D-related tasks. Think about what this means for a global economy: if a startup in Beijing is using ChatGPT in Chinese to help design a new medical device, it might be working with a tool fundamentally less capable than the one a startup in London uses in English. This doesn’t just slow down individual workers; it potentially slows down the scientific and economic progress of research organizations and even entire nations.

    Globally, the concentration of AI knowledge within an English-centric framework threatens to calcify an inequality where English-speaking nations maintain a structural advantage in scientific and economic progress. This disparity is a global concern because a mono-lingual AI ecosystem stifles innovation.

    For instance, if an LLM only offers actionable next steps for a scientific discovery in English but not in other languages, that scientific discovery is less likely to cross-fertilize further research in non-English settings. Conversely, if a new healthcare practice from a developing nation is “trapped” within a low-resource language, the world suffers a collective loss of the potential benefits from its application. Ultimately, progress should be measured by performance across all linguistic settings to ensure AI acts as a channel for knowledge, not a barrier.

    Tech companies in many countries around the world are working on their own, local models, and governments are also investing in language-specific AI. South Korea, for example, has launched a national push for AI sovereignty, with firms such as LG, Naver, and SK Telecom building domestic LLMs. The U.A.E.’s Falcon has advanced Arabic-language processing. Yet most non-English speakers still rely on Western tools. Their capabilities are often simply better, and they are the standard option when collaborating internationally.

    If we want AI models to be truly cross-lingual tools, we might need to tweak how they are built. Simply translating English output into other languages is not a good strategy: business standards, cultural nuances, and social guidelines vary wildly, and what works as a persuasive email in English might seem rude or insincere in Chinese.

    Instead, we need to prioritize original-language data. Builders need to digitize and feed models more scientific and technical materials in languages ranging from Arabic and Chinese to Vietnamese and Swahili. Curated translations of technical documents from English to other languages could produce essential training materials, especially for low-resource languages.

    Reinforcement learning, especially reinforcement learning with human feedback (RLHF), can improve AI output in non-English settings, producing more coherent and contextually appropriate responses. In RLHF, the model learns what humans consider “good” versus “bad” behavior through direct human feedback (e.g., ranking different outputs on specialized topics, or refining output to avoid sexually explicit content). Builders might also rely more on techniques like Retrieval-Augmented Generation (RAG), which allows a model to look up specific, high-quality documents rather than relying solely on its mostly English-based training.
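    For readers curious how RAG works in practice, the core idea can be sketched in a few lines. This is a toy illustration only: the keyword-overlap retriever, the `retrieve` and `answer_with_rag` function names, and the sample documents are all invented for this example; a production system would use embedding-based search and pass the assembled prompt to an actual LLM.

    ```python
    # Toy sketch of Retrieval-Augmented Generation (RAG).
    # Assumption: a crude keyword-overlap retriever stands in for real
    # embedding search, and the "generation" step just returns the prompt.

    def retrieve(query, documents):
        """Return the document sharing the most words with the query."""
        q_words = set(query.lower().split())
        return max(documents, key=lambda d: len(q_words & set(d.lower().split())))

    def answer_with_rag(query, documents):
        context = retrieve(query, documents)
        # The retrieved passage is prepended so the model grounds its answer
        # in curated source material (which can be in any language) instead
        # of relying only on its mostly English training data.
        prompt = f"Context: {context}\nQuestion: {query}\nAnswer:"
        return prompt  # a real system would send this prompt to an LLM

    docs = [
        "Helmet liners made of expanded polystyrene absorb impact energy.",
        "Tea marketing campaigns often emphasize tradition and health.",
    ]
    print(answer_with_rag("How do bicycle helmet liners absorb impact?", docs))
    ```

    The point of the design is that the knowledge lives in the retrieved documents, not in the model’s weights — which is why curating high-quality non-English corpora directly improves non-English answers.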

    We are currently at a crossroads. We can continue down the path of English-first AI, which will likely widen the cross-lingual AI gap. Alternatively, users speaking various languages may push AI companies to treat linguistic diversity as a technical necessity rather than an afterthought.

    Progress in AI shouldn’t just be measured by how fast a model can write a business plan or assist with scientific discovery in English, but by performance across linguistic settings. Non-English speakers can also construct and promote rankings that separately document performance metrics across languages. AI companies with superior cross-lingual performance can then use those segmented rankings and metrics to gain a market advantage over their rivals.

    We need to be clear-eyed. When powerful AI models are built on unequal linguistic foundations, the consequences reach far beyond the lab. In that regard, the notion of AI as an equalizer might just be a myth. We should confront these blind spots now and address them at the design stage before language gaps harden into lasting inequalities.

    (Header image: Visuals from Vectorstock and Shijue/VCG, reedited by Sixth Tone)