There is international consensus on the need to promote linguistic diversity, in cyberspace as well as offline. This is reflected in the World Summit on the Information Society (WSIS) action line C8 (Cultural diversity and identity, linguistic diversity and local content) and UNESCO’s Recommendation concerning the Promotion and Use of Multilingualism and Universal Access to Cyberspace (2003).
In previous reports, we explored the status of multilingual content online and noted the gap between the rich diversity of languages spoken in the offline world and the languages of cyberspace, where English accounts for more than half of web content.
Our annual reports have noted the gap between the drive for increased linguistic diversity in popular web applications and the continuing challenge of ensuring universal acceptance of internationalised domain names. Facebook supports more than 70 languages, Google Translate more than 100, and Twitter 34. The world’s most popular apps are also increasing the number of supported languages: WhatsApp is available in more than 20 languages and Instagram in 33.
Nevertheless, where IDNs are in use, the language of web content is more diverse than it is with traditional ASCII domains. While there is a long way to go before we see the same linguistic diversity online as there is offline, IDNs seem to help redress the balance, at least as far as the most-spoken languages are concerned.
As a result of our analysis of the language of content associated with IDNs, we can state that:
- IDNs help to enhance linguistic diversity in cyberspace
- The IDN market is more balanced in favour of emerging economies
- IDNs are accurate predictors of the language of web content
In previous reports we have noted that the language of web content tends to follow the script of the IDN. IDNs accurately signal which languages will be found on the associated websites.
The research team has measured the language of websites associated with .eu IDNs. Through our collaboration with Verisign and access to open gTLD zone files, we have also measured languages associated with IDNs in gTLDs. Based on what we discovered from the open zone files, we have extended the analysis to include content of ccTLD IDNs, which form the majority of IDN registrations.
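To illustrate what it means for an IDN to signal a script, the following is a minimal sketch in Python that decodes a Punycode-encoded domain label and guesses its dominant script. It is not the method used by the research team. The function names `decode_label` and `dominant_script` are hypothetical, and taking the first word of each character's Unicode name as its script is a rough heuristic assumed here for brevity; it is not a formal Unicode script lookup.

```python
import unicodedata
from collections import Counter

ACE_PREFIX = "xn--"  # prefix that marks a Punycode-encoded (ASCII-compatible) label


def decode_label(label: str) -> str:
    """Return the Unicode form of a single domain label (hypothetical helper)."""
    label = label.lower()
    if label.startswith(ACE_PREFIX):
        # Strip the prefix and decode the remainder with the stdlib punycode codec.
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


def dominant_script(text: str) -> str:
    """Guess the most common script among the letters of `text`.

    Assumption: the first word of a character's Unicode name (for example
    LATIN, CYRILLIC, ARABIC, CJK) is used as a stand-in for its script.
    """
    counts = Counter()
    for ch in text:
        if ch.isalpha():  # skip digits and hyphens
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else "UNKNOWN"


if __name__ == "__main__":
    label = decode_label("xn--bcher-kva")
    print(label, dominant_script(label))  # bücher LATIN
```

A script guessed in this way could then be compared with the language detected in the content of the associated website, which is the kind of relationship the findings above describe.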