An introduction to Internationalised Domain Names (IDNs): a technical overview of how IDNs are structured (Punycode), how they contribute to linguistic diversity online, and a history of IDN deployment.

What are Internationalised Domain Names?

Domain names, which are a core part of the Internet’s addressing system, work because they are interoperable and resolve uniquely. This means that any user connected to the Internet, anywhere in the world, can get to the same destination by typing in a domain name (as part of a web- or email address). The plan to internationalise the character sets supported within the Domain Name System is almost as old as the Internet itself. However, technical constraints and the overriding priority of interoperability resulted in a restricted character set within the Domain Name System: ASCII a to z, 0 to 9 and the hyphen. This restricted character set is known as LDH (Letters, Digits and Hyphen) within the technical community.
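The LDH restriction described above can be expressed as a simple check on a single label. The following is a minimal sketch in Python, not a full hostname validator. The helper name `is_ldh_label` is hypothetical, and the 63-character limit and the ban on leading or trailing hyphens are taken from the general DNS hostname rules rather than from this article.

```python
import re

# One LDH label: ASCII letters, digits and hyphen only.
# Assumptions beyond the article: 1 to 63 characters long, and no hyphen
# at the start or end (standard DNS hostname conventions).
LDH_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_ldh_label(label: str) -> bool:
    """Return True if `label` uses only the restricted LDH character set."""
    return LDH_LABEL.fullmatch(label) is not None


print(is_ldh_label("example"))  # plain ASCII label
print(is_ldh_label("bücher"))   # contains a non-ASCII character
```

A label such as "bücher" fails this check, which is why it has to be converted into an ASCII form before the Domain Name System can carry it.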

Technical standards to internationalise domain names were developed from the mid-1990s. The solution retains the Domain Name System’s restricted character set, and encodes every other character into it. Each label containing non-ASCII characters is converted into a string of ASCII characters prefixed with xn-- . The xn-- ASCII forms of the domain names are meaningless to humans, but meaningful to machines (name servers) that resolve domain names. Thus, humans see the meaningful, native-script characters when they navigate the Internet, whilst the underlying technical resolution of domain names remains unchanged.

A technical overview

Punycode is the algorithm used to transform a Unicode Label into an ASCII string. This ASCII string is prefixed with “xn--” (ACE prefix) to create an “A-label” or ACE label (ASCII Compatible Encoding) that the domain name system understands. For more details, see section 2.3 of RFC 5890.
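As an illustration, the conversion between a Unicode label and its ACE form can be sketched with the "punycode" codec in Python's standard library. The function names `to_a_label` and `to_u_label` are hypothetical helpers, and the sketch applies raw Punycode only: it assumes the input is already lower-case and normalised, and it skips the validation and mapping steps that full IDNA processing requires.

```python
ACE_PREFIX = "xn--"


def to_a_label(u_label: str) -> str:
    """Convert one Unicode label to its ACE ("xn--") form using raw Punycode."""
    if u_label.isascii():
        return u_label  # already ASCII, so no encoding is needed
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label: str) -> str:
    """Convert an ACE label back to its Unicode form; other labels pass through."""
    if a_label.lower().startswith(ACE_PREFIX):
        return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return a_label


print(to_a_label("bücher"))         # xn--bcher-kva
print(to_u_label("xn--bcher-kva"))  # bücher
```

A full domain name is handled label by label: each part between the dots is converted separately, so an ASCII TLD such as "eu" is left unchanged.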

Implementation of IDNs began at the second level, in 2000 (under .com and .net) and 2001 (under .jp). In the ten years that followed, several ccTLDs deployed IDNs, primarily supporting local language character sets. Some experimented with other strategies for internationalising domain names, but the IDN technology proved the most successful.

IDNs are technically complex to implement. Many challenges remain, including (at a technical level) how to handle variant characters, which are prevalent in Arabic and Chinese scripts. Another challenge is the user experience, for example consistent representation in browsers and full functionality in email. This is known as ‘universal acceptance’.

How IDNs contribute to linguistic diversity online

Despite the technical challenges, IDNs are viewed by many as a catalyst and a necessary first step to achieving a multilingual Internet. According to UNESCO, in 2008 only 12 languages accounted for 98% of Internet web pages; English, with 72% of web pages, was the dominant language online. Recent reports indicate that other languages are growing rapidly online. For example, by 2010, only 20% of Wikipedia articles were in English, and by December 2018 this had fallen to less than 12%. Supporters of IDN believe that enabling users to navigate the Internet in their native language is bound to enhance the linguistic diversity of the online population, and the World Report has demonstrated that IDNs are strongly linked to local content.

While this study focuses on the web, it should be noted that other applications, such as email and file transfer protocol (FTP), also require internationalisation.

A short history of IDN deployment

For nearly two decades, hybrid Internationalised Domain Names have been available at the second level with ASCII Top Level Domains (for example, παράδειγμα.eu in the figure above). This situation was only satisfactory for Latin-based scripts used by most European languages, where the IDN element would commonly reflect accents or other diacritical marks on Latin characters. For speakers of languages not based on Latin scripts (for example, Chinese, Arabic), the hybrid IDN/ASCII domains were unsatisfactory. Right-to-left scripts, such as Arabic and Hebrew, created bi-directional domain names when combined with left-to-right TLD extensions, requiring users to be familiar with both their own script and Latin script in order to navigate the Internet. As explained in the report IDNs State of Play 2011, bi-directional domain names not only require Internet users to change script when typing in a single web address, but also potentially confuse the strict hierarchy of the Domain Name System.
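To make the idea of a bi-directional domain name concrete, the sketch below uses Python's standard `unicodedata` module to classify each label as right-to-left or left-to-right and flags names that mix the two. This is a simplified heuristic for illustration only, not the Bidi rule defined for IDNs in RFC 5893. The function names are hypothetical, and the sample name مصر.eu is made up by borrowing an Arabic string from this article.

```python
import unicodedata

# Unicode bidirectional classes used by right-to-left letters
# ("R" covers Hebrew, among others; "AL" covers Arabic letters).
RTL_CLASSES = {"R", "AL"}


def label_direction(label: str) -> str:
    """Return 'rtl' if the label contains any right-to-left letter, else 'ltr'."""
    if any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in label):
        return "rtl"
    return "ltr"


def is_bidirectional(domain: str) -> bool:
    """Return True if the labels of `domain` do not all run in one direction."""
    directions = {label_direction(label) for label in domain.split(".")}
    return len(directions) > 1


print(is_bidirectional("مصر.eu"))         # Arabic label under a Latin TLD
print(is_bidirectional("παράδειγμα.eu"))  # Greek and Latin are both left-to-right
```

Under this heuristic, a hybrid name with an Arabic second-level label and a Latin TLD is flagged as mixed, whereas a Greek label under .eu is not, which matches the distinction drawn above between Latin-adjacent scripts and right-to-left scripts.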

Internet governance discussions from 2006 onwards highlighted the lack of IDNs in the root domain zone (which would enable full IDN domain names including at the top level) as a key building block towards the goal of a multilingual Internet. From 2005, there was increasing pressure on ICANN, the global coordinator of Internet domain names, to implement IDNs in the root zone.

In the meantime, some countries created their own work-arounds. For example, China and the Republic of Korea developed keyword searches at the domain name servers for .cn and .kr. For those searching for domains within the country, the keyword system resolves the domain without the user having to type the Latin-script domain ending (TLD). In China and Egypt, browser add-ons were developed to translate a domain into another name that would be looked up on national servers, to enable Internet users to enter local character strings into browsers. However, this solution relied on users downloading a plug-in, which was not compatible with every browser. These efforts indicate the importance that policy makers and technologists have placed on internationalising domain names, and that IDNs emerged as the superior technology amongst a number of alternatives.

Following pressure from the ccTLD community, ICANN introduced a fast-track process to create IDN ccTLDs in 2007-2008, describing the programme as a “top priority”. In 2010, ICANN took the historic step of approving ccTLDs in native scripts for four countries: مصر (Egypt), السعودية (Saudi Arabia), рф (Russian Federation) and امارات (United Arab Emirates). Since then, there has been a steady expansion of the number of IDN.IDN registries launched, including 한국 (Republic of Korea), قطر (Qatar), فلسطين (Palestine), الجزائر (Algeria), 香港 (Hong Kong), سورية (Syrian Arab Republic), қаз (Kazakhstan), срб (Serbia), 新加坡 and சிங்கப்பூர் (Singapore). By the end of December 2018, 86 ccTLDs were offering IDNs, including 23 at the top level.

In 2012, ICANN opened applications for new gTLDs, including IDNs. More than 100 applications for IDNs were received, and these have led to gTLD IDNs coming onto the market from 2013 onwards. By the end of December 2018, nearly 450 gTLDs were offering IDNs, including more than 50 at the top level. The IDN World Report provides comprehensive data for that rollout and continues to monitor the growth of IDNs globally. The report continues to attract new partners and sources of data, and plans to expand its IDN data and to monitor new gTLDs as they come onto the market.