The early Internet was limited to the ASCII character set. As we have seen elsewhere, this had dramatic implications for the development of a multilingual Internet, affecting both its operations and its content. The technical ability to transmit and display content in other scripts – to support the internationalisation of the Internet – started to emerge in the 1990s. However, the Domain Name System (DNS) lagged behind.
As first standardised, names in the DNS were limited to a subset of ASCII characters: the letters a-z, the digits 0-9 and the hyphen character (“-”). For a long time, all registrations in the DNS were subject to this so-called LDH (letters, digits, hyphen) restriction. Support for a more diverse character set was first provided in a set of standards published by the Internet Engineering Task Force (IETF) in 2003. However, the publication of a standard did not mean that IDNs were widely available or usable.
The World Report on IDNs studies the availability of IDNs and trends in their registration. This section focuses on how usable those IDNs are once they are registered and resolvable.
Universal Acceptance (UA) is a metric: a measure of how well IDNs are accepted, displayed, stored and processed by the Internet’s applications and infrastructure. In previous research we have said that UA is a measure of how ‘usable’ an IDN is. UA measures the ability of IDNs to be used in the same way as traditional domain names. Another definition might be: UA is the state where an IDN can appear and be used anywhere an ASCII domain name appears, with predictable, reliable and appropriate results.
The DNS is a fundamental yet evolving part of the Internet’s infrastructure. The ability to use a domain name as part of a query for other information is a part of the Internet we often take for granted. The ubiquity of the DNS has been a source of the UA challenges for IDNs. For some time, application developers for the Internet presumed, erroneously, that all domain names ended with two or three characters – and that those characters were always ASCII.
That misunderstanding has led to the principal problem for Universal Acceptance: while it may be possible to register and resolve IDNs, if software and Internet infrastructure do not accept, process, display and use those IDNs correctly, IDNs cannot fulfil the potential of a truly rich, multilingual Internet.
What is the Source of the Problem?
To support IDNs, the domain name system had to accommodate non-ASCII characters. To do this, it drew on the Universal Coded Character Set, known as Unicode (and commonly encoded as UTF-8). Since the DNS was built to support only ASCII characters, there needed to be a translation from Unicode to ASCII strings – as well as a translation in the other direction. The Unicode-to-ASCII translation uses an encoding called Punycode and turns each non-ASCII label of an IDN into a string called an “A-label.” An A-label is easy to recognise because it always starts with the ASCII characters “xn--”.
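The translation between Unicode labels and A-labels can be sketched with Python's standard library. This is a minimal illustration, not a full implementation: the built-in "idna" codec follows the 2003 IDNA standard rather than the later revision, and the helper names `to_a_label`, `to_u_label` and `is_a_label` are assumptions made for this example.

```python
# Sketch only: Python's built-in "idna" codec implements the 2003 IDNA
# standard. The helper names below are hypothetical, chosen for illustration.

ACE_PREFIX = "xn--"  # prefix that marks an A-label


def to_a_label(name: str) -> str:
    """Translate a Unicode domain name (or single label) to its ASCII form."""
    return name.encode("idna").decode("ascii")


def to_u_label(name: str) -> str:
    """Translate an ASCII (A-label) name back to its Unicode form."""
    return name.encode("ascii").decode("idna")


def is_a_label(label: str) -> bool:
    """Check for the A-label prefix; DNS names are case-insensitive."""
    return label.lower().startswith(ACE_PREFIX)


if __name__ == "__main__":
    a_label = to_a_label("bücher")
    print(a_label)               # xn--bcher-kva
    print(to_u_label(a_label))   # bücher
    print(is_a_label(a_label))   # True
```

The round trip shows why older, ASCII-only software can carry an A-label without modification, while anything that stores or displays the Unicode form needs to handle the conversion explicitly.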
Older parts of the Internet would have no problem processing, supporting and displaying A-labels because they are simply parts of an ASCII domain name. The problem for IDNs emerges when the Unicode characters are used, stored or displayed by the Internet’s infrastructure. Older software and applications do not have the ability to store, process and display the Unicode characters properly. As we have seen, even newer software fails to take the Unicode labels into account. The result is that IDNs, while registered in the DNS and available for resolution, do not work like the older ASCII domain names. Simply stated, IDNs face a barrier not faced by other domain names.
Universal Acceptance is a problem that is not limited to IDNs. With the expansion of the DNS root zone starting in 2013, many new top-level domain names appeared in the public Internet. Many have characteristics (especially string-length) that make previous assumptions about top-level domain names fail. As we have seen in previous years, this is a problem that significantly affects user account creation and validation.
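The kind of validation failure described above can be illustrated with a short sketch. The first check encodes the outdated assumption that a top-level domain is two or three ASCII letters; the second is a more tolerant alternative that converts the name to ASCII first and then checks each label against the LDH rule. Both function names and the regular expressions are assumptions made for this example, not rules taken from any particular application, and the built-in "idna" codec used here follows the 2003 IDNA standard.

```python
import re

# Outdated assumption: the name ends in a two- or three-letter ASCII TLD.
NAIVE_DOMAIN = re.compile(r"[a-z0-9-]+(\.[a-z0-9-]+)*\.[a-z]{2,3}", re.IGNORECASE)

# One LDH label: letters, digits, hyphen; 1-63 characters; no edge hyphens.
LDH_LABEL = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?", re.IGNORECASE)


def naive_is_valid(domain: str) -> bool:
    """Hypothetical legacy check of the kind found in sign-up forms."""
    return NAIVE_DOMAIN.fullmatch(domain) is not None


def tolerant_is_valid(domain: str) -> bool:
    """Hypothetical check that makes no assumption about TLD length or script."""
    try:
        # Convert any Unicode labels to A-labels before checking.
        ascii_name = domain.encode("idna").decode("ascii")
    except UnicodeError:
        return False
    labels = ascii_name.rstrip(".").split(".")
    return len(labels) >= 2 and all(LDH_LABEL.fullmatch(label) for label in labels)


if __name__ == "__main__":
    for name in ("example.com", "example.photography", "пример.рф"):
        print(name, naive_is_valid(name), tolerant_is_valid(name))
```

A syntactic check like this says nothing about whether the name actually exists; it only avoids rejecting valid names because of their length or script.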
In addition, IDNs appear in strings that are used outside the traditional DNS. It is very common to use a domain name as part of a username. Domain names also form a crucial part of public email addresses, and they appear in Internet infrastructure settings such as digital certificates. With the advent of the Internet of Things, domain names have the potential to be an identifier for many billions of devices and sensors. For internationalisation to succeed, IDNs must be accepted in these diverse settings, which go well beyond the confines of the DNS.
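As one example of an IDN appearing outside the DNS, the sketch below converts only the domain part of an email address to its ASCII form, so that the address can be stored or compared by software that expects ASCII domain names. The function name `to_ascii_domain_address` is hypothetical, the built-in "idna" codec follows the 2003 IDNA standard, and the part before the “@” is deliberately left untouched: a non-ASCII local part needs separate support from the mail system and is outside the scope of this sketch.

```python
def to_ascii_domain_address(address: str) -> str:
    """Return the address with its domain part converted to A-labels.

    Hypothetical helper for illustration; the local part is not modified.
    """
    # Split on the last "@", since the domain is everything after it.
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    ascii_domain = domain.encode("idna").decode("ascii")
    return f"{local}@{ascii_domain}"


if __name__ == "__main__":
    print(to_ascii_domain_address("info@bücher.example"))
    # info@xn--bcher-kva.example
```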
The standard for Internationalised Domain Names was largely completed in 2008. However, the Unicode standard, upon which IDNs are built, is an evolving and changing document. When Unicode is changed, new characters appear and properties of existing characters (or “code points”) are modified. In 2018, the IETF was at work bringing the IDN standard up to date to reflect changes in Unicode. In addition, the IETF is currently developing a new standard that would help registries validate an IDN prior to registration.