User identifiers

While the display and use of IDNs as content has continued to improve, the situation with user identifiers continues to be a significant problem. The Donuts and ICANN study mentioned above indicates the scale of the problem.

Many websites require users to present credentials: gaining access to content often requires a user account and password. A very large cross-section of Internet services use email addresses as the user account identifier. A user who is new to a site is asked to create an account, and the site often takes advantage of the characteristics of email as a user identifier. One clear advantage of using an email address is that it is likely to be unique to the individual using the service.

Clearly, it would be preferable to be able to use an Internationalised Email Address as the user identifier.  However, in each of the previous four years, using the same methodology, we tested the eleven most popular websites on the Internet that require user authentication and an email address as the user identifier.  In none of those years did any of the web services tested allow Internationalised Email Addresses.

This year is no different. In 2017, none of the top eleven websites that require both user authentication and a user account identifier that is an email address allowed for IDNs in the domain-part of the email address.
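To make "IDNs in the domain-part" concrete, the following is a minimal sketch of how an internationalised domain-part maps to its ASCII-compatible (A-label) form. It assumes Python's built-in `idna` codec, which implements the older IDNA 2003 rules; the helper name `domain_to_a_label` and the sample address are illustrative only and are not taken from the study.

```python
def domain_to_a_label(address: str) -> str:
    """Return the address with its domain-part converted to A-label form.

    The local part (before the last '@') is left untouched.
    """
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError("not an email address: missing '@'")
    # The built-in "idna" codec converts each label to its "xn--" form.
    ascii_domain = domain.encode("idna").decode("ascii")
    return f"{local}@{ascii_domain}"


# Illustrative address: ASCII local part, Cyrillic domain-part.
print(domain_to_a_label("user@пример.рф"))  # user@xn--e1afmkfd.xn--p1ai
```

The conversion only covers the domain-part. An address whose local part is also non-ASCII has no ASCII equivalent and needs EAI support end to end.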

For 2017, we have expanded the testing to cover the five most popular websites with the same profile in each of Alexa’s regional lists of popular websites. The results of that expanded research again point to the inability to use IDNs or Internationalised Email Addresses as a user identifier on any website with mass popularity. Amazon is a typical example:

Attempting to use a Cyrillic-based EAI address to set up an account at Amazon

One explanation might lie in the difficulty of transmitting EAI-compliant email.  A popular service that depends on the email address both as a user identifier and as a means to contact its users might find that an EAI-compliant email address, while reasonable in principle, could not be used to reach the end user.  Limiting user accounts to ASCII addresses might be a deliberate design decision to ensure that the end user can be contacted during the period of transition before internationalised email is widespread.
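Whatever the motivation, the effect on the user is the same: the sign-up form rejects the address. The sketch below illustrates the difference between an ASCII-only check and a looser, EAI-tolerant one. It is hypothetical throughout: the pattern, the function names and the sample addresses are assumptions chosen for illustration, and it does not describe how Amazon or any other tested site validates addresses.

```python
import re

# Hypothetical ASCII-only pattern of the kind a sign-up form might use.
ASCII_ONLY = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def accepts_ascii_only(address: str) -> bool:
    """True only if every character of the address is in the ASCII sets above."""
    return ASCII_ONLY.fullmatch(address) is not None


def accepts_eai(address: str) -> bool:
    """Loose EAI-tolerant check (a sketch, not a full validator).

    Requires one separating '@', a non-empty local part, and a dotted
    domain-part that Python's built-in IDNA 2003 codec can convert.
    """
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return False
    try:
        domain.encode("idna")
    except UnicodeError:
        return False
    return "." in domain


for candidate in ("user@example.com", "пользователь@пример.рф"):
    print(candidate, accepts_ascii_only(candidate), accepts_eai(candidate))
```

Accepting the address at sign-up is only the first step. The service would still need a mail path that can deliver to it, which is the transition problem described above.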

IDNs, Federated Identity and OAuth

This problem extends to the concept of federated identity. New protocols on the Internet allow users to authenticate themselves using other services – instead of the service being accessed.  For instance, it is now common to be able to log into a web application using an account established at a major website such as Facebook or Google. This means that the user authentication takes place via another provider. The service that is being accessed accepts a token from the larger provider (for instance, Facebook) as an indication that the user has properly authenticated themselves.

The standard that makes this possible on the Internet is called OAuth and it provides a service called “federated identity.”  In a nutshell, new developers of applications don’t have to solve the problem of building their own identity systems; instead, they use the mature, sophisticated ones developed by other, possibly larger organisations.

OAuth itself is built to support IDNs and EAI-compliant email addresses. That’s good news for developers hoping to implement systems that support IDNs.  However, it does mean that the system providing the federated identity also has to support IDNs and EAI-compliant email addresses.  Unfortunately, in 2017 the two most popular sources of federated identity authentication are Facebook and Google, and neither of them supports EAI-compliant email addresses as a user account identifier.

In addition to identity, we have seen many major application developers choose to localise their content. Often, this means using an IP address or another identifier to determine where a user is located.  Based on the user’s geolocation, content is “localised” for the user. The localisation can come in many forms: it is common to provide content in the language used where the user is located; sometimes content is customised based on local regulations and laws.

Localised content, especially content that appears in the language that the user most commonly uses, enables a more linguistically diverse Internet. However, our research in 2017 shows that all major electronic commerce websites that localise their content still require ASCII domain names and email addresses when a user signs up for an account.