Email

For universal acceptance, there are two huge challenges that seem to prevent true internationalisation of the Internet. Other pages have shown that validation errors cause applications and services to fail. Validation, and the use of IDNs as personal identifiers, seems to be a nearly intractable problem for internationalisation. The other challenge is Email Address Internationalisation (EAI).

What is email address internationalisation (EAI)?

Components of an email address - broken into user portion and the domain name

Email addresses consist of two parts separated by an “@” symbol. Before the “@” symbol is a string called the user portion (technically known as the “local-part”). After the “@” symbol is usually a domain name. To achieve universal acceptance of IDNs we should be able to use any IDN as the domain name and also use non-ASCII scripts for the user portion. Not only should we be able to address email with these internationalised addresses, but we should be able to send and receive email with them as well. The ability to address, send and receive email using such addresses is often referred to as Email Address Internationalisation (EAI).

We should be able to use email addresses that look like:

δοκιμή@παράδειγμα.δοκιμή

我買@屋企.香港

чебурашка@ящик-с-апельсинами.рф

संपर्क@डाटामेल.भारत

It is crucial to understand that this does not just apply to the top level of the domain name. In a fully EAI-compliant system, we should be able to use IDNs anywhere in the domain name string where they are permitted by IDN and registry standards.
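As a rough illustration of how an address breaks down, the following Python sketch splits an address into its user portion and the individual labels of its domain name, then reports which labels are internationalised. The function names are hypothetical and chosen only for this example; the sketch does not validate the address against any standard.

```python
from typing import List, Tuple


def split_address(address: str) -> Tuple[str, List[str]]:
    """Split an address into its user portion and the labels of its domain name."""
    # Split at the last "@", since a quoted user portion may itself contain "@".
    local_part, sep, domain = address.rpartition("@")
    if not sep or not local_part or not domain:
        raise ValueError(f"not an email address: {address!r}")
    return local_part, domain.split(".")


def non_ascii_labels(address: str) -> List[str]:
    """Return the domain labels that contain non-ASCII characters."""
    _, labels = split_address(address)
    return [label for label in labels if not label.isascii()]
```

Called on an address whose domain mixes scripts (for instance a made-up address such as user@почта.example.com), `non_ascii_labels` returns only the internationalised label, which shows that an IDN can sit at any level of the domain name and not just at the top level.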

Why is EAI difficult?

For all its utility, electronic mail is surprisingly complex. Even the human-facing component can be a standalone piece of software (e.g. Outlook), a web page (e.g. the basic interface to Gmail), or a mobile client that simply queries a server and synchronises its view of the available email with that of the server. On the server side, there are two major ways to arrange for the pickup of electronic mail (the POP and IMAP protocols). Finally, in between the sender and receiver are mail transfer agents (MTAs) that arrange for the forwarding of email from one place to another.

Electronic mail works because all of these components are standardised – they are interoperable. Interoperability means that any computer, running any software, can connect any way it likes to the Internet and send and receive email – as long as it abides by the standards for email. Electronic mail is so complex that the Internet Engineering Task Force (the standards development organisation) has published a guide to understanding the entire ecosystem.

The standards that make up the traditional electronic mail ecosystem are very old (the basic Internet email format was codified in 1977). The installed base of servers and clients is also extremely large. Those two facts, taken together, make change to the email ecosystem very challenging.

Legacy email messages consist of three parts: the envelope, the headers and the body.

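  • The envelope of the message contains the information that mail servers use to transfer and deliver the message, most importantly the sender and recipient addresses. It is separate from the message content, and servers may record further metadata about the message alongside it (e.g. when it was received or the size of the message).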
  • The header is a set of fields such as the sender address, the subject, the date the message was sent and other information provided by the sender of the message.
  • The body of the message contains the text of the message and any attachments.
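To make the three parts concrete, here is a minimal Python sketch that builds a legacy (ASCII-only) message with the standard library's `email.message.EmailMessage` class. The `Envelope` class is a hypothetical helper invented for this example; in SMTP the envelope addresses are passed to the server in separate commands (MAIL FROM and RCPT TO) and are not stored inside the message. The addresses use reserved example domains.

```python
from dataclasses import dataclass
from email.message import EmailMessage
from typing import List


@dataclass
class Envelope:
    """Addresses handed to the mail server during transfer (hypothetical helper)."""
    mail_from: str
    rcpt_to: List[str]


def build_legacy_message() -> EmailMessage:
    msg = EmailMessage()
    # Header fields supplied by the sender.
    msg["From"] = "alice@example.com"
    msg["To"] = "bob@example.org"
    msg["Subject"] = "Hello"
    # Body: the text of the message.
    msg.set_content("This is the body of the message.")
    return msg


message = build_legacy_message()
# The envelope travels alongside the message rather than inside it.
envelope = Envelope(mail_from="alice@example.com", rcpt_to=["bob@example.org"])
```

Keeping the envelope separate mirrors how mail servers work: they route on the envelope addresses, while the “From:” and “To:” headers are what the recipient's mail client displays.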

When a message is sent, the sender provides “From:” and “To:” addresses as well as a “Subject:” and the content of the message. As we have seen above, the legacy address is of the form local-part@domain-name. The domain name part is an LDH-based string (letters, digits and hyphens) and the local part is made up of ASCII characters. The fundamental problem for EAI is changing email so that it can use internationalised scripts in both the local part and the domain name.
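The difference between a legacy address and an internationalised one can be sketched in a few lines of Python. This is a deliberately simplified check, not a full validator: it only tests that the local part is ASCII and that every domain label follows the LDH rule. The function name and the regular expression are assumptions made for this example.

```python
import re

# One LDH label: letters, digits and hyphens, with no leading or trailing
# hyphen and at most 63 characters.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_legacy_address(address: str) -> bool:
    """True if the local part is ASCII and every domain label is LDH."""
    local_part, sep, domain = address.rpartition("@")
    if not sep or not local_part or not domain:
        return False
    labels = domain.split(".")
    return local_part.isascii() and all(
        LDH_LABEL.fullmatch(label) is not None for label in labels
    )
```

Any address for which this returns False because of non-ASCII characters, such as संपर्क@डाटामेल.भारत, is one that legacy email software cannot be expected to handle.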

Because the underlying system has so many parts, solving the EAI problem is quite complex.

How to solve it?

At a high level, solving the problem seems simple. When a user agent (e.g. Outlook) wants to send an EAI message to another user, it needs to be sure that all the infrastructure between sender and receiver, plus the receiver itself, can handle the email. For one of the two major mail protocols, SMTP, the solution is an extension to the older email standards. A sending application can test whether the receiving server advertises the extension that supports EAI, called SMTPUTF8.
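The test for SMTPUTF8 can be sketched in Python. A server lists its extensions in the reply to the EHLO command, so the first function below simply looks for the SMTPUTF8 keyword in those reply lines. The second function does the same against a live server using the standard `smtplib` module; it needs network access, and the host to query is left to the caller. Both function names are hypothetical.

```python
import smtplib
from typing import Iterable


def advertises_smtputf8(ehlo_lines: Iterable[str]) -> bool:
    """Check EHLO reply lines (without the 250 code) for the SMTPUTF8 keyword."""
    for line in ehlo_lines:
        words = line.strip().split()
        # Extension keywords are case-insensitive.
        if words and words[0].upper() == "SMTPUTF8":
            return True
    return False


def server_supports_eai(host: str, port: int = 25) -> bool:
    """Connect to a mail server and ask whether it offers SMTPUTF8.

    Needs network access; `host` is whatever server the caller chooses.
    """
    with smtplib.SMTP(host, port, timeout=10) as smtp:
        smtp.ehlo()
        return smtp.has_extn("smtputf8")
```

Note that this only tells the sender about the next server in the chain. Every server that later relays the message has to make the same check against the server it hands the message to.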

That seems easy, but what happens when a sender can’t find a receiver that handles SMTPUTF8?  What happens when the message is delivered to the recipient, but they do not have a user agent that can handle EAI?

In fact, the technical, standardised solution for EAI has been around for more than a decade. However, the related infrastructure and deployment challenges remain. There are two crucial challenges that need to be overcome before EAI is achievable:

  • The client software (for instance, Outlook or Thunderbird) needs to be able to display, process and store the internationalised address. For instance, the client software should display EAI addresses in Unicode, while passing the domain name to the mail server in Punycode.
  • The server software must support EAI and allow for the transfer of the mail in a manner that preserves the EAI address.
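The conversion a client performs between the displayed form of a domain name and its Punycode form can be sketched with Python's built-in `idna` codec. This is an illustration under stated assumptions: the built-in codec follows the older IDNA 2003 rules, so real software may prefer a dedicated IDNA 2008 library, and the function names are hypothetical. Only the domain name is converted; the user portion has no Punycode form and is left unchanged.

```python
def domain_to_ascii(domain: str) -> str:
    """Convert a domain name to its ASCII (Punycode, "xn--") form."""
    return domain.encode("idna").decode("ascii")


def domain_to_unicode(domain: str) -> str:
    """Convert an ASCII domain name back to the Unicode form shown to users."""
    return domain.encode("ascii").decode("idna")


def address_for_display(address: str) -> str:
    """Show the domain name in Unicode, leaving the user portion untouched."""
    local_part, _, domain = address.rpartition("@")
    return f"{local_part}@{domain_to_unicode(domain)}"
```

Because the user portion cannot be converted in this way, an address with a non-ASCII user portion still depends on every server along the path supporting EAI.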

Progress in 2019

In 2019 our analysis shows that there has been real progress with the first of these challenges, but that the second remains quite difficult.

One example of success in providing EAI to users is the Digital India initiative. Digital India is an effort by the Government of India to provide its services to citizens online – addressing infrastructure, connectivity and applications. India is an incredibly linguistically diverse country, and having English-only email services is a barrier to Digital India meeting its goals. To try to solve this problem, the Indian government asked email service providers to provide EAI in Indian-native languages.

One of the companies that responded to this initiative, Data Xgen Technologies Pvt Ltd, now provides EAI infrastructure in 16 different languages – including many that are local to India, and some that are foreign. Part of the success of this offering is that it works with mail agents other than its own (for instance, Outlook). While this started with support for India’s IDN ccTLD, the growth of the scripts supported is one of the clear success stories of supporting EAI.

Another example is Microsoft’s effort to incorporate EAI into end-user facing products. In the last year, Microsoft has added email internationalisation to Exchange Online, joining support for EAI in Office 365. The result, when combined with the earlier announcement of support for SMTPUTF8 at Google, means that widely deployed email systems have support for EAI. It also means that a Gmail user can successfully format and send an EAI email to, for instance, an Indian email recipient.

In 2019, we see the following:

  • Ever larger deployments of EAI support, both in client software and in infrastructure. These deployments mean that users of those systems can successfully send and receive EAI messages. However, if those systems are not connected to other EAI-compliant systems, they remain islands of internationalisation – failing to fulfil the promise of true email internationalisation.
  • Growing interconnection between EAI-compliant infrastructure. This is effectively connecting the islands of EAI-compliant infrastructure together in cooperative, internationalised email support. In 2019, this is a growing and positive development for EAI and Universal Acceptance in general.
  • Finally, legacy infrastructure that is not EAI-compliant remains connected to the electronic mail infrastructure. As with other technological developments on the Internet such as IPv6 or DNSSEC, it looks to be very difficult to force or entice email service providers to upgrade to EAI-compliant services. The result is that, in areas of linguistic diversity and areas where there is little English language penetration, there is little success at internationalisation for email.