Each year since 2013, we have analysed the language of web content associated with .eu IDNs. The data set comprised 40 000 .eu IDNs (second level) and 1 900 .ею IDNs (top level).
Exclude those with no content downloaded
For 13 000 of the .eu IDNs and 1 500 of the .ею IDNs, no content was downloaded.
This left a set of 27 000 .eu IDNs and 400 .ею IDNs with active web content.
Second level IDNs under .eu
IDNs at the second level under .eu are supported in three scripts: Latin, Cyrillic and Greek.
In the data set of 40 000, there were 37 000 Latin script .eu domains, 1 100 Cyrillic script and 1 900 Greek script.
Of the set of 27 000 .eu second level IDNs with active web content, 26 000 are Latin script, 600 Cyrillic script and 400 Greek script.
The rate of active web content is 70% for Latin script, 55% for Cyrillic and 21% for Greek. The rate has dropped across all scripts under the .eu second level IDN space since last year.
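The rates quoted above can be reproduced from the rounded counts given in this section. The following is a minimal sketch of that arithmetic only. The names `total`, `active` and `active_rate` are illustrative and are not taken from the original analysis, and the underlying unrounded figures may differ slightly.

```python
# Rounded counts from the text: all .eu second level IDNs per script,
# and those for which web content was downloaded.
total = {"Latin": 37_000, "Cyrillic": 1_100, "Greek": 1_900}
active = {"Latin": 26_000, "Cyrillic": 600, "Greek": 400}


def active_rate(active_count, total_count):
    """Share of domains with active web content, as a whole percentage."""
    return round(100 * active_count / total_count)


# One rate per script, keyed by script name.
rates = {script: active_rate(active[script], total[script]) for script in total}

for script, rate in rates.items():
    print(f"{script}: {rate}%")
```

Because the inputs are rounded to the nearest hundred or thousand, the results should be read as approximate.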
If we are correct in thinking that IDNs link strongly with associated languages, we would expect to see a high correlation between script and language (eg Greek content with Greek script domain names) and to see web content in languages for which IDNs are particularly relevant (eg German, French, Swedish). Because the .eu domain is associated with the European Union and three countries of the European Economic Area (Iceland, Liechtenstein and Norway), and has a residency requirement, we would not expect to see many non-European languages (eg Chinese, Korean) featuring in the language analysis.
Languages cluster around relevant scripts
As with the larger data set, clear patterns emerge within the .eu data.
Domain name script is an accurate signal of website language
Bulgarian and Russian language websites are associated with Cyrillic script domains and not with Greek or Latin script domains (apart from a single Russian language website associated with a Latin script IDN); Greek language websites are only associated with Greek script domains. An array of European languages is associated with Latin script IDNs, with German accounting for 40% of websites. These results are broadly consistent year on year.
The small sample sizes for Cyrillic and Greek mean that relatively small differences in numbers can result in large percentages. For example, of the 5% of “other” languages in the Greek script IDNs, none has more than three websites. As with the larger data set, English performs strongly across all three scripts, reflecting its popularity as a second language among Internet users.
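To illustrate the sample size effect, the sketch below shows how many percentage points a group of three websites represents in each script's sample. It assumes the denominator is the rounded number of IDNs with active web content given earlier; the set actually used for the language percentages may be smaller. The name `points_per_site` is hypothetical.

```python
# Rounded counts of .eu second level IDNs with active web content, per script.
# Assumption: these are the denominators for the language percentages.
active = {"Latin": 26_000, "Cyrillic": 600, "Greek": 400}


def points_per_site(sample_size, sites=1):
    """Percentage points that `sites` websites represent in a sample."""
    return 100 * sites / sample_size


# Weight of three websites within each script's sample.
shift = {script: points_per_site(n, sites=3) for script, n in active.items()}

for script, points in shift.items():
    print(f"{script}: 3 websites = {points:.2f} percentage points")
```

Under these assumptions, three websites account for a far larger share of the Greek or Cyrillic sample than of the Latin one, which is why small changes in counts move the percentages for those scripts more visibly.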