Each year since 2013, we have analysed the language of web content associated with .eu IDNs. The data set comprised 46 000 names. Of these, 20 000 had too little content to analyse, leaving a working data set of 26 000 names with active web content. Within this working data set, 25 000 names were in Latin script, 640 in Cyrillic script and 150 in Greek script.
The percentage of names in each script with active web content was therefore 58% for Latin script, 30% for Cyrillic script and 8% for Greek script. Note that the percentage of names with active web content is lower than the percentage with active name servers and redirects.
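As a rough check on the counts above, the following Python sketch computes the overall share of names with analysable content and each script's share of the working data set. It uses only the rounded counts quoted in the text; the variable and function names are illustrative. The per-script rates (58%, 30%, 8%) are not reproduced here, because the number of names registered in each script is not stated in this section.

```python
# Rounded counts quoted in the text; all names below are illustrative.
total_idns = 46_000
too_little_content = 20_000
counts_by_script = {"Latin": 25_000, "Cyrillic": 640, "Greek": 150}


def share(part: float, whole: float) -> float:
    """Return `part` as a percentage of `whole`."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# Names left once those with too little content are set aside.
working_set = total_idns - too_little_content
overall_rate = share(working_set, total_idns)

# The per-script counts are rounded, so their sum only approximates the working set.
script_total = sum(counts_by_script.values())
script_shares = {s: share(n, script_total) for s, n in counts_by_script.items()}

print(f"Working data set: {working_set} names ({overall_rate:.1f}% of all names)")
for script, pct in script_shares.items():
    print(f"{script}: {pct:.2f}% of names with active web content")
```

On these rounded figures, just over half of the names in the data set had enough content to analyse, and Latin script accounts for almost all of the working data set.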
The rate of active web content has dropped across all scripts in the .eu second-level IDN space since last year.
If we are correct in thinking that IDNs link strongly with associated languages, we would expect to see a high correlation between script and language (eg Greek content with Greek script domain names) and to see web content in languages for which IDNs are particularly relevant (eg German, French, Swedish). Because the .eu domain is associated with the European Union and three countries of the European Economic Area (Iceland, Liechtenstein and Norway), and has a residency requirement, we would not expect to see many non-European languages (eg Chinese, Korean) featuring in the language analysis.
Languages cluster around relevant scripts
As with the larger data set, clear patterns emerge within the .eu data.
Bulgarian and Russian language websites are associated with Cyrillic script domains and not with Greek or Latin script domains (apart from a single Russian language website associated with a Latin script IDN). Greek language websites are associated only with Greek script domains. An array of European languages is associated with Latin script IDNs, with German making up 40% of websites (57% in 2013, 46% in 2014).
The small sample sizes for Cyrillic and Greek mean that relatively small differences in numbers can result in large percentage differences. For example, of the 14% of “other” languages in the Greek script IDNs, none accounts for more than 8 websites. As with the larger data set, English performs strongly across all three scripts, reflecting its popularity as a second language among Internet users. French and German also appear in web content associated with a small number of Cyrillic and Greek script IDNs (fewer than 20).
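To illustrate the small-sample point, the sketch below shows how many percentage points a handful of websites represents in each script's sample. It assumes the rounded counts of names with active web content quoted earlier (25 000 Latin, 640 Cyrillic, 150 Greek) are the sample sizes for the language analysis; the function name is illustrative.

```python
# Assumed sample sizes: rounded counts of names with active web content per script.
sample_sizes = {"Latin": 25_000, "Cyrillic": 640, "Greek": 150}


def points_per_site(sample_size: int, sites: int = 1) -> float:
    """Percentage points that `sites` websites represent in a sample."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return 100.0 * sites / sample_size


for script, n in sample_sizes.items():
    print(
        f"{script}: 1 site = {points_per_site(n):.3f} points, "
        f"8 sites = {points_per_site(n, 8):.2f} points"
    )
```

Under that assumption, eight websites amount to roughly 5.3% of the Greek script sample but only about 0.03% of the Latin script sample, which is why percentages for the smaller scripts should be read with caution.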