Language of web content, .eu IDNs

Each year since 2013, we have analysed the language of web content associated with .eu IDNs. The data set comprised 37 000 .eu IDNs (second level) and 2 000 .ею IDNs (top level).

As in previous years, IDNs without active name servers were excluded from further study. This year, we changed our methodology to identify low quality content (parked or single-page sites) in the data set more accurately. IDNs with low quality content were also excluded from further study, as were second level Cyrillic script IDNs under .eu, due to the retirement of those names in May 2019.
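
The exclusion rules above can be sketched as a simple filter. This is a minimal illustration only: the record fields, the sample labels and the helper names are hypothetical, and the report does not describe how parked or single-page sites were actually detected.

```python
from dataclasses import dataclass
import unicodedata


@dataclass
class IdnRecord:
    # All fields are hypothetical; the study's real data model is not described.
    label: str                     # Unicode form of the registered label
    tld: str                       # "eu" (Latin) or "ею" (Cyrillic)
    has_active_name_servers: bool
    is_parked: bool
    is_single_page: bool


def contains_cyrillic(text: str) -> bool:
    """Return True if any character's Unicode name starts with CYRILLIC."""
    return any(unicodedata.name(ch, "").startswith("CYRILLIC") for ch in text)


def keep_for_language_analysis(rec: IdnRecord) -> bool:
    """Apply the three exclusion rules described in the text."""
    if not rec.has_active_name_servers:
        return False               # no active name servers
    if rec.is_parked or rec.is_single_page:
        return False               # low quality content
    if rec.tld == "eu" and contains_cyrillic(rec.label):
        return False               # retired second level Cyrillic name under .eu
    return True


# Made-up sample records, for illustration only.
sample = [
    IdnRecord("bücher", "eu", True, False, False),
    IdnRecord("café", "eu", True, True, False),
    IdnRecord("пример", "eu", True, False, False),
    IdnRecord("пример", "ею", True, False, False),
    IdnRecord("ελλάδα", "eu", False, False, False),
]
kept = [r for r in sample if keep_for_language_analysis(r)]
```

In this sketch only the first and fourth records survive: the others are parked, use Cyrillic script at the second level under .eu, or lack active name servers.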

Parking pages are more likely to be in English

The research team analysed the web content associated with the .eu and .ею IDNs. As in previous years, we found that languages cluster around the scripts associated with those languages.

This year, we identified parking pages associated with the .eu and .ею IDNs. Strikingly, once parking pages were eliminated, the share of English language content fell from 36% to 11%. This supports the finding that parking pages are more likely to be in English.
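
To make the mechanism concrete, the sketch below computes language shares before and after removing parking pages. The page list is invented toy data, not the study's data, and it is not intended to reproduce the 36% and 11% figures; it only shows how a share can drop once a group dominated by one language is removed.

```python
from collections import Counter

# Hypothetical toy data: (detected language, is_parking_page).
pages = [
    ("en", True), ("en", True), ("en", True), ("en", False),
    ("de", False), ("de", False), ("de", False), ("de", True),
    ("fr", False), ("sv", False),
]


def language_share(langs):
    """Return each language's share of the given list, as a fraction."""
    counts = Counter(langs)
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.items()}


# Shares across all pages, then across non-parking pages only.
before = language_share([lang for lang, _ in pages])
after = language_share([lang for lang, parked in pages if not parked])
```

With this toy input, English falls from 4 of 10 pages to 1 of 6 once parking pages are dropped, because three of the four parking pages are in English.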

Languages cluster around relevant scripts

If we are correct in thinking that IDNs link strongly with associated languages, we would expect to see a high correlation between script and language (eg Greek content with Greek script domain names) and to see web content in languages for which IDNs are particularly relevant (eg German, French, Swedish). Because the .eu and .ею domains are associated with the European Union and three countries of the European Economic Area (Iceland, Liechtenstein and Norway), and have a residency requirement, we would not expect to see many non-European languages (eg Chinese, Korean) featuring in the language analysis.

As with the larger data set, clear patterns emerge within the .eu and .ею data.

Domain name script is an accurate signal of website language

After the elimination of inactive and parking sites, the research team performed automated analysis of the remaining IDNs in the data sample, to determine the language of web content associated with the three scripts of IDNs – Latin, Cyrillic and Greek.
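
As an illustration of how an IDN label might be assigned to one of the three scripts, the sketch below classifies a label by the Unicode names of its letters. It is an assumption-laden example, not the research team's tooling: the function names are hypothetical, and Python's built-in `idna` codec implements the older IDNA 2003 rules, so it may not decode every modern label.

```python
import unicodedata
from collections import Counter

SCRIPTS = ("LATIN", "CYRILLIC", "GREEK")


def to_unicode_label(label: str) -> str:
    """Decode a Punycode A-label (xn--...) to Unicode; pass other labels through."""
    if label.lower().startswith("xn--"):
        return label.encode("ascii").decode("idna")
    return label


def label_script(label: str) -> str:
    """Return LATIN, CYRILLIC, GREEK, OTHER, MIXED or NONE for a single label."""
    counts = Counter()
    for ch in to_unicode_label(label):
        if not ch.isalpha():
            continue  # digits and hyphens carry no script information
        name = unicodedata.name(ch, "")
        for script in SCRIPTS:
            if name.startswith(script):
                counts[script] += 1
                break
        else:
            counts["OTHER"] += 1
    if not counts:
        return "NONE"   # no letters at all
    if len(counts) > 1:
        return "MIXED"  # letters from more than one script
    return next(iter(counts))
```

For example, `label_script("xn--bcher-kva")` decodes the label to "bücher" and classifies it as Latin, while "пример" and "ελλάδα" are classified as Cyrillic and Greek respectively.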

Latin script

An array of European languages is associated with Latin script IDNs, with German making up 61% of websites. The change in methodology (ie the elimination of low quality content such as parking and single-page sites) has resulted in a lower percentage of English language sites and higher percentages of other languages.

Cyrillic script

Bulgarian accounts for 71% of the websites associated with the Cyrillic script .ею IDNs. English accounts for 18%, and other languages for 11%.

Greek script

Greek language websites are associated only with Greek script domains.

The small sample sizes for Cyrillic and Greek mean that relatively small differences in numbers can produce large swings in percentages. As with the larger data set, English performs strongly across all three scripts, reflecting its popularity as a second language among Internet users.
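
The sample-size effect can be shown with simple arithmetic. The sample sizes below are hypothetical, since the report does not state how many Cyrillic or Greek sites remained after filtering; the point is only that the same absolute change moves the percentage far more in a small sample.

```python
def percentage_point_shift(sample_size: int, sites_changed: int) -> float:
    """Percentage-point change in a language's share when `sites_changed`
    sites in a sample of `sample_size` move to or from that language."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return 100.0 * sites_changed / sample_size


# Hypothetical sample sizes, chosen only to contrast small and large samples.
small = percentage_point_shift(sample_size=50, sites_changed=5)
large = percentage_point_shift(sample_size=5000, sites_changed=5)
```

Here five sites shift the share by 10 percentage points in a sample of 50, but by only 0.1 percentage points in a sample of 5 000.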