Methodology: multilingualism and ccTLDs

Each year, the IDN World Report research team reports on the language of web content associated with IDNs. We can report results for gTLDs and .eu because we have access to their individual zone files for research purposes. This is not possible for most ccTLDs, where the majority of IDNs are registered, so we infer the figures for ccTLDs from our gTLD analysis.

Type of registry     Level (second or top)   Percentage with active website
‘Legacy’ gTLDs       Second                  39%
.eu                  Second                  68%
.es                  Second                  68%*
.vn                  Second                  16%
New gTLD             Second                  30%
New gTLD             Top                     18%
.рф (ccTLD)          Top                     54%

The rate of active web content associated with IDNs ranges from 16% (.vn) to 68% (.eu and .es) at the second level, and from 18% (IDN new gTLDs) to 54% (.рф) at the top level.

In our automated analysis of the language of web content, we have observed that false positives and errors arise when a site contains too little text. We have therefore limited the language analysis to a smaller data sample.

The script of an IDN appears to affect usage rates: IDNs in Han and Arabic scripts show lower levels of active web content than those in Latin script or in the combined Han, Katakana and Hiragana scripts (associated with the Japanese language).

Therefore, when inferring usage rates for ccTLD IDNs, we applied the following rules:

• Use actual data where available (.eu, .es*, .vn, .рф). This covers 1.9 million IDNs, or 42% of the IDNs in ccTLDs (at both second and top level).
• For all other IDNs, assume an active website rate of 40% (whether top or second level)
– Discount by 20% for Han script, and right to left scripts (Arabic, Hebrew)
– Discount by 25% for new offerings, or giveaway policies.
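The inference rules above can be sketched as a small Python function. This is a hypothetical illustration, not the research team's code. The report does not say whether "discount by 20%" means a relative reduction or 20 percentage points, nor how the two discounts combine when both apply; the sketch assumes relative discounts that compound multiplicatively (40% becomes 32% for Han script, and 24% for a new Arabic-script offering). It also computes the ccTLD IDN total implied by the first rule (1.9 million being 42%), which is only approximate because both inputs are rounded. All constant and function names are hypothetical.

```python
# Hypothetical sketch of the ccTLD inference rules; not the report's actual code.
# ASSUMPTIONS: discounts are relative (20% off 40% gives 32%, not 20%),
# and the script and new-offering discounts compound when both apply.

BASE_RATE_PCT = 40              # assumed active-website rate, top or second level
SCRIPT_DISCOUNT_PCT = 20        # Han script and right-to-left scripts
NEW_OFFERING_DISCOUNT_PCT = 25  # new offerings or giveaway policies
DISCOUNTED_SCRIPTS = {"Han", "Arabic", "Hebrew"}


def estimated_active_rate(script, new_or_giveaway, measured_rate=None):
    """Return the estimated percentage of IDNs with an active website."""
    # Rule 1: use actual data where available (.eu, .es, .vn, .рф).
    if measured_rate is not None:
        return measured_rate
    # Rule 2: otherwise start from the 40% baseline.
    rate = BASE_RATE_PCT
    if script in DISCOUNTED_SCRIPTS:
        rate = rate * (100 - SCRIPT_DISCOUNT_PCT) / 100
    if new_or_giveaway:
        rate = rate * (100 - NEW_OFFERING_DISCOUNT_PCT) / 100
    return rate


# Measured ccTLDs cover 1.9 million IDNs, stated as 42% of ccTLD IDNs,
# which implies roughly 1.9 / 0.42, or about 4.5 million ccTLD IDNs overall.
IMPLIED_CCTLD_IDNS = 1_900_000 / 0.42

for script, new in [("Latin", False), ("Han", False), ("Arabic", True)]:
    print(script, new, estimated_active_rate(script, new))
print(f"implied ccTLD IDNs: {IMPLIED_CCTLD_IDNS / 1e6:.1f} million")
```

Under the alternative percentage-point reading, a Han-script IDN would instead be estimated at 20%; the rules as stated do not settle which reading was used. Percentages are kept on a 0 to 100 scale so the example rates come out as exact values.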

*Data from the 2015 IDN World Report.