Non-English URLs
After talking to Simone, Mahdi Taghizadeh started a thread on the use of non-English characters in URLs and the difficulties that arise from it. He also invited me to join the discussion and write about the topic. Yesterday Simone talked to me about this issue as well, and we walked through some aspects of the problem.
First of all, I have to make clear that you can use non-English characters in URLs, because some readers may not know this. In principle you can use any Unicode character in the addresses you write and link to, although behind the scenes they are usually converted to an ASCII form before being sent over the network. Moreover, it’s possible to have domain names and extensions with international characters; English domain names and extensions are not mandatory. This may be another fact that many users don’t know.
IANA takes care of this and keeps track of the local domains for each country. For example, there is a page which lists all the valid characters in Persian domain names. Here is a local Persian domain with international characters, and this is a Hebrew domain using another set of characters.
But what’s the challenge? Many of the world’s languages are written in the Latin alphabet or in a script whose letters have close equivalents in the English alphabet, so a great part of the world’s population doesn’t care much about using its native alphabet in internet URLs.
However, when it comes to other languages like Persian, Arabic, Chinese, Japanese or Hebrew, their speakers have difficulty using their own characters in URLs, even though, as you saw, these characters are allowed even in domain and extension names. So what are the problems that obstruct the extensive use of international characters in URLs and limit them to English characters only? I’ll try to highlight them and give a brief description of each one.
In my opinion the first and foremost point is that there should be unity among the URLs on a site, and there shouldn’t be a combination of English and non-English characters in them. If the URLs are supposed to be English, then all of them should be English, and if they’re supposed to be non-English, then all of them, including domains and extensions, should be non-English. The former is easily possible because there is a good foundation for it, but the latter is not, mostly because there isn’t good support for non-English domain names and extensions. I said that it’s possible to use these names, but there are some difficulties in using them. For example, when you enter a non-English domain name or extension in your browser, it is often converted to an ugly, unreadable ASCII form (Punycode, which begins with xn--) that users won’t find acceptable. I think this alone is enough to limit the usage of international characters in URLs.
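To make the conversion mentioned above concrete, here is a minimal sketch in Python using its built-in "idna" codec. The function names and the example.com-style hostnames are my own, chosen only for illustration, and the codec follows an older revision of the IDNA standard, so a browser may not produce exactly the same result for every name.

```python
def to_ascii_hostname(hostname: str) -> str:
    """Return the ASCII (Punycode) form of an internationalized hostname."""
    # The built-in "idna" codec handles each dot-separated label on its own
    # and prefixes every converted label with "xn--".
    return hostname.encode("idna").decode("ascii")


def to_unicode_hostname(ascii_hostname: str) -> str:
    """Reverse the conversion so the name can be shown to a reader."""
    return ascii_hostname.encode("ascii").decode("idna")


if __name__ == "__main__":
    # Hypothetical hostnames, used only for illustration.
    for name in ("bücher.example", "سخت.example"):
        ascii_name = to_ascii_hostname(name)
        print(ascii_name, "->", to_unicode_hostname(ascii_name))
```

The ASCII form is what travels over DNS; whether the user sees it or the original characters depends on the browser.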
The second problem is that in many circumstances non-English characters cannot appear in URLs in their original form. In other words, when you enter such a character in a URL, it is converted to an encoded form that is neither readable nor user-friendly, and this may happen in the browser, the web server or the site’s application. Of course, this is a serious drawback that stops users from using these characters.
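The encoded form described above is usually percent-encoding of the UTF-8 bytes of each character. The following short Python sketch shows what that looks like with the standard urllib.parse module; the helper names and the /articles/tags/ path are assumptions made up for this example.

```python
from urllib.parse import quote, unquote


def encode_path(path: str) -> str:
    """Percent-encode a URL path, keeping "/" as the segment separator."""
    # Non-ASCII characters are encoded as UTF-8 bytes, each written as %XX.
    return quote(path, safe="/")


def decode_path(encoded: str) -> str:
    """Turn a percent-encoded path back into readable text."""
    return unquote(encoded)


if __name__ == "__main__":
    # Hypothetical path with a Persian tag ("hardware").
    path = "/articles/tags/سخت-افزار"
    encoded = encode_path(path)
    print(encoded)               # long run of %XX sequences
    print(decode_path(encoded))  # the original, readable path
```

Each Persian letter becomes two %XX groups, which is why a short word turns into a long and unreadable address when it is copied out of the address bar.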
Another problem is rooted in technology: many current web server technologies don’t support non-English characters, or if they do, there isn’t a straightforward way to work with them. To be honest, much software is written for English first, and other cultures have to port it to their own language. Microsoft should be a world-leading company in supporting all languages and cultures in its products, but it’s still far from supporting every language natively.
Besides, a big challenge for the usage of non-English characters is how hard it is to enter them in browsers. If you want to type a URL with such characters, you often need to switch between two keyboard languages, and in the case of Persian or Arabic you also need to switch between two writing directions, so this is an important usability issue that discourages users.
However, many of the URLs that we use are embedded in web pages, and we don’t enter them manually. Here developers could embed localized URLs in their pages, so why isn’t that common?
There may be several reasons for that. The first, in my honest opinion, is something that I call the beauty of unity: developers prefer not to use a combination of English and non-English URLs in their applications.
Moreover, we should ask what the value of localized URLs is. Can they help SEO? Can they help the flow of content on a site? I don’t think that search engines handle non-English URLs very well or that such URLs have a good impact on SEO. I’m not 100% sure about this statement, but some evaluations have convinced me of it. If anyone can shed some light on this, it’s worth leaving a comment for clarification. I also don’t think that local characters help the flow of content on a site, because I see no reason to support that claim.
The last factor that may play a negative role in using international characters in URLs is how easy they are to work with. To be honest, working with English characters is comparatively much easier for all users because computer hardware (including keyboards) is built for English characters. With other scripts, it’s harder to type or edit URLs.
In the end, my conclusion from this discussion is that using non-English characters in URLs is a problem that no culture has solved yet. Speakers of languages written in Latin-based alphabets appear to have accepted the English alphabet in their URLs, and yet there isn’t a good solution for other languages like Persian, Arabic, Chinese, and Hebrew. I personally would prefer to use English in all URLs, not only to keep the URLs on a single site consistent but also to let foreign users rely on a universal language to understand the structure of a site and navigate between its pages. This has been my own experience when visiting sites in Chinese, Italian or Arabic. Knowing English, I could hack their URLs and find what I was looking for.
In fact, I haven’t seen many sites that use localized URLs, and it seems to be very uncommon on public sites. Mahdi has given the example of Wikipedia, but I don’t think they chose localized URLs intentionally. Using the title entered by the user in the URL is simply part of their application (a common feature of all wiki applications), and it happens for any language. Furthermore, I could easily find many of these URLs appearing in encoded form, which is clearly very ugly and unreadable.
On the one hand, the web depends on standards that are accepted by all nations, and using English URLs may be an unwritten standard that has become common among users. On the other hand, the inability to support all nations and languages equally can be considered a big flaw in the nature of the web.
9 Comments : 10.31.08
Feedbacks
@Mahdi
It depends on your users. If you think that your users can enter friendly English names, then that would be the better option; otherwise you can use the identifier in the URL rather than the non-English title.
In most cases users don't type the links of tag pages. They just click on links in a tag cloud. But I'm worried about using Persian tags in URLs because it breaks the unity of the website's URLs. Can it be an exception in your opinion?
@Mahdi
It depends on your application, its requirements and its implementation but I think it's better to use tag identifiers in URLs if they're Persian words.
Wouldn't it be confusing if we had to browse websites that are in another language? But we can definitely use an alternative web address for a site in a local language.
The first step is difficulties
multilanguage web site must have unique website name!!

#1
Mahdi Taghizadeh
10.31.2008 @ 1:22 PM
Dear Keyvan,
Thank you for your follow-up. As I said in my post, I agree with you and prefer unity among all the URLs of a web application, so in my opinion they should all be English.
We also have solutions for building more SEO-friendly and user-friendly URLs, for example adding a FriendlyUrl column to the database to hold an equivalent English title for each record (which can be the name of a product or a news headline). But what do you suggest for cases like tag cloud URLs? Assume we have an application in which users can assign one or more tags to an article, and all users can then access the articles for a particular tag using a URL like /articles/tags/hardware. What about Persian tags? Would you suggest something like articles/tags/سخت-افزار or articles/tags/hardware (in which 'hardware' comes from that FriendlyUrl column in the Tags table in our database)?