With Mongolia gaining approval for its new internationalized domain name (IDN), there are now 30 countries and territories with non-Latin top-level domains.
I’ve updated my map accordingly, and IDN coverage is now significant.
By making use of free tools such as Globalize.js, developers can adapt their applications for new markets with minimal work. For example, adapting the format of a date or number for a different country requires a single library function call.
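Globalize.js has its own API, which is not reproduced here. As a rough illustration of the same "one call per value" idea, the sketch below uses the ECMAScript Internationalization API (`Intl`), which is built into modern browsers and Node.js. The helper name `formatFor`, the locales, and the sample values are arbitrary choices for this example.

```typescript
// Locale-aware formatting with the built-in Intl API (swapped in for
// illustration; this is not Globalize.js code).
const amount = 1234567.891;
const date = new Date(Date.UTC(2012, 5, 11)); // 11 June 2012 (months are 0-based)

// Hypothetical helper: one formatter call per value, driven only by the locale tag.
function formatFor(locale: string): { num: string; day: string } {
  return {
    num: new Intl.NumberFormat(locale).format(amount),
    // Format in UTC so the viewer's time zone cannot shift the calendar day.
    day: new Intl.DateTimeFormat(locale, { timeZone: "UTC" }).format(date),
  };
}

for (const locale of ["en-US", "fi-FI"]) {
  const { num, day } = formatFor(locale);
  console.log(`${locale}: ${num} | ${day}`);
}
// en-US: 1,234,567.891 | 6/11/2012
// fi-FI uses a comma as the decimal separator and prints the date as 11.6.2012
```

Only the locale tag changes between the two outputs; the calling code stays the same.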
This book also goes into more complex operations and functions, but it’s important that developers first get a feel for simple data format localization.
What is Globalize.js and why is it so valuable for developing global software?
Globalize includes locale data for more than 300 locales, covering the presentation of numbers, date notations, calendars, and time zones. It is easy to modify and extend to cover new locales.
You devote a chapter to the finer points of Unicode. Why is it so important for developers to understand Unicode?
Unicode has become widely used on web pages, in applications, and in databases, but most IT professionals still have a rather limited understanding of it. The generality of Unicode—covering more than 100,000 characters from all kinds of writing systems—has its price: complexities and practical issues. These issues are often encountered in common operations such as string comparison and case conversions.
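Two of the pitfalls mentioned, string comparison and case conversion, can be shown in a few lines. This sketch uses only standard JavaScript string methods (`normalize`, `toUpperCase`, `toLocaleUpperCase`); the particular characters are illustrative choices, not examples taken from the book.

```typescript
// Canonical equivalence: "é" can be one code point or "e" plus a combining accent.
const composed = "\u00e9";    // é, LATIN SMALL LETTER E WITH ACUTE
const decomposed = "e\u0301"; // e + COMBINING ACUTE ACCENT; renders the same

const naiveEqual = composed === decomposed; // false: the code units differ
const normalizedEqual =
  composed.normalize("NFC") === decomposed.normalize("NFC"); // true after normalization

// Case conversion is not always one-to-one, nor independent of language.
const upperEszett = "ß".toUpperCase();             // "SS": one letter becomes two
const upperTurkishI = "i".toLocaleUpperCase("tr"); // "İ" (U+0130) under Turkish rules
const upperEnglishI = "i".toLocaleUpperCase("en"); // "I"

console.log(composed.length, decomposed.length); // 1 2
console.log(naiveEqual, normalizedEqual);        // false true
console.log(upperEszett, upperTurkishI, upperEnglishI);
```

A naive `===` comparison treats visually identical strings as different, so normalizing (here to NFC) before comparing is a common precaution.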
You’re based in Finland. What common mistakes do you see made by developers who have localized software for your locale?
The most common mistake is partial localization: a page or application appears to be in Finnish or Swedish, but on a closer examination, you’ll see English notations for data items. Even the most current software may use a date notation like 11/6/2012, which is not only incorrect by our language rules, but also ambiguous.
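To make the ambiguity concrete, the hypothetical helper `readSlashDate` below reads the same string under the month-first and day-first conventions and converts each reading to ISO 8601 (`yyyy-mm-dd`), a form with only one interpretation. It is a minimal sketch that assumes well-formed input and does no calendar validation.

```typescript
// Which field comes first in a slash-separated date.
type FieldOrder = "MDY" | "DMY";

// Hypothetical helper: interpret "a/b/yyyy" under an explicit field order.
function readSlashDate(text: string, order: FieldOrder): string {
  const parts = text.split("/").map(Number);
  if (parts.length !== 3 || parts.some(Number.isNaN)) {
    throw new Error(`Not an a/b/yyyy date: ${text}`);
  }
  const [a, b, year] = parts;
  const [month, day] = order === "MDY" ? [a, b] : [b, a];
  // ISO 8601 output: year, month, and day are unambiguous.
  const pad = (n: number): string => String(n).padStart(2, "0");
  return `${year}-${pad(month)}-${pad(day)}`;
}

console.log(readSlashDate("11/6/2012", "MDY")); // 2012-11-06 (US reading: November 6)
console.log(readSlashDate("11/6/2012", "DMY")); // 2012-06-11 (day-first reading: June 11)
```

The same eight characters name two dates five months apart, which is why the notation is ambiguous as well as incorrect for Finnish users.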
Often, menus contain a mix of Finnish and English items. You might also see a dropdown list of countries of the world, with names in Finnish but in an odd order, usually based on English-language alphabetization rules.
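The ordering problem can be reproduced with the standard `Intl.Collator` API. This sketch compares a plain code-unit sort, English collation, and Finnish collation of a few sample place names, chosen only for their initial letters (they are not taken from any real country list). Finnish sorts Å, Ä, and Ö as separate letters after Z, whereas English treats them as variants of A and O.

```typescript
// Sample names chosen for their first letters (illustrative only).
const names = ["Östersund", "Zambia", "Åland", "Ahvenanmaa", "Äänekoski"];

// Default sort compares UTF-16 code units: Ä (U+00C4) < Å (U+00C5) < Ö (U+00D6).
const byCodeUnit = [...names].sort();

// Locale-aware collation; copying first leaves the original array untouched.
const sortFor = (locale: string): string[] =>
  [...names].sort(new Intl.Collator(locale).compare);

const english = sortFor("en");
const finnish = sortFor("fi");

console.log(byCodeUnit.join(", ")); // Ahvenanmaa, Zambia, Äänekoski, Åland, Östersund
console.log(english.join(", "));    // Äänekoski, Ahvenanmaa, Åland, Östersund, Zambia
console.log(finnish.join(", "));    // Ahvenanmaa, Zambia, Åland, Äänekoski, Östersund
```

Neither the code-unit order nor the English order matches what a Finnish reader expects; only the `fi` collator does.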
Mistranslations are not rare and may cause real harm, particularly in menus, buttons, and labels for form fields. An expert may understand the cause of the problem (someone has translated a short fragment of text with no idea of the context), but average users are simply confused and may fall back to the English-language site as the lesser of two evils.
HTML5 proposes new input types, such as date and number. But these elements pose challenges that many developers might not be aware of. Can you explain why?
Browser support is still limited, inconsistent, and partly experimental. Beyond that, these elements have not yet been defined and implemented with globalization in mind. Browsers may localize them according to the browser’s locale, whereas adequate localization would reflect the locale of the content, that is, of the web page.
These issues can be partly addressed with code that avoids improper localization. But although the new elements are promising in the long run, they should be regarded as interesting features to be tested and used in controlled situations rather than in normal production.
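One way to keep the content’s locale in charge: the HTML specification defines the value of `<input type="date">` as a `yyyy-mm-dd` string, regardless of how the browser chooses to display it. A script can therefore re-display the chosen date in the page’s language. The sketch below assumes the content locale is known (for example, from the page’s `lang` attribute); `formatInputDate` is a hypothetical helper built on the standard `Intl` API, not part of any library.

```typescript
// Hypothetical helper: show an <input type="date"> value in the content's locale.
function formatInputDate(isoValue: string, contentLocale: string): string {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(isoValue);
  if (!match) throw new Error(`Not a yyyy-mm-dd date: ${isoValue}`);
  const [, y, m, d] = match;
  // Build and format the date in UTC so no time-zone offset moves the day.
  const date = new Date(Date.UTC(Number(y), Number(m) - 1, Number(d)));
  return new Intl.DateTimeFormat(contentLocale, { timeZone: "UTC" }).format(date);
}

// In a browser, contentLocale could come from document.documentElement.lang.
console.log(formatInputDate("2012-06-11", "fi"));    // 11.6.2012
console.log(formatInputDate("2012-06-11", "en-US")); // 6/11/2012
```

The browser’s own widget may still display the date in the browser’s locale; this only ensures that text the page renders itself follows the content’s locale.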
The French domain name registry AFNIC has published a PDF explaining why it will soon support internationalized domain names (IDNs). According to AFNIC:
As of July 3, 2012, it will be possible for anyone to register domain names under the .fr, .yt, .pm, .wf, .tf, and .re TLDs with new characters such as é, ç or the German Eszett.
Consider the website for France’s presidential palace. Right now, you would get to this website via www.Elysee.fr, rather than its actual name www.Élysée.fr.
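Behind the scenes, an IDN label is encoded in an ASCII-compatible form prefixed with `xn--` (Punycode, RFC 3492). That is why www.Élysée.fr and www.Elysee.fr are distinct domain names. A short sketch using the WHATWG `URL` parser built into browsers and Node.js shows the conversion; the exact encoded label is deliberately not hard-coded here.

```typescript
// The WHATWG URL parser lowercases host names and converts
// non-ASCII labels to their "xn--" ASCII-compatible form.
const accented = new URL("https://www.Élysée.fr/").hostname;
const plain = new URL("https://www.Elysee.fr/").hostname;

console.log(plain);              // www.elysee.fr
console.log(accented);           // www.xn--…  (the Punycode form of "élysée") followed by .fr
console.log(accented === plain); // false: two different domain names
```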
Now I realize there is a lot of extra money to be made by registrars if every company, government agency, and organization registers a bunch of extra domain names, differentiated only by an accented character or two. And some might argue that IDNs amount to little more than a boondoggle for registrars.
I would argue that the new (corporate) generic TLDs are the real boondoggle. I mean, does the Internet really need .honda or .disney top level domains?
IDNs support languages, plain and simple. Which has been a very long time coming. And which is ultimately about showing respect not just for languages but for the people who use them.