Q&A with Jukka Korpela, author of Going Global with JavaScript and Globalize.js

What’s the most important thing you want JavaScript developers to learn from this book?
By making use of free tools such as Globalize.js, developers can adapt their applications for new markets with minimal work. For example, adapting the format of a date or number for a different country requires a single library function call.

This book also goes into more complex operations and functions, but it’s important that developers first get a feel for simple data format localization.
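That single-call idea is easy to demonstrate. Globalize.js wraps this kind of work in its own API; as a minimal self-contained sketch, the standard `Intl` API built into modern browsers and Node.js shows the same principle (the locales and values here are illustrative, not taken from the book):

```javascript
// Locale-aware formatting with the built-in Intl API.
// Globalize.js offers similar one-call formatting backed by CLDR data.
const date = new Date(2012, 10, 6); // 6 November 2012

// The same date, rendered per locale:
const enUS = new Intl.DateTimeFormat('en-US').format(date); // "11/6/2012"
const fiFI = new Intl.DateTimeFormat('fi-FI').format(date); // day-first notation

// Number formatting: grouping and decimal separators differ by locale.
const n = 1234567.89;
const enNum = new Intl.NumberFormat('en-US').format(n); // "1,234,567.89"
const deNum = new Intl.NumberFormat('de-DE').format(n); // "1.234.567,89"
```

The point is that the developer never hand-codes the notation; the locale identifier alone selects the correct convention.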

What is Globalize.js and why is it so valuable for developing global software?
Globalize is a standalone, open source JavaScript library that helps you globalize your JavaScript code. Globalize lets you adapt your code to work with a multitude of human languages. You don't need to know the languages or their conventions, and you don't need to hand-code the notations.

Globalize includes locale data for more than 300 locales, covering the presentation of numbers, date notations, calendars, and time zones. It is easily modifiable and extensible to cover new locales.

You devote a chapter to the finer points of Unicode. Why is it so important for developers to understand Unicode?
Unicode has become widely used on web pages, in applications, and in databases, but most IT professionals still have a rather limited understanding of it. The generality of Unicode—covering more than 100,000 characters from all kinds of writing systems—has its price: complexities and practical issues. These issues are often encountered in common operations such as string comparison and case conversions.
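Two of those practical issues, string comparison and case conversion, can be shown in a few lines. A minimal JavaScript sketch (the strings are illustrative):

```javascript
// "é" can be stored as one code point (U+00E9) or as two
// (the letter "e" plus combining acute accent U+0301).
const composed = '\u00e9';    // é, precomposed
const decomposed = 'e\u0301'; // é, base letter + combining accent

// Naive comparison fails even though both render identically:
const naiveEqual = composed === decomposed; // false

// Normalizing both strings to the same form (here NFC) fixes it:
const normalizedEqual =
  composed.normalize('NFC') === decomposed.normalize('NFC'); // true

// Case conversion is not one-to-one either: the German sharp s
// uppercases to two letters.
const upper = '\u00df'.toUpperCase(); // "SS"
```

Code that assumes one character equals one code point, or that uppercasing preserves string length, will break on exactly this kind of input.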

You’re based in Finland. What common mistakes do you see made by developers who have localized software for your locale?
The most common mistake is partial localization: a page or application appears to be in Finnish or Swedish, but on closer examination, you'll see English notations for data items. Even the most current software may use a date notation like 11/6/2012, which is not only incorrect by our language rules but also ambiguous.

Often, menus contain a mix of Finnish and English items. You might also see a dropdown list of countries of the world, with names in Finnish but in an odd order, usually based on English-language alphabetization rules.
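The odd ordering Korpela describes comes from applying one language's alphabetization rules to another's. In Finnish and Swedish, ö is a distinct letter that sorts after z, while English collation treats it as a variant of o. A minimal sketch with the standard `Intl.Collator` API (the words are illustrative, not from the interview):

```javascript
// Locale-sensitive comparison: Finnish vs. English collation rules.
const fi = new Intl.Collator('fi');
const en = new Intl.Collator('en');

// In Finnish, Ö sorts after Z, so "Öljy" comes after "Oman".
const finnishResult = fi.compare('\u00d6ljy', 'Oman'); // > 0

// In English, ö collates with o; the comparison falls to 'l' vs 'm'.
const englishResult = en.compare('\u00d6ljy', 'Oman'); // < 0

// Sorting a list for a Finnish-language page:
const items = ['Oman', '\u00d6ljy', 'Norja'];
const finnishOrder = [...items].sort(fi.compare); // Norja, Oman, Öljy
```

A country dropdown on a Finnish page should be sorted with the Finnish collator, not with the default code-point sort or English rules.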

Mistranslations are not rare and may cause real harm, particularly in menus, buttons, and labels for form fields. An expert may understand the cause of the problem (someone has translated a short fragment of text with no idea of the context), but average users are simply confused and may revert to using the English-language site as the lesser of two evils.

HTML5 proposes new input types, such as date and number. But these types pose challenges that many developers might not be aware of. Can you explain why?
Browser support is still limited, inconsistent, and partly experimental. But beyond that, these elements have not yet been defined and implemented with globalization in mind. They may be implemented using browser-driven localization, based on the browser's locale. Adequate localization would instead reflect the locale of the content, that is, of the web page itself.

These issues can be partly addressed using code that avoids improper localization. Although the new elements are promising in the long run, they should be regarded as interesting features to test in controlled situations rather than relied upon in normal production.

Going Global with JavaScript and Globalize.js

NOTE: We also offer an enterprise price for a PDF copy of the book to be shared across your company.

France to offer support for actual French domain names

The French domain name registry AFNIC has published a PDF explaining why it will soon support internationalized domain names (IDNs). According to AFNIC:

As of July 3, 2012, it will be possible for anyone to register domain names under the .fr, .yt, .pm, .wf, .tf, and .re TLDs with new characters such as é, ç or the German Eszett.

Consider the website for France’s presidential palace. Right now, you would get to this website via www.Elysee.fr, rather than its actual name www.Élysée.fr.
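Under the hood, IDNs travel in an ASCII-compatible "Punycode" encoding (labels beginning with xn--), which is why software support matters so much. A minimal sketch using the WHATWG URL API available in browsers and Node.js:

```javascript
// The URL parser applies IDNA processing, converting a Unicode
// hostname to its ASCII-compatible Punycode form.
const url = new URL('https://www.\u00e9lys\u00e9e.fr/');
const asciiHost = url.hostname; // "www." + an "xn--..." label + ".fr"

// Detect whether any label in the hostname is Punycode-encoded:
const isPunycoded = asciiHost
  .split('.')
  .some(label => label.startsWith('xn--'));
```

The registrant gets a readable name; DNS itself still sees only ASCII.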

Now I realize there is a lot of extra money to be made by registrars if every company, government agency, and organization registers a bunch of extra domain names, differentiated only by an accented character or two. And some might argue that IDNs amount to little more than a boondoggle for registrars.

I would argue that the new (corporate) generic TLDs are the real boondoggle. I mean, does the Internet really need .honda or .disney top level domains?

IDNs support languages, plain and simple. Which has been a very long time coming. And which is ultimately about showing respect not just for languages but the people who use them.

Going Global with JavaScript: Coming this Fall

JavaScript enables everything from simple online sign-up forms to complex web-based applications.

But there is not much information out there on how to effectively internationalize and localize JavaScript code.

Which is why I’m pleased to announce that Byte Level Books is publishing Going Global with JavaScript and Globalize.js.

The book is authored by globalization expert Jukka Korpela, who wrote my favorite book on Unicode: Unicode Explained.

Readers of this book will learn:

  • How to ensure an application is “world ready” — removing unnecessary language and culture dependencies
  • How to adapt a JavaScript app to local conventions, such as date formats, systems of measurement, time zones, and more
  • How to leverage the Common Locale Data Repository (CLDR) to support global applications
  • How to localize the user interface to address different cultural requirements and expectations
  • How to handle text input that falls well outside traditional “A-Z” characters

I’ll have more to share on the book as we get closer to publication. If you’d like to be notified when the book is published, be sure to sign up for the Global by Design newsletter or the Byte Level Books Twitter feed.


Visualizing Unicode

Unicode is one of the great achievements of our era. It's also incredibly intimidating.

So I love to come across videos and web sites that help demystify Unicode.

A week ago I came across a video created by jörg piringer that displays, in fast motion, nearly 50,000 Unicode characters. I’ve embedded it below:

The video lasts 33 minutes, and it still only displays about half of all Unicode characters. But even so, the video is a great tool to help people who have never heard of Unicode get a feel for how massive this encoding truly is.

But let’s say you want to see the ENTIRE Unicode set.

Fortunately, Andrew West has created a nifty web page that allows you to view all Unicode characters (fonts permitting) — and at your own leisurely pace. I highly recommend checking it out.

Here is a screen shot of one character:

Source: Michael Kaplan

The next Internet revolution will not be in English

This visual depicts about half of the currently approved internationalized domain names (IDNs), positioned over their respective regions.

Notice the wide range of scripts over India and the wide range of Arabic domains. I left off the Latin country code equivalents (in, cn, th, sa, etc.) to illustrate what the Internet is going to look like (at a very high level) in the years ahead.

This next revolution is a linguistically local revolution. In terms of local content, it is already happening. Right now, more than half of the content on the Internet is not in English. Ten years from now, the percentage of English content could easily drop below 25%.

But there are a few technical obstacles that have so far made the Internet not as user friendly as it should be for people in the regions highlighted above. They’ve been forced to enter Latin-based URLs to get to where they want to go. Their email addresses are also Latin-based. This will all change over the next two decades.

For those of us who are fluent only in Latin-based languages, this next wave of growth is going to be interesting, if not a bit challenging. In a Latin-based URL environment, you can still easily navigate to and around non-Latin web sites and brands. For example, if I want to find Baidu in China, I can enter www.baidu.cn. For Yandex in Russia, it’s yandex.ru.

But flash forward a few years and these Latin URLs (though they’ll still exist) may no longer function as the front doors into these markets.

Try Яндекс.рф. It currently redirects to Yandex.ru.

In a few years, I doubt this redirection will exist.

We’re getting close to a linguistically local Internet — from URL to email address. There are still significant technical obstacles to overcome. It will be exciting to see which companies take the lead in overcoming them — as these companies will be well positioned to be leaders in these emerging markets.

UPDATE: I’ve expanded on this topic in a recent article on IP Watch.

Gruber gives up on his ✪ IDN

Tech pundit John Gruber threw in the towel on his domain ✪df.ws.

He writes:

What I didn’t foresee was the tremendous amount of software out there that does not properly parse non-ASCII characters in URLs, particularly IDN domain names. Twitter clients (including, seemingly, every app written using Adobe AIR, which includes some very popular Twitter clients), web browsers (including Firefox), and, for a few months, even the Twitter.com website wasn’t properly identifying DF’s short URLs as links.

Worse, some — but, oddly, not all — of AT&T’s DNS servers for 3G wireless clients choke on IDN domains. This meant that even if you were using a Twitter client that properly supports IDN domains, these links still wouldn’t work if your 3G connection was routing through one of AT&T’s buggy DNS servers.

There is still a lot of heavy lifting left to do among many software and hardware vendors before IDNs can go mainstream. Unless, of course, a country — say Russia or China — mandates their support and pushes the vendors along.

PS: I’ve updated my top-level IDN tracker.