An Inside Look at the Globalization of Windows


Robert Scoble has taped an interesting interview with Michael Kaplan, who’s the technical lead in charge of the globalization of the upcoming Windows Vista OS.

If you’ve got a half hour to spare, it’s worth a watch. It gets a bit techie at times, but there are some great nuggets of wisdom for anyone involved in software or Web globalization.

If you don’t have the time, here are a few items that jumped out at me…

-> Vista is being localized into roughly 100 languages (some partially). As I understand it, that is about twice the number of languages supported by Windows XP. By the way, this blows away the number of languages supported by the Mac.

-> Microsoft is “opening it up” and “getting out of the way” — which means that they know that they won’t be able to localize Windows into a thousand languages anytime soon, so they are working to create the tools to allow folks around the world to customize Windows to their languages and cultures. I’m glad to see Microsoft doing this — Michael introduced a nifty keyboard tool that you can use to create your own keyboard layouts. Very nice.

-> Vista will support roughly 200 locales. This is a big increase from XP. A locale includes such elements as language, date format, currency format, etc.

-> “You can’t know everything” is Michael’s advice to other would-be internationalization engineers. So true. This is one thing I really love about this field: there are just too many languages and cultural nuances for anyone to know it all. It means that we’re always learning something new and that teamwork is essential to success.

-> Get to know Unicode. Unicode came up several times during the interview. Microsoft was an early promoter of Unicode, and Unicode has truly revolutionized global software development. The last remaining non-Unicode area on the Internet is the DNS, which engineers are grappling with as we speak.
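To make the DNS point concrete: DNS labels have traditionally been limited to ASCII letters, digits and hyphens, so internationalized domain names are converted to an ASCII “xn--” form using Punycode (the IDNA scheme). The sketch below assumes Python 3 and its built-in `idna` codec, which implements the older IDNA 2003 rules; `bücher.example` is a made-up name under the reserved `.example` domain.

```python
# Sketch: how an internationalized domain name is squeezed into ASCII for DNS.
# Assumes Python 3's built-in "idna" codec (IDNA 2003, RFC 3490).
# "bücher.example" is a hypothetical name under the reserved .example domain.

unicode_name = "bücher.example"

# Each non-ASCII label becomes "xn--" followed by its Punycode form;
# pure-ASCII labels such as "example" pass through unchanged.
ascii_name = unicode_name.encode("idna")
print(ascii_name)  # b'xn--bcher-kva.example'

# The DNS only ever sees the ASCII form; software decodes it back for display.
print(ascii_name.decode("idna"))  # bücher.example
```

Browsers perform this conversion behind the scenes, so a user can type the Unicode form while the DNS itself only handles the ASCII one.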

Anyway, it’s a great interview. Check it out.

Michael also has a blog

Unicode Is Really Getting Fashionable

Michael Kaplan called my attention to the latest Unicode fashion accessory:


For those who don’t get it, that funny little question-mark character is what Mac users see when their computers don’t have the right font to display a given character (or when the Web browser gets a bit confused about which font to use). Unicode lets you put the world’s major languages on a Web page, but it does not guarantee that your users have the right fonts on their end.

Windows users see blank boxes — and yes there’s a shirt for Windows users as well:


The shirts are available from CafePress. I wonder if I can get one in black…

Unicode In A Nutshell

Here is a great article about Unicode and how it affects Web developers and programmers. Here’s an excerpt:

The Single Most Important Fact About Encodings

If you completely forget everything I just explained, please remember one extremely important fact. It does not make sense to have a string without knowing what encoding it uses. You can no longer stick your head in the sand and pretend that “plain” text is ASCII.

There Ain’t No Such Thing As Plain Text.

If you have a string, in memory, in a file, or in an email message, you have to know what encoding it is in or you cannot interpret it or display it to users correctly.

Almost every stupid “my website looks like gibberish” or “she can’t read my emails when I use accents” problem comes down to one naive programmer who didn’t understand the simple fact that if you don’t tell me whether a particular string is encoded using UTF-8 or ASCII or ISO 8859-1 (Latin 1) or Windows 1252 (Western European), you simply cannot display it correctly or even figure out where it ends. There are over a hundred encodings and above code point 127, all bets are off.
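The excerpt’s point is easy to demonstrate. This is a minimal Python 3 sketch (my own illustration, not from the article) that encodes one word as UTF-8 and then reads the same bytes back as if they were Latin 1, Windows 1252 and ASCII:

```python
# Sketch: the same bytes mean different things under different encodings.
text = "café"
data = text.encode("utf-8")      # b'caf\xc3\xa9': é takes two bytes in UTF-8

print(data.decode("utf-8"))      # café  (the right guess)
print(data.decode("latin-1"))    # cafÃ© (ISO 8859-1 reads each byte as one character)
print(data.decode("cp1252"))     # cafÃ© (Windows 1252 agrees with Latin 1 for these bytes)

try:
    data.decode("ascii")         # bytes above 127 are simply not ASCII
except UnicodeDecodeError as err:
    print("ascii:", err.reason)  # ascii: ordinal not in range(128)
```

The “cafÃ©” result is the classic gibberish pattern: UTF-8 text displayed as if it were Latin 1 or Windows 1252.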

For the article, go to:

HTML and Unicode

Another useful Web globalization Q&A has been added to the W3C site. It deals with character sets and encodings. I can’t even begin to describe how confusing this issue can be to Web developers as they begin tackling new languages and new scripts. But it is something they will encounter more frequently. Fortunately, we now have Unicode.


What is the ‘Document Character Set’ for XML and HTML, and how does it relate to the encodings I use for my documents?

For the answer, go to:
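Roughly speaking, the distinction is this: in both HTML and XML the document character set is Unicode (ISO 10646), while the encoding only determines how those characters are stored as bytes. One consequence is that a numeric character reference such as `&#8364;` always means Unicode code point U+20AC (the euro sign), whatever encoding the file uses. The Python 3 sketch below is my own illustration of that idea, not taken from the W3C answer:

```python
import html

# Sketch: document character set vs. encoding.
# The characters of an HTML/XML document are Unicode code points; the
# encoding only decides how those code points are turned into bytes.
text = "café €"

# Latin 1 has é but no euro sign, so write the euro as a numeric
# character reference instead of failing.
latin1_bytes = text.encode("latin-1", errors="xmlcharrefreplace")
print(latin1_bytes)  # b'caf\xe9 &#8364;'

utf8_bytes = text.encode("utf-8")

# Decode each file with its own encoding, then resolve references:
# &#8364; names U+20AC in Unicode, regardless of the file's encoding.
from_latin1 = html.unescape(latin1_bytes.decode("latin-1"))
from_utf8 = html.unescape(utf8_bytes.decode("utf-8"))
print(from_latin1 == from_utf8 == text)  # True
```

Different bytes on disk, identical characters once parsed: that is the gap between “encoding” and “document character set”.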