There was some mighty big news made today (mighty big if you’re a globalization geek, anyway): the fifth iteration of Unicode was officially launched.
Says the press release: “The Unicode Consortium announces the release of a significant update of its widely-used Unicode Character Database (UCD). The new version, Version 5.0, defines more than 99,000 characters for the languages of the world, and provides the detailed properties needed for computer software implementations. This latest level of the UCD contains all the information needed to update software to support the characters and algorithms that are the foundation for all modern computer programs — including the latest data for Unicode security mechanisms, collation, and locales.”
A print version of the standard is forthcoming. I have version 3.0, which weighs in at more than a thousand pages; I can only imagine how big the 5.0 book will be. Actually, if you want to get a true feel for the significance of Unicode, you really need to get the book. I got such a kick out of browsing through all those characters from all those languages that I don’t speak. It puts little ol’ English in perspective. It’s an impressive achievement.
At this point it seems the improvements to Unicode are more about wiring and plumbing than simple character additions. Fewer than 2,000 characters were added this time around. But those characters do represent five new scripts: Balinese, N’Ko, Phags-pa, Phoenician, and Sumero-Akkadian Cuneiform.
I’m still in awe of Unicode and the people who developed it. Thanks to Unicode, we can post multiple scripts on one Web page (whether or not they all display properly is another issue). Thanks to Unicode, a global company can purchase one content management system and, assuming it supports Unicode, allow all of its offices to contribute content in nearly any language.
One application; many languages.
When I got into this field in 1999, creating a Japanese-language Web page required purchasing the Japanese version of Windows, for starters. Those were the dark ages indeed. Thanks to Unicode, so many of the technical hurdles are gone, allowing people to simply communicate.
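If you want to see the "one application; many languages" idea in miniature, here is a rough Python sketch. It is only an illustration under my own assumptions: the sample string and the helper names (`script_prefixes`, `can_encode`) are made up for this post, and using the first word of a character's Unicode name is just a crude stand-in for its script. The point is that a single string can hold letters from several scripts, UTF-8 can store all of them, and an older single-language encoding such as Shift_JIS cannot.

```python
import unicodedata

# Hypothetical sample: one letter each from five scripts, held in a single string.
# A (Latin), U+03A9 (Greek), U+042F (Cyrillic), U+3042 (Hiragana), U+0639 (Arabic)
mixed = "A\u03a9\u042f\u3042\u0639"


def script_prefixes(text):
    """Return the first word of each character's Unicode name.

    This is only a rough proxy for the character's script, not a real
    script lookup.
    """
    return [unicodedata.name(ch).split()[0] for ch in text]


def can_encode(text, encoding):
    """Return True if every character in text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    print(script_prefixes(mixed))
    print("utf-8:", can_encode(mixed, "utf-8"))
    print("shift_jis:", can_encode(mixed, "shift_jis"))
    print("latin-1:", can_encode(mixed, "latin-1"))
```

The legacy encodings fail here because each one covers only a slice of the world's characters. Shift_JIS handles the Japanese letter but not the Arabic one, and Latin-1 cannot go beyond the plain "A".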
You can read all the details of 5.0 here.