Idiom Tags Along

To succeed in software these days, you need really tight alliances, and Idiom’s “go it alone” approach has cost it dearly over the years. The latest warning sign came today in a press release from Idiom. Here is the lead sentence:

Globalization Management Systems (GMS) leader, Idiom Technologies, Inc. today announced WorldServer(TM) support for Documentum 5, the latest version of Documentum’s leading enterprise content management platform.

This release is odd because on the very same day Documentum issued a press release of its own, one that appears almost at odds with Idiom’s. Here’s the lead from the Documentum press release:

Documentum (Nasdaq: DCTM), the leader in enterprise content management (ECM), today announced a new joint offering with TRADOS, the global leader in language technology, and Lionbridge (Nasdaq: LIOX), a leading provider of globalization and testing services. Extending existing partnerships, the three companies have worked together to integrate the TRADOS Language Server(TM) with Documentum 5, the latest version of the company’s ECM platform. With this announcement, Documentum becomes the first and only content management vendor to integrate key language technology directly into its content management platform.

So here is Documentum announcing a “first” and Idiom announcing what appears to be pretty much the same thing. Very mysterious. The interesting thing about the Idiom press release is that there is no boilerplate quote from Documentum in it. In fact, I doubt Documentum had anything to do with that release.

I hate to be so cynical, but I just don’t believe Idiom will be around this time next year, at least not as a standalone company. After all, even Documentum is no longer a standalone company: EMC recently acquired it.

Unicode In A Nutshell

Here is a great article about Unicode and how it affects Web developers and programmers. Here’s an excerpt:

The Single Most Important Fact About Encodings

If you completely forget everything I just explained, please remember one extremely important fact. It does not make sense to have a string without knowing what encoding it uses. You can no longer stick your head in the sand and pretend that “plain” text is ASCII.

There Ain’t No Such Thing As Plain Text.

If you have a string, in memory, in a file, or in an email message, you have to know what encoding it is in or you cannot interpret it or display it to users correctly.

Almost every stupid “my website looks like gibberish” or “she can’t read my emails when I use accents” problem comes down to one naive programmer who didn’t understand the simple fact that if you don’t tell me whether a particular string is encoded using UTF-8 or ASCII or ISO 8859-1 (Latin 1) or Windows 1252 (Western European), you simply cannot display it correctly or even figure out where it ends. There are over a hundred encodings and above code point 127, all bets are off.

For the article, go to: http://www.joelonsoftware.com/articles/Unicode.html
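To make the excerpt’s point concrete, here is a minimal Python sketch (assuming Python 3) that takes the UTF-8 bytes for the word “café” and reads them back under several assumed encodings. The helper name `decode_as` is hypothetical; it simply wraps the standard `bytes.decode` with `errors="replace"`, so bytes that are invalid in the chosen encoding show up as the replacement character U+FFFD instead of raising an exception.

```python
def decode_as(raw: bytes, encoding: str) -> str:
    """Interpret raw bytes under an assumed encoding (hypothetical helper)."""
    # errors="replace" turns bytes that are invalid in this encoding into U+FFFD
    return raw.decode(encoding, errors="replace")


# The bytes a UTF-8 program writes for "café": b'caf\xc3\xa9'
raw = "café".encode("utf-8")

for encoding in ("utf-8", "latin-1", "cp1252", "ascii"):
    # Same bytes, four different guesses about what they mean
    print(f"{encoding:>8}: {decode_as(raw, encoding)!r}")

# A single byte above 127 on which Latin 1 and Windows 1252 disagree
euro = b"\x80"
print(decode_as(euro, "cp1252"))         # the euro sign in Windows 1252
print(repr(decode_as(euro, "latin-1")))  # '\x80', an invisible C1 control character in Latin 1
```

Only the UTF-8 guess recovers “café”. Latin 1 and Windows 1252 both produce “cafÃ©”, the classic gibberish of a mislabeled page, and ASCII has no meaning at all for the two bytes above 127. The lone byte 0x80 shows that even Latin 1 and Windows 1252 disagree once you leave ASCII.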

Profiles in Unicode

The New York Times has a great profile of Michael Everson, one of the many architects of Unicode. I hope someday there will be more profiles like this of the people devoting great chunks of their lives to the Unicode cause, people like Mark Davis, Asmus Freytag, and John Jenkins (to name just a few).

The article does a good job of describing Unicode, something I find extremely hard to do:

A more technical explanation of Unicode is this: When Mr. Everson sends e-mail in ogham, his computer isn’t sending ogham letters through the ether. Instead, strings of 0’s and 1’s are transmitted, and when they arrive on a friend’s computer, they generate on its screen the same ogham letters that Mr. Everson typed. Unicode is the master list that resides in both computers and translates individual letters and symbols into strings of 0’s and 1’s and back again. Most current software is Unicode-compliant, which means that this master list of all the world’s writing systems has been built into operating systems, browsers and software.
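The quoted explanation compresses one step: strictly speaking, Unicode assigns each character a number (its code point), and an encoding such as UTF-8 turns that number into the 0’s and 1’s that actually travel between computers. Here is a minimal Python sketch of that round trip, assuming Python 3.9 or later and UTF-8 as the encoding. The three ogham letters and the helper names `to_bits` and `from_bits` are illustrative choices of mine, not anything from the article.

```python
from __future__ import annotations

import unicodedata


def to_bits(text: str) -> list[tuple[str, str, str]]:
    """Map each character to its code point, Unicode name, and UTF-8 bits (hypothetical helper)."""
    rows = []
    for ch in text:
        code_point = f"U+{ord(ch):04X}"                          # the character's number in the "master list"
        name = unicodedata.name(ch, "<unnamed>")                 # its official Unicode character name
        bits = " ".join(f"{b:08b}" for b in ch.encode("utf-8"))  # the bytes actually transmitted
        rows.append((code_point, name, bits))
    return rows


def from_bits(bits: str) -> str:
    """Reverse the mapping: turn space-separated 8-bit groups back into text (hypothetical helper)."""
    return bytes(int(group, 2) for group in bits.split()).decode("utf-8")


message = "\u1681\u1682\u1683"  # OGHAM LETTER BEITH, LUIS, FEARN
for code_point, name, bits in to_bits(message):
    print(code_point, name, bits)

# What the receiving computer gets is just the bits; decoding them as UTF-8 restores the letters
wire = " ".join(bits for _, _, bits in to_bits(message))
assert from_bits(wire) == message
```

For U+1681 (OGHAM LETTER BEITH) the sketch prints the three bytes 11100001 10011010 10000001. Both computers only have to agree on Unicode plus one encoding for those bits to become the same letter again on the other end.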

Even though Unicode includes more than 50 different writing systems, it is far from complete. I was surprised to learn that nearly 100 more writing systems are left to be included, which means we will likely not be around to see Unicode completed. I guess it’s kinda like Boston’s Big Dig.

Want to learn more about Unicode? Visit the Unicode Consortium’s Web site.