Internationalizing PHP Applications With TMX
phpMagazin recently did a good job of covering Zend_Translate: what it is, how it works and four of the basic translation adapters, namely Array, CSV, Gettext and Ini. What they didn’t cover, however, is what I think is one of the best adapters Zend_Translate has to offer: TMX.
Background on Zend_Translate
If you’re not familiar with Zend_Translate, here’s a quick overview. Zend_Translate allows you to quickly and effectively implement internationalisation (I18N) in your web applications. It provides the following key features:
- Multiple source formats (or adapters)
- Thread-safe gettext support
- An easy and generic API
- Detection of the user’s language
- Automatic source detection
For the remainder of this article, though, I’m going to focus on the multiple source adapters.
If you’re not familiar with them, adapters are what store your website content in your chosen base language, along with translations into the one or more languages that you expect your visitors will want to view your site in. They come in a variety of forms, from the simplest, such as a PHP array or a CSV file, to the more complex and esoteric, such as TBX and TMX.
Using a PHP Array
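```php
require_once 'Zend/Translate.php'; // Zend Framework 1 must be on the include_path

// Base language: English (message ID => text)
$english = array(
    'message1' => 'message1',
    'message2' => 'message2',
    'message3' => 'message3'
);

// German translations of the same message IDs
$german = array(
    'message1' => 'Nachricht1',
    'message2' => 'Nachricht2',
    'message3' => 'Nachricht3'
);

// Initialise Zend_Translate with the English content...
$translate = new Zend_Translate(
    array(
        'adapter' => 'array',
        'content' => $english,
        'locale'  => 'en'
    )
);

// ...then add the German translation
$translate->addTranslation(
    array('content' => $german, 'locale' => 'de')
);
```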
Using a CSV File
Example CSV File
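```
message1;Nachricht1
message2;Nachricht2
```
And the code to load it:
```php
// Initialise Zend_Translate with the German CSV file...
$translate = new Zend_Translate(
    array(
        'adapter' => 'csv',
        'content' => '/path/to/mytranslation.csv',
        'locale'  => 'de'
    )
);

// ...then add a French translation from a second CSV file
$translate->addTranslation(
    array(
        'content' => '/path/to/other.csv',
        'locale'  => 'fr'
    )
);
```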
You can see that in both examples an initial set of content (English in the array example, German in the CSV one) was used to initialise the Zend_Translate object, and that further translations were then added with the addTranslation method.
This makes it relatively painless to add as many sets of translations as you need, in a consistent and predictable manner.
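To make the CSV format concrete, here is a minimal plain-PHP sketch that turns semicolon-separated messageId;translation lines into the same kind of array used in the array example. It is not how Zend_Translate’s CSV adapter is implemented: parseTranslationCsv() is a hypothetical helper, and treating lines that start with # as comments is an assumption of this sketch.

```php
<?php
// Hypothetical helper: parse "messageId;translation" lines into an array.
// Illustrates the file format only; Zend_Translate parses CSV itself.
function parseTranslationCsv(string $csv, string $delimiter = ';'): array
{
    $messages = [];
    foreach (preg_split('/\R/', trim($csv)) as $line) {
        // Skip blank lines and lines starting with '#' (assumed comment syntax)
        if ($line === '' || $line[0] === '#') {
            continue;
        }
        // Explicit enclosure and escape arguments keep newer PHP versions quiet
        $fields = str_getcsv($line, $delimiter, '"', '\\');
        if (count($fields) >= 2) {
            $messages[$fields[0]] = $fields[1];
        }
    }
    return $messages;
}

$csv = "# German translations\nmessage1;Nachricht1\nmessage2;Nachricht2\n";
$german = parseTranslationCsv($csv);
echo $german['message2'], PHP_EOL; // Nachricht2
```

The result has the same message ID => text shape as the $german array in the array example, which is why switching between the two adapters changes so little in the calling code.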
Displaying View Content
Once you’ve chosen an adapter and created and loaded the required source files, using it to display internationalised content in a Zend Framework project is very simple.
I’ll assume that you’re following the standard Zend Framework approach, where your view scripts output your user interface text, whether that’s a menu, footer, copyright notice or something else, rather than it being generated in code.
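Given that, you would normally output the text directly, with something like <?php echo 'message1'; ?> in your view script.
Right? Well, once the translation adapter is set up, only a small change is required to implement internationalisation. The view script now goes through the translate view helper instead: <?php echo $this->translate('message1'); ?>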
Now, I don’t want to get into any kind of security debate here; we’re just focussing on the key elements of translation and internationalisation, right?
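Conceptually, the lookup behind a translate call is simple: find the message ID among the current locale’s translations and, if there is no translation, hand the message ID back unchanged (Zend_Translate likewise returns the original message ID when it finds no translation). The sketch below models only that behaviour with a hypothetical TinyTranslator class; it is not Zend Framework code, and the real adapters do considerably more, such as locale handling, options and caching.

```php
<?php
// Hypothetical, much-simplified stand-in for a translate() lookup.
// TinyTranslator is illustrative only, not part of Zend Framework.
final class TinyTranslator
{
    /** @var array<string, array<string, string>> locale => [messageId => text] */
    private $messages = [];
    private $locale;

    public function __construct(string $locale)
    {
        $this->locale = $locale;
    }

    public function addTranslation(string $locale, array $messages): void
    {
        // array_replace keeps the message-ID keys intact; later values win
        $this->messages[$locale] = array_replace($this->messages[$locale] ?? [], $messages);
    }

    public function setLocale(string $locale): void
    {
        $this->locale = $locale;
    }

    public function translate(string $messageId): string
    {
        // Untranslated messages fall back to the message ID itself
        return $this->messages[$this->locale][$messageId] ?? $messageId;
    }
}

$translator = new TinyTranslator('en');
$translator->addTranslation('en', ['message1' => 'message1']);
$translator->addTranslation('de', ['message1' => 'Nachricht1']);
$translator->setLocale('de');

// Escape translated text before output, as with any other view content
echo htmlspecialchars($translator->translate('message1'), ENT_QUOTES, 'UTF-8'), PHP_EOL; // Nachricht1
echo htmlspecialchars($translator->translate('message9'), ENT_QUOTES, 'UTF-8'), PHP_EOL; // message9
```

Returning the message ID rather than an empty string means a missing translation shows up as readable (if untranslated) text instead of a blank gap in the page.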
How do you determine the language choice?
Now that your source files and views are set up to translate your content, all that remains is for the user to choose their language. There are a variety of ways to do this, but the one I find simplest is letting the language setting of the user’s browser choose it automatically.
Depending on what the browser is set to, your website will display content in that language if it’s available, or fall back to your default language. You don’t have to do anything fancy; just let the browser specify the choice and away you go. To do this, when you configure the Zend_Translate resource, say in your bootstrap file, set the locale option to 'auto', i.e. 'locale' => 'auto' in the options array.
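Browsers send their language preferences in the Accept-Language HTTP header, for example de-AT,de;q=0.8,en;q=0.6, where each q value ranks a language. As a rough illustration of browser-based language negotiation (not Zend_Locale’s actual detection logic), the sketch below ranks the languages by q value and picks the first one you have translations for, falling back to a default. negotiateLanguage() and its parameters are hypothetical.

```php
<?php
// Hypothetical sketch of Accept-Language negotiation.
// $available must contain lowercase tags such as 'en' or 'de-at'.
function negotiateLanguage(string $header, array $available, string $default): string
{
    $candidates = [];
    foreach (explode(',', $header) as $index => $part) {
        $pieces = explode(';', trim($part));
        $tag = strtolower(trim($pieces[0]));
        if ($tag === '') {
            continue;
        }
        $q = 1.0; // no q parameter means q=1
        foreach (array_slice($pieces, 1) as $param) {
            $param = trim($param);
            if (strncmp($param, 'q=', 2) === 0) {
                $q = (float) substr($param, 2);
            }
        }
        $candidates[] = [$tag, $q, $index];
    }

    // Highest q first; equal q values keep their order in the header
    usort($candidates, function (array $a, array $b): int {
        return [$b[1], $a[2]] <=> [$a[1], $b[2]];
    });

    foreach ($candidates as [$tag, $q]) {
        if ($q <= 0) {
            continue; // q=0 means "not acceptable"
        }
        // Try the full tag (e.g. "de-at"), then its primary language ("de")
        $primary = explode('-', $tag)[0];
        foreach ([$tag, $primary] as $lang) {
            if (in_array($lang, $available, true)) {
                return $lang;
            }
        }
    }
    return $default;
}

echo negotiateLanguage('de-AT,de;q=0.8,en;q=0.6', ['en', 'de'], 'en'), PHP_EOL; // de
echo negotiateLanguage('fr-FR,fr;q=0.9', ['en', 'de'], 'en'), PHP_EOL;          // en
```

In a real request the header would come from $_SERVER['HTTP_ACCEPT_LANGUAGE'] ?? '', and the final fallback corresponds to the "fall back to your default language" behaviour described above.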
Quick plug: Jon’s video casts are always timely and accurate, and presented in a clear and unambiguous manner each week. Tell him Matt from Malt Blue said hi.
So What is TMX?
TMX (Translation Memory eXchange) is the format behind one of the more advanced source adapters for Zend_Translate. Based on XML, TMX is an international standard for storing and delivering translation information. According to Wikipedia:
TMX (Translation Memory eXchange) is the vendor-neutral open XML standard for the exchange of Translation Memory (TM) data created by Computer Aided Translation (CAT) and localization tools. The purpose of TMX is to allow easier exchange of translation memory data between tools and/or translation vendors with little or no loss of critical data during the process.
It is a bit unwieldy at first, but if you’re familiar with XML, you’ll get a grasp on it quite quickly. Have a look at the sample below and I’ll take you through the key points.
```xml
<?xml version="1.0" ?>
<!DOCTYPE tmx SYSTEM "tmx14.dtd">
<tmx version="1.4">
  <header creationtoolversion="1.0.0" datatype="winres" segtype="sentence"
          adminlang="en-us" srclang="de-at" o-tmf="abc" creationtool="XYZTool">
  </header>
  <body>
    <tu tuid='message1'>
      <tuv xml:lang="de"><seg>Nachricht1</seg></tuv>
      <tuv xml:lang="en"><seg>message1</seg></tuv>
    </tu>
    <tu tuid='message2'>
      <tuv xml:lang="de"><seg>Nachricht2</seg></tuv>
      <tuv xml:lang="en"><seg>message2</seg></tuv>
    </tu>
  </body>
</tmx>
```
```php
// One TMX file holds every language, so no addTranslation() call is needed
$translate = new Zend_Translate(
    array(
        'adapter' => 'tmx',
        'content' => 'path/to/mytranslation.tmx',
        'locale'  => 'en'
    )
);
```
What’s it all mean?
Well, there is a bit to it and you can find a host of links in the links section. For the purposes of this post, I’m sticking to some of the more regularly used elements.
The document has two sections, header and body, and the file is always written in a Unicode encoding. The header holds metadata about the file, and the body is where the translation information is stored.
In the header section, delimited by <header>, any number of elements can be defined. The header has several mandatory attributes, including the following four:
- CREATIONTOOL - The tool or person that created the file. This can be a standard vendor string, or something that you make up yourself.
- SEGTYPE - This defines the segmentation level of the TU elements. The options are block, paragraph, sentence and phrase.
- O-TMF - The format from which the file’s been generated.
- DATATYPE - Specifies the type of data contained in the element.
In the body, delimited by <body>, there are one or more <tu> (translation unit) elements, each of which contains one or more <tuv> (translation unit variant) elements, which in turn hold the translation text in a <seg> element. You don’t need more than one <tuv> per <tu>, but it doesn’t really make sense to have just one.
The first <tuv> element is meant to contain the base language text, and any that follow contain the translations into the other languages. A <tuv> has only one mandatory attribute, xml:lang, and a series of optional ones. The attributes you’ll most commonly come across in a TMX file are:
- VERSION - The version of TMX that the file complies with
- USAGECOUNT - The number of times the element has been used
- SRCLANG - The locale of the source text
- XML:LANG - The locale of the element’s data
- CHANGEDATE - When the file was last changed
- CREATIONDATE - When the file was created
That’s the basic structure of the file. Not much to it, eh?
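To see how the pieces fit together, here is a minimal sketch that reads a TMX document with PHP’s SimpleXML extension and collects every <tu> into an array keyed by tuid and language. readTmx() is a hypothetical helper that ignores the header and does no validation; Zend_Translate’s TMX adapter does its own parsing.

```php
<?php
// The predefined namespace that the xml: prefix (as in xml:lang) belongs to
const TMX_XML_NAMESPACE = 'http://www.w3.org/XML/1998/namespace';

// Hypothetical helper: returns [tuid][language] => segment text
function readTmx(string $xml): array
{
    $tmx = simplexml_load_string($xml);
    if ($tmx === false) {
        throw new InvalidArgumentException('Invalid TMX document');
    }

    $units = [];
    foreach ($tmx->body->tu as $tu) {
        $id = (string) $tu['tuid'];
        foreach ($tu->tuv as $tuv) {
            $lang = (string) $tuv->attributes(TMX_XML_NAMESPACE)['lang'];
            $units[$id][$lang] = (string) $tuv->seg;
        }
    }
    return $units;
}

$sample = <<<'TMX'
<?xml version="1.0" ?>
<tmx version="1.4">
  <header creationtool="XYZTool" creationtoolversion="1.0.0"
          datatype="winres" segtype="sentence" adminlang="en-us"
          srclang="de-at" o-tmf="abc"/>
  <body>
    <tu tuid="message1">
      <tuv xml:lang="de"><seg>Nachricht1</seg></tuv>
      <tuv xml:lang="en"><seg>message1</seg></tuv>
    </tu>
    <tu tuid="message2">
      <tuv xml:lang="de"><seg>Nachricht2</seg></tuv>
      <tuv xml:lang="en"><seg>message2</seg></tuv>
    </tu>
  </body>
</tmx>
TMX;

$units = readTmx($sample);
echo $units['message2']['de'], PHP_EOL; // Nachricht2
```

Because every language for a message sits inside the same <tu>, one pass over the file yields all translations at once, unlike the one-file-per-language CSV approach.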
Why’s it Worth Using?
Easy to Read
TMX has a number of good points going for it. It’s based on XML, so you can build on your existing familiarity and understand it very quickly. It’s reasonably straightforward to read, even if you’re not that familiar with it or with XML, as everything is laid out logically and cleanly.
You can make changes to it with nothing more specialised than Notepad, TextEdit or your IDE of choice. Being text rather than binary, it’s instantly cross-platform: if you develop on Windows or Mac and deploy on Linux, there’s no problem.
Being text data, it can be quickly compressed using any good compression algorithm, whether zip, gzip, bzip2 or whatever. There is a host of good software available for editing and validating it.
Some of these tools, such as the Heartsome TMX Editor, are included in the links at the end of this post.
Whether open-source or commercial, you can acquire something to help you out. But if you’re comfortable, just use a standard text editor – just don’t use any kind of rich text editor, like MS Word, which will add in a lot of unnecessary and invalid markup.
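If translators edit the file by hand, a quick automated check before deploying can catch the most common mistakes. Below is a hypothetical checkTmx() sketch using PHP’s DOM extension: it confirms the file is well-formed XML, that the root is <tmx> with a version, that the header carries the four attributes listed earlier, and that every <tuv> has an xml:lang. It is no substitute for validating against the TMX DTD or using a dedicated tool.

```php
<?php
// Hypothetical pre-deployment sanity check; returns a list of problems found.
function checkTmx(string $xml): array
{
    if (trim($xml) === '') {
        return ['Empty document']; // DOMDocument::loadXML() rejects empty input
    }

    // Collect libxml errors internally instead of emitting warnings
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $loaded = $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return ['Not well-formed XML'];
    }

    $errors = [];
    $root = $doc->documentElement;
    if ($root->nodeName !== 'tmx' || !$root->hasAttribute('version')) {
        $errors[] = 'Root element must be <tmx> with a version attribute';
    }

    $header = $doc->getElementsByTagName('header')->item(0);
    if ($header === null) {
        $errors[] = 'Missing <header>';
    } else {
        foreach (['creationtool', 'segtype', 'o-tmf', 'datatype'] as $attr) {
            if (!$header->hasAttribute($attr)) {
                $errors[] = "Header is missing the {$attr} attribute";
            }
        }
    }

    foreach ($doc->getElementsByTagName('tuv') as $tuv) {
        if (!$tuv->hasAttributeNS('http://www.w3.org/XML/1998/namespace', 'lang')) {
            $errors[] = 'A <tuv> element has no xml:lang attribute';
        }
    }
    return $errors;
}

$header = '<header creationtool="XYZTool" segtype="sentence" o-tmf="abc" datatype="winres"/>';
$good = '<tmx version="1.4">' . $header
    . '<body><tu tuid="m1"><tuv xml:lang="de"><seg>Nachricht1</seg></tuv></tu></body></tmx>';
$bad = '<tmx version="1.4">' . $header
    . '<body><tu tuid="m1"><tuv><seg>Nachricht1</seg></tuv></tu></body></tmx>';

print_r(checkTmx($bad));
```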
Coders Not Required
Because there’s dedicated management software for it, you don’t need developers to maintain it. Often, developers won’t be your translators. That task will go to specialist translators, who may come from outside agencies, be friends, or be other members of the team who aren’t well versed in coding, or not at all; as we all know, some even run at the sight of it.
Translations Separate from Code
The last point is that you keep the translation information outside of your code. The code should not be strongly coupled to the translation information; they’re two completely separate things, and your code and your translations can, and should, change independently.
Caching it up baby
If speed is your main worry, I’d still suggest that TMX is worth considering. According to the documentation, once source data has been read in and cached, retrieving it when it’s needed is generally on par, whichever source adapter is used.
In any project you’re going to use some caching to reduce the impact of bottlenecks on performance, and Zend_Translate with the TMX adapter is no different. Caching is a bit outside the scope of this article, so I’ll leave that for another post. But rest assured that if speed is your only worry, it needn’t be.
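The caching principle itself is simple. Zend_Translate provides Zend_Translate::setCache() for plugging in a Zend_Cache instance (see the reference guide for its options); the plain-PHP sketch below only illustrates the idea: parse the source once, store the resulting array as a PHP file, and reuse it until the source file changes. loadWithCache() and the temporary file names are hypothetical.

```php
<?php
// Hypothetical helper: cache a parsed translation array as a PHP file.
function loadWithCache(string $source, string $cacheFile, callable $parse): array
{
    // Reuse the cache only if it is at least as new as the source
    // (filemtime() has one-second resolution)
    if (is_file($cacheFile) && filemtime($cacheFile) >= filemtime($source)) {
        return require $cacheFile;
    }
    $data = $parse($source);
    file_put_contents($cacheFile, '<?php return ' . var_export($data, true) . ';', LOCK_EX);
    return $data;
}

// Usage: a tiny "id;text" source file and a parser that counts its calls
$source = tempnam(sys_get_temp_dir(), 'tr');
file_put_contents($source, "message1;Nachricht1\nmessage2;Nachricht2\n");
$cacheFile = $source . '.cache.php';

$calls = 0;
$parse = function (string $path) use (&$calls): array {
    $calls++;
    $messages = [];
    foreach (file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        [$id, $text] = explode(';', $line, 2);
        $messages[$id] = $text;
    }
    return $messages;
};

$first = loadWithCache($source, $cacheFile, $parse);  // parses the source file
$second = loadWithCache($source, $cacheFile, $parse); // loaded from the cache file
echo $calls, PHP_EOL; // 1

unlink($cacheFile);
unlink($source);
```

Once cached, every adapter ends up as the same kind of PHP array, which is consistent with the documentation’s point that retrieval speed is roughly the same whichever source format you start from.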
If you want any further information on Zend_Translate, or TMX, have a look at the following links.
- Translation Memory eXchange
- TMX Format (Overview)
- XML in localisation: Reuse translations with TM and TMX
- Heartsome TMX Editor
- TMX Compliance Kit
- Tools for TMX