Internationalizing PHP Applications With TMX

phpMagazin recently did a good job of covering Zend_Translate: specifically what it is, how it works and four of the basic translation adapters (Array, CSV, Gettext and Ini). However, what they didn’t cover is what I think is one of the best adapters that Zend_Translate has: TMX.

Background on Zend_Translate

If you’re not familiar with Zend_Translate, here’s a quick overview: Zend_Translate allows you to quickly and effectively implement internationalisation (I18N) in your web applications. It provides the following key features:

Multiple source formats (or adapters)
Thread-safe gettext support
An easy and generic API
Detection of the user’s language
Automatic source detection

But for the remainder of this article, I’m going to focus on the multiple source adapters.

Translation Adapters

If you’re not familiar with them, adapters are what load your website content, both in your chosen base language and in one or several translated versions, for the languages you believe your visitors want to view your site in. These adapters come in a variety of forms, from the simplest ones, such as a CSV file or a PHP array, to the more complex and esoteric ones, such as TBX and TMX.

Using a PHP Array

$english = array(
    'message1' => 'message1',
    'message2' => 'message2',
    'message3' => 'message3'
);

$german = array(
    'message1' => 'Nachricht1',
    'message2' => 'Nachricht2',
    'message3' => 'Nachricht3'
);

$translate = new Zend_Translate(
    array(
        'adapter' => 'array',
        'content' => $english,
        'locale'  => 'en'
    )
);

$translate->addTranslation(
  array('content' => $german, 'locale' => 'de')
);

Using a CSV File

Example CSV File

message1;Nachricht1
message2;Nachricht2

$translate = new Zend_Translate(
    array(
        'adapter' => 'csv',
        'content' => '/path/to/mytranslation.csv',
        'locale'  => 'de'
    )
);

$translate->addTranslation(
    array(
        'content' => 'path/to/other.csv',
        'locale' => 'fr'
    )
);

You can see that in both examples, the first set of content (English in the array example, German in the CSV example) was used to initialise the Zend_Translate object and that the additional translations were then added to the adapter with the addTranslation method.

Given this, it becomes relatively painless to add as many sets of translation information as you need in a consistent and predictable manner.
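To make the idea concrete, here’s a plain-PHP sketch of what looking up a message ID against those arrays amounts to. It’s an illustration only, not Zend_Translate’s actual implementation: the translateMessage function and the $catalogue array are made up for this example, and falling back to the message ID is a choice made for the sketch.

```php
<?php
// Same shape as the $english / $german arrays above, keyed by locale.
$catalogue = array(
    'en' => array('message1' => 'message1', 'message2' => 'message2'),
    'de' => array('message1' => 'Nachricht1', 'message2' => 'Nachricht2'),
);

/**
 * Hypothetical helper: look up a message ID for a locale and
 * fall back to the message ID itself when nothing is found.
 */
function translateMessage(array $catalogue, $messageId, $locale)
{
    if (isset($catalogue[$locale][$messageId])) {
        return $catalogue[$locale][$messageId];
    }

    return $messageId;
}

echo translateMessage($catalogue, 'message1', 'de'), PHP_EOL; // Nachricht1
echo translateMessage($catalogue, 'message3', 'de'), PHP_EOL; // message3 (no German entry)
```

Whatever the source format, the adapter’s job is to get the translations into a structure that can be queried like this by message ID and locale.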

Displaying View Content

Once you’ve chosen your adapter and created and loaded the required source files, actually using it in a Zend Framework project to display internationalised content becomes very simple.

I’ll assume that you’re following the standard Zend Framework approach and your view scripts are responsible for outputting your user interface text, whether that’s a menu, footer, copyright notice or something else, and it’s not managed in code.

Given that, you would normally do something like the following:

print $this->escape('copyrightNotice');

Right? Well, once the translation adapter is set up, only a small change is required to implement internationalisation. Your view script above now becomes the following:

print $this->escape($this->translate('copyrightNotice'));

Now I don’t want to get into any kind of security debate here; we’re just focussing on the key elements of translation and internationalisation, right?

How do you determine the language choice

Now that you’ve got your source files and view configured to translate your content, all that remains is to determine the user’s language. This can be done in a variety of ways, but the one that I find simplest is letting the user’s browser language setting choose it automatically.

Depending on what they’ve got it set to, your website will display content in that language if it’s available, or fall back to your default language. You don’t have to do anything fancy; just let the browser specify the choice and away you go. To do this, when you configure the Zend_Translate resource, say in your bootstrap file, you’d set the locale option to auto.
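A minimal sketch of that configuration, assuming Zend Framework 1 is on your include path. The TMX path is the same placeholder used elsewhere in this article, and treating 'en' as the site’s default language is an assumption made for the example.

```php
<?php
require_once 'Zend/Translate.php';

$translate = new Zend_Translate(
    array(
        'adapter' => 'tmx',
        'content' => 'path/to/mytranslation.tmx', // placeholder path
        'locale'  => 'auto' // let the locale be detected, e.g. from the browser
    )
);

// Assumption: 'en' is the default language of the site. If the detected
// language has no translations loaded, switch to the default instead.
if (!$translate->isAvailable($translate->getLocale())) {
    $translate->setLocale('en');
}
```

The explicit isAvailable check is there so the fallback is visible in your own code rather than left implicit.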

If you want to know more about Zend_Translate, there is an excellent series of video casts over at ZendCasts.com.

Quick Plug: Jon’s video casts are always timely and accurate, presented in a clear and unambiguous manner each week. Tell him Matt from Malt Blue said hi.

So What is TMX

TMX (or Translation Memory eXchange) is one of the more advanced source adapters for Zend_Translate. Based on XML, TMX is an international standard for storing and delivering translation information. According to Wikipedia:

TMX (Translation Memory eXchange) is the vendor-neutral open XML standard for the exchange of Translation Memory (TM) data created by Computer Aided Translation (CAT) and localization tools. The purpose of TMX is to allow easier exchange of translation memory data between tools and/or translation vendors with little or no loss of critical data during the process.

It is a bit unwieldy at first, but if you’re familiar with XML, you’ll get a grasp on it quite quickly. Have a look at the sample below and I’ll take you through the key points.

<?xml version="1.0" ?>
<!DOCTYPE tmx SYSTEM "tmx14.dtd">
<tmx version="1.4">
   <header creationtoolversion="1.0.0" datatype="winres" segtype="sentence"
           adminlang="en-us" srclang="de-at" o-tmf="abc"
           creationtool="XYZTool" >
   </header>
   <body>
       <tu tuid='message1'>
           <tuv xml:lang="de"><seg>Nachricht1</seg></tuv>
           <tuv xml:lang="en"><seg>message1</seg></tuv>
       </tu>
       <tu tuid='message2'>
           <tuv xml:lang="de"><seg>Nachricht2</seg></tuv>
           <tuv xml:lang="en"><seg>message2</seg></tuv>
       </tu>
   </body>
</tmx>

$translate = new Zend_Translate(
    array(
        'adapter' => 'tmx',
        'content' => 'path/to/mytranslation.tmx',
        'locale'  => 'en'
    )
);

What’s it all mean

Well, there is a bit to it and you can find a host of links in the links section. For the purposes of this post, I’m sticking to some of the more regularly used elements.

There are two sections to the document, header and body, and the file is always written out in Unicode format. The header holds metadata about the file and the body is where the translation information is stored.

The Header

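In the header section, delimited by <header>, any number of elements can be defined. The TMX 1.4 specification requires seven attributes on it [1]: the four described below, plus CREATIONTOOLVERSION, ADMINLANG and SRCLANG, all of which appear in the sample above: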

CREATIONTOOL - The tool or person that created the file. This can be a standard vendor string, or something that you make up yourself.
SEGTYPE - The kind of segmentation used in the <tu> elements. The options are block, paragraph, sentence and phrase.
O-TMF - The format of the translation memory file from which this file was generated.
DATATYPE - Specifies the type of data of an element.

The Body

In the body, delimited by <body>, there are one or more <tu>, or translation unit, elements, each of which contains one or more <tuv>, or translation unit variant, elements, which in turn contain the translation text. You don’t need to have more than one <tuv>, but it doesn’t really make sense to have just one.

The first <tuv> element is meant to contain the base language text and any that follow contain the translations into the other languages. A <tuv> has only one mandatory element, <seg>, and a series of optional ones.
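To see how that structure maps onto translation data, here’s a small plain-PHP sketch that walks a TMX document with SimpleXML and collects every segment by translation unit ID and language. This is an illustration, not how the Zend_Translate TMX adapter is implemented. The tmxToArray function is a made-up name, and the XML is a trimmed copy of the sample above with the DOCTYPE line left out.

```php
<?php
// Trimmed copy of the sample file from this article (DOCTYPE omitted).
$tmx = <<<'XML'
<?xml version="1.0" ?>
<tmx version="1.4">
   <header creationtoolversion="1.0.0" datatype="winres" segtype="sentence"
           adminlang="en-us" srclang="de-at" o-tmf="abc"
           creationtool="XYZTool" />
   <body>
       <tu tuid="message1">
           <tuv xml:lang="de"><seg>Nachricht1</seg></tuv>
           <tuv xml:lang="en"><seg>message1</seg></tuv>
       </tu>
       <tu tuid="message2">
           <tuv xml:lang="de"><seg>Nachricht2</seg></tuv>
           <tuv xml:lang="en"><seg>message2</seg></tuv>
       </tu>
   </body>
</tmx>
XML;

/**
 * Hypothetical helper: turn a TMX string into
 * array(tuid => array(language => segment text)).
 */
function tmxToArray($xmlString)
{
    $xml = simplexml_load_string($xmlString);
    if ($xml === false) {
        throw new InvalidArgumentException('The TMX string is not well-formed XML');
    }

    $messages = array();
    foreach ($xml->body->tu as $tu) {
        $id = (string) $tu['tuid'];
        foreach ($tu->tuv as $tuv) {
            // xml:lang lives in the reserved XML namespace.
            $attributes = $tuv->attributes('http://www.w3.org/XML/1998/namespace');
            $lang = (string) $attributes->lang;
            $messages[$id][$lang] = (string) $tuv->seg;
        }
    }

    return $messages;
}

$messages = tmxToArray($tmx);
echo $messages['message1']['de'], PHP_EOL; // Nachricht1
```

Each <tu> becomes one message ID, and each <tuv> inside it becomes one language’s version of that message.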

Other elements

VERSION - The version of TMX that the file complies with
USAGECOUNT - The number of times the element has been used
SRCLANG - The locale of the source text
LANG - The locale of the element’s data (written as xml:lang in the sample above)
CHANGEDATE - When the file was last changed
CREATIONDATE - When the file was created

That’s the base structure of the file. Not much to it, hey?!

Why’s it Worth Using?

Easy to Read

TMX has a number of good points going for it. It’s based on XML, so you can build on your familiarity there and understand it very rapidly. It’s reasonably straightforward to read, even if you’re not that familiar with it or with XML. Everything is logically and cleanly laid out.

Cross-platform

You can amend it with no more special software than Notepad, TextEdit or your favorite IDE. Being textual, not binary, in nature, it’s instantly cross-platform. If you develop on Windows or Mac and deploy on Linux, there’s no problem.

Highly Compressible

Being text data, it can be quickly compressed using any good compression algorithm, whether zip, gzip, bzip2 or whatever. There is a host of good software available for editing and validating it.

Have a look at the following list:

Whether open-source or commercial, you can acquire something to help you out. But if you’re comfortable, just use a standard text editor – just don’t use any kind of rich text editor, like MS Word, which will add in a lot of unnecessary and invalid markup.

Coders Not Required

Given that there’s management software for it, you don’t need developers to manage it. Oftentimes, developers will not be your translators. This task will be given to specialist translators who may come from outside agencies, be friends, or be other members of the team who are not well versed, or versed at all, in coding; some even run at the sight of it, as we all know.

Translations Separate from Code

The last point is that you keep the translation information outside of code. The code should not be strongly coupled with translation information. They’re two completely separate things. Your code and the translation information can and should change independently.

Caching it up baby

However, if speed is all that you’re worrying about, I’d suggest that TMX is worthy of consideration for you. According to the documentation, once source data has been read in and cached, the process of retrieving the information when it needs to be used is, generally, on par, regardless of the source adapter used.

With any project, you’re going to use some caching to reduce the impact of bottlenecks on performance – Zend_Translate with the TMX adapter is no different. Using caching is a bit outside the scope of this article, so I’m going to leave that for another post. But rest assured that if speed is your only worry, it need not be.
