Multilingual ExpressionEngine® (ver 2), Part 1

This is the first of three articles which will focus on setting up ExpressionEngine as a multilingual content delivery system. As with any multilingual project, there are a lot of moving parts and things to consider, especially in the planning phase. Whether or not you have the human infrastructure in place to maintain a multilingual site is also very important.

These articles, however, cover the technical side: how a multilingual setup can be applied to an ExpressionEngine build. Furthermore, the methods described take a “do-it-yourself” approach, without using module-based frameworks like Transcribe or Publisher. I will discuss these turn-key solutions in the second instalment, as they are very good solutions as well, but like any framework they require you to follow a specific content design pattern, which in some cases may not fit your project. So without further ado, let’s get down to the nitty-gritty.

What Needs to be Translated and How?

The first step you need to take in any multilingual development project is to take stock of your content and figure out what needs to be translated, and how it should be translated. From an ExpressionEngine standpoint, I like to divide content for multilingual sites into the following groups:

  1. Channel Content
  2. Static Content
  3. Locale Formatting
  4. Categories
  5. Navigation
  6. URL Structures

These are the content types in EE that need to be filtered somehow on a per-language basis. The next step is to look at our content, and decide how it will be translated. By “how” I am referring to the following two methods:

One-to-One Translations

These are translations that must exist on a one-to-one basis. That is, if it exists in one language, it must exist in all of the other languages the site supports. To help visualise this, I like to think of them as one-to-one relationships in a relational database schema. The “submit” buttons in our site are a good example of this. We will need them translated into all languages regardless of the site’s structure, as they could potentially appear on any page or section regardless of language.

One-to-None Translations

In contrast to the One-to-One translations, these types of translations do not need to (or cannot) be linked directly with a translated counterpart. It could be that your site’s structure and content differ depending on the user’s language, or that the content will be translated into different languages in stages by your client’s team. An example of this could be a “products” channel in an e-commerce setup where your client wants to be able to include new products in the storefront as soon as possible, and doesn’t want to wait for the translated product descriptions in the other four languages the store supports before they can start selling.

So now that we have an idea of how to categorise our content and how it could be translated, we will go over each of the content types in this series and show you how they can be implemented. But first, we have to talk about the one thing that holds this all together …

The Lynchpin

Regardless of the content and how it is translated, we need one small bit of information that will allow us to switch between languages on our site, and to maintain “state” as our users click through the site in the language they have selected.

This variable usually consists of a two-character ISO language code (e.g. “es” for Spanish, or “de” for German), which we will use to display the correct translation of the content. I say “usually”, as some projects require a more granular approach and also depend on locales (e.g. “es_es” for Castilian Spanish and “es_mx” for the Mexican dialect).

Making this global language key available to our EE templates can be achieved in different ways on a PHP level with a session variable, cookies, or as a part of the URL. I’m going to explain the latter, as it is an easier way of differentiating languages in a multilingual site, plus it has the benefit of making URLs easily sharable in the language the user intends to share it in.

Side Note: Initially you may be thinking that doing an IP-to-Nation lookup would be an ideal way of getting the user’s language, and maintaining it in scope. However, as an English speaker living abroad, I assure you this can be extremely frustrating from a usability standpoint. In my opinion, the language should always be the user’s selection, and you are assuming way too much by automatically assigning it based on the user’s geographical location alone.

The easy way of getting our language into the URL, without going the straight-up PHP route, is to take advantage of sub-directories and multiple index.php files. First we will create empty sub-directories in our web root for each language, naming them by their two-letter ISO language code:

Directory structure: our example site will support English (en), Spanish (es), and Dutch (nl), so the web root gains the sub-directories en, es, and nl.

Once we have these directories created, let’s duplicate our main index.php file and place a copy of it in each newly created directory. Since the index.php file is the first routing element in ExpressionEngine, we will use it to set our global language variables so they will be available thereafter.

Open up each copy of the index.php file, and in the “Custom Config Values” section (around line 70 or so) include the following global variables for the corresponding language (example for the index.php file in our Spanish “es” directory):

$assign_to_config['global_vars']['lang_code'] = 'es';
$assign_to_config['global_vars']['lang_name'] = 'spanish';
$assign_to_config['global_vars']['lang_date'] = '%d/%m/%Y';
$assign_to_config['global_vars']['lang_alias'] = 'Español';

The “lang_code” variable is what we will use throughout our templates to get the correct translations of our content. The “lang_name” variable is just the name of the language in your primary language (in this case English). The “lang_date” variable will be used to correctly format short dates in the user’s locale. And finally, the “lang_alias” variable is the language’s name, in its own language. This can be handy to have at the ready and is better than representing a language by a tacky little flag, which by the way, is another faux pas in multilingual website UX. Remember to do this for your primary language as well (in this case, English).
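As a minimal sketch of that last point, the English copy of index.php might carry values like the ones below. The strings and the date format here are assumptions chosen for illustration (a US-style short date), not values taken from a real project, so adjust them to your own locale.

```php
// en/index.php, "Custom Config Values" section.
// Illustrative values only: the date format assumes a US-style short date.
$assign_to_config['global_vars']['lang_code'] = 'en';
$assign_to_config['global_vars']['lang_name'] = 'english';
$assign_to_config['global_vars']['lang_date'] = '%m/%d/%Y';
$assign_to_config['global_vars']['lang_alias'] = 'English';
```

The Dutch copy would follow the same pattern with 'nl', 'dutch', and 'Nederlands'. If your index.php locates the system folder through a relative $system_path, that path may also need adjusting once the file lives in a sub-directory (for example '../system'); check this against your own install.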

Now in our templates we can simply use the global variable tag {lang_code} anywhere, and know that it will be parsed and at the ready wherever we need it. We’re almost done here but there is one more thing we need to take into account before we jump into setting up channels and whatnot.

If you are using an .htaccess file to remove the “index.php” part of your URLs (and you should be), you will need to create separate .htaccess files for each of your language sub-directories. Depending on how you are removing index.php from your URL, you’ll just need to modify the end result of the rewrite rule to take into account the directory itself. So keeping with our Spanish example, its .htaccess file would include the following rewrite rule:

RewriteEngine On 
RewriteCond $1 !\.(gif|jpe?g|png)$ [NC]
RewriteCond %{REQUEST_FILENAME} !-f 
RewriteCond %{REQUEST_FILENAME} !-d 
RewriteRule ^(.*)$ /es/index.php/$1 [L]

Note the “/es/” prepended to the “index.php” portion of the rule. This just points to our index.php file in the Spanish directory and makes sure the URL will resolve to it. Now that we have our global language variables in place, let’s jump into our first and primary content type.

Channel Content

As discussed in the previous section, there are two ways we can translate content. We are going to look at both methods and how to apply them in the context of channel content.

One-to-One Channel Content

The basis of this method is that we will have a corresponding translation for each language, so one way to implement this is to create a single channel for our content, and have it use a custom field group with a separate field per language for each bit of translatable data. As an example, let’s go back to our fictitious multilingual site and imagine we need to include an “events” channel to display information about upcoming events in all three of the languages our site supports.

In this example we are assuming that content must be translated into all of the supported languages. By using this method we can enforce this in the backend channel edit form by supplying text fields for each language, making it visually apparent that all the language data must be filled in. First we create one channel for these events, aptly naming it “events”.

Channel Name    Channel Short Name
Events          events

We then create the “events” custom field group which it will use, and then supply custom fields for texts in each of the languages our site supports. To retrieve this information later on in our templates we will need to use a naming convention for these text fields which will indicate which language it applies to. An example could be:

Field Title               Field Short Name       Field Type
Event Start Date          cf_event_date_start    Date Field
Event End Date            cf_event_date_end      Date Field
Event Name (EN)           cf_event_name_en       Text Input
Event Name (ES)           cf_event_name_es       Text Input
Event Name (NL)           cf_event_name_nl       Text Input
Event Description (EN)    cf_event_desc_en       Rich Text Editor
Event Description (ES)    cf_event_desc_es       Rich Text Editor
Event Description (NL)    cf_event_desc_nl       Rich Text Editor

So looking at our fields in the table above, the event’s start and end dates are handled by date fields. Since this information is universal to all languages, there is no need to worry about translations here, just the format. However, the event’s name and description do need to be translated, so we create a separate field for each language using the appropriate field types.

Here is where our naming convention comes into play. Note the “_en”, “_es”, and “_nl” suffixes on the short names of the text-based fields. By using a naming convention like this, we can then easily use our global {lang_code} variable to grab the correct piece of text in our templates, as well as our {lang_date} variable to format the dates correctly.

{exp:channel:entries channel="events" dynamic="no"}
  <h4>{cf_event_name_{lang_code}}</h4>
  <h5>
    {cf_event_date_start format="{lang_date}"}
    -
    {cf_event_date_end format="{lang_date}"}
  </h5>
  <div class="summary">
    {cf_event_desc_{lang_code}}
  </div>
{/exp:channel:entries}

There are a few things to keep in mind with this approach, and the first is scalability. If you are reasonably sure that your site will only support four languages or fewer, and the type of content warrants a one-to-one translation, then this approach would be fine. If, however, you know you will be supporting more than four languages, then this approach may become hard to scale and maintain.

Another thing to keep in mind is keeping the channel edit forms easy for your content editors to understand. Having multiple fields for each piece of text can be confusing no matter how well they are labeled.

One-to-None Channel Content

As I explained at the beginning of this article, there may be situations where a one-to-one translation just isn’t ideal. Take our previous one-to-one channel translation above. What would happen if, instead, events were dependent on language, such as speaking events or conferences targeted at a specific language group? You could get around this by placing conditionals within your template that check for the existence of content in the user’s language, like so:

{exp:channel:entries channel="events" dynamic="no"}
  {if '{cf_event_name_{lang_code}}' != ''}
    <h4>{cf_event_name_{lang_code}}</h4>
    <h5>
      {cf_event_date_start format="{lang_date}"}
      -
      {cf_event_date_end format="{lang_date}"}
    </h5>
    <div class="summary">
      {cf_event_desc_{lang_code}}
    </div>
  {/if}
{/exp:channel:entries}

This would work; however, it could be error-prone on the control panel end of things. Namely, you could no longer set the “cf_event_name_XX” fields as required in the editing form. Things could also get hard to maintain further down the road if your data gets more complex in your templates. Not to mention, we now have to take into account how ‘simple’ and ‘advanced’ conditionals are parsed in EE if we do get more complex, and have to nest other conditionals within them.

The first step in creating a “One-to-None” translation setup in our channels is to create a separate channel for each language and assign all of them the same custom field group. The naming convention is now applied to the channel name rather than the custom fields.

Channel Name    Channel Short Name
Events (EN)     events_en
Events (ES)     events_es
Events (NL)     events_nl

Again, we are using the two-character ISO language code in our naming convention, this time applied to our multiple channels. As such, we can now use one custom field group for all three channels. The custom field group would now be something like the following:

Field Title          Field Short Name       Field Type
Event Start Date     cf_event_date_start    Date Field
Event End Date       cf_event_date_end      Date Field
Event Description    cf_event_desc          Rich Text Editor

Now we can mark any of these fields as required to make sure content editors and translators include the necessary information. This also removes the confusion multiple text fields can create in the channel entry edit form.

Going back to our template code, since we have moved the naming convention from the custom field short names to the channel short name, to get the correct translated content our code would look something like this:

{exp:channel:entries channel="events_{lang_code}" dynamic="no"}
  <h4>{title}</h4>
  <h5>
    {cf_event_date_start format="{lang_date}"}
    -
    {cf_event_date_end format="{lang_date}"}
  </h5>
  <div class="summary">
    {cf_event_desc}
  </div>
{/exp:channel:entries}

The channel name in our channel entries tag now determines which language to display. We are still using our {lang_date} variable to make sure the dates are formatted correctly, but now we can call the custom fields without worrying about the language they pertain to. In the example above we have also eliminated the {cf_event_name_XX} fields completely, as the normal {title} field will work just fine here.

Stay tuned for the next episodes

In this first instalment of the series we have gone over how to categorise our content and the types of translations we can do. We have set up our global language variables, and have applied both translation methods to channel content in two simple examples. In the second part we will cover static translations, categories, navigation, URLs, and third-party translation add-on systems. Our third and final chapter will cover preparing the ExpressionEngine control panel for multiple language editors and translators.