Translation of Simutrans - time to move to gettext?

Started by sanna, June 16, 2010, 10:26:52 AM


sanna

Currently Simutrans is translated with the help of SimuTranslator, a tool that Frank does a wonderful job of keeping as user/translator friendly as possible. But there is imho a fundamental flaw in the principle used to translate Simutrans. All translated strings are translations of fixed keys, and a key does not need to convey any useful information (an example is LOCO_INFO). If the translation of a key is missing, the key itself is shown. If the information that the key needs to convey is changed, already translated strings keep showing the old (now possibly outdated) information until a translator reviews the text and updates it. Furthermore, the translator is given no indication that such an update is needed.

Newly added keys clearly tend to be readable even when untranslated, but far from all are. There is a very real fear that changes to the translation system will cause a loss of existing translations that might take a very long time to re-establish. Nevertheless, this is a problem that will not go away, and I believe that something needs to be done. Simutrans is a very international game, and I think it is very important that effort is put into keeping it so.

There is an alternative: gettext. It has been mentioned before here on the forum. Prissi has stated that if the project had been started today, gettext could have been chosen to handle translations (see http://forum.simutrans.com/index.php?topic=4838.msg47661#msg47661). This system, which is used by a great many open source projects, works on a different principle. Instead of using specific keys, one language is chosen (usually English) as the base language. The entire string in the base language is then used as the "key". If a language lacks a translation of this string, the base language string is used. If the base language string is changed, existing translations of it are marked as "fuzzy", showing the translator that a review is needed. Until a translator has reviewed the string, the base language string is used (unless the devs choose to use fuzzy strings in game as well, which I would not recommend).
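
To make the difference concrete, here is a minimal Python sketch of the two lookup principles described above. It is not how Simutrans or gettext is implemented: the dictionaries stand in for real catalogs, and the function names, the `use_fuzzy` switch and the sample German strings are assumptions made purely for illustration.

```python
# Hypothetical in-memory catalogs; real gettext reads compiled .mo files.

def translate_by_key(key, catalog):
    """Key-based lookup: a missing translation shows the raw key."""
    return catalog.get(key, key)


def translate_by_source(source, catalog, use_fuzzy=False):
    """Source-string lookup: fall back to the base language string.

    Each catalog value is a (translation, is_fuzzy) pair.
    """
    entry = catalog.get(source)
    if entry is None:
        return source  # no translation yet: show the base language
    translation, is_fuzzy = entry
    if is_fuzzy and not use_fuzzy:
        return source  # base string changed, translation awaits review
    return translation


german = {
    "Save game": ("Spiel speichern", False),
    "Load game": ("Spiel laden", True),  # marked fuzzy for the example
}

print(translate_by_key("LOCO_INFO", {}))         # the bare key is shown
print(translate_by_source("Save game", german))  # reviewed translation
print(translate_by_source("Load game", german))  # fuzzy, so base string
```

With the key-based scheme the player sees the bare key when a translation is missing, while the source-string scheme always has a readable base language text to fall back on.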

Since gettext uses the entire string as key, identical strings in various places will be translated identically, unless specifically set to be translated differently. There are two ways to specify this: by namespacing (each pakset could e.g. have a separate namespace) and by prefixing the string with a selected untranslated string sequence. The latter would only be necessary for strings that are identical in the base language but convey different meanings (think Game as in "play a game" and as in "hunt game"), both of which are valid within the same namespace. Furthermore, gettext supports basic plural handling, allowing translators to translate a given string into as many plural forms as needed in their language. Grammatical gender support is however not very good/non-existent.
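
The context prefix and the plural handling can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionaries, the function names and the sample German and Czech strings are made up for the example, and the plural rule is a hand-written copy of the three-form rule commonly used for Czech, where real gettext would read it from the catalog's Plural-Forms header.

```python
def czech_plural_index(n):
    """Pick one of three plural forms: 1, 2 to 4, everything else."""
    if n == 1:
        return 0
    if 2 <= n <= 4:
        return 1
    return 2


# Hypothetical German catalog keyed by (context, base string).
CONTEXT_CATALOG = {
    ("play", "Game"): "Spiel",
    ("hunt", "Game"): "Wild",
}

# Hypothetical Czech catalog keyed by the singular base string.
PLURAL_CATALOG = {
    "%d wagon": ["%d vagon", "%d vagony", "%d vagonů"],
}


def translate_in_context(context, message):
    """Same base string, different translation per context."""
    return CONTEXT_CATALOG.get((context, message), message)


def translate_plural(singular, plural, n):
    """Choose the plural form for n, or fall back to the base language."""
    forms = PLURAL_CATALOG.get(singular)
    if forms is None:
        return singular if n == 1 else plural
    return forms[czech_plural_index(n)]


print(translate_in_context("play", "Game"))
print(translate_in_context("hunt", "Game"))
for count in (1, 3, 7):
    print(translate_plural("%d wagon", "%d wagons", count) % count)
```

The translator supplies as many forms as the language needs, and the program only ever passes the number.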

There are translator tools for gettext on both linux and windows (e.g. poedit - http://www.poedit.net/) as well as online solutions (e.g. launchpad - https://launchpad.net/), which allow translators to easily get an overview of what needs to be done. Translations are then sent to or harvested by a dev who commits them to svn/git. Offline translators work with plain text files (suffix po) that are compiled into mo-files, with error checking as part of the compilation.

I believe that with some prudent perl scripting, existing *.tab-files could be reassembled to *.po-files, minimizing the loss of existing translations. However, these reassembled strings should probably be marked as fuzzy...
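
As a rough idea of what such a conversion could look like, here is a sketch in Python rather than perl. It rests on several assumptions that would need checking against real files: that a tab file is a plain sequence of key line / translation line pairs, that lines starting with `#` are comments, and that an English tab file supplies the base string for each key. The function names and the sample English text are hypothetical.

```python
def parse_tab(text):
    """Read alternating key / translation lines into a dict.

    Assumes blank lines and lines starting with '#' carry no data.
    """
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    return dict(zip(lines[0::2], lines[1::2]))


def po_escape(s):
    """Escape a raw string for use inside a PO msgid/msgstr.

    Escape sequences already present in the tab text are kept literally.
    """
    return s.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def tab_to_po(base_tab, target_tab):
    """Build PO entries, using the base language text as msgid."""
    base = parse_tab(base_tab)
    target = parse_tab(target_tab)
    entries = []
    for key, translation in target.items():
        msgid = base.get(key, key)  # no base text: keep the old key
        entries.append(
            f"#. old key: {key}\n"
            "#, fuzzy\n"
            f'msgid "{po_escape(msgid)}"\n'
            f'msgstr "{po_escape(translation)}"\n'
        )
    return "\n".join(entries)


english = "CSD_Amee\nCSD Amee (1st class)\n"
czech = "CSD_Amee\nCSD Amee (1. trida)\n"
print(tab_to_po(english, czech))
```

Two keys that share the same English text would produce duplicate msgid entries, so a real converter would have to merge them or tell them apart with a context.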

Sorry for being so long-winded... if you are still with me, then: this would of course require quite a substantial effort from the far too few coders we have, as well as an effort from the translators. So my point with this post is to start a discussion on whether this is a desired change, as well as to search for coders possibly interested in implementing the gettext system for Simutrans.

vilvoh

I see your point, I love the idea and I support it, but the question is if after moving on to this method, which is the most reasonable, we could still use Simutranslator as the online translating interface, which is simple and easy to use, for those files. However, this suggestion involves several different aspects: programming, translators and translation managers.

My doubt is that Simutrans is translated into many languages and the amount of translatable text is really huge, so we really need some kind of online tool to manage translations, something better than SVN or a repository where people have to upload every new version of the translations, which in my opinion is quite annoying.


Escala Real...a blog about Simutrans in Spanish...

prissi

Since this means that most messages need to be rewritten to be more translation friendly, I feel like this is a very large task that nobody will do.

Replacing translator::translate() by gettext() is trivial. However, for multiplayer games there will then be a UI translation and a game name translation (convoi names etc.) to have matching names on the map.

I am not so sure about destroying a working (of course improvable) infrastructure in favor of something that will start over at zero again. Furthermore, for the paks, their translations would need to go to system specific directories, if I see correctly.

VS

Support, although... this is a complicated issue...

The most valuable part of STL's interface, in my opinion, is graphical representation of pakset items. I would like that to stay in some form. For program texts, though, gettext is like water to fish... absolutely yes. Those who have tasted the forbidden fruit of string literals will never look back at horrors of identifiers :P

prissi: why specific folders?
http://php.lsu.edu/function.bindtextdomain.html
that's the php version which was the most friendly I found online, but if you look at gettext sources it has this, too...

I don't know Simutranslator's internals, but it does mapping string:string, so in principle it should output gettext catalogs as easily as our text files. Character encoding is currently handled at runtime, so no problems there, either.

My projects... Tools for messing with Simutrans graphics. Graphic archive - templates and some other stuff for painters. Development logs for most recent information on what is going on. And of course pak128!

vilvoh

There're several web interfaces for translating .po (gettext) files:

jonasbb

I think it would be great if ST would use gettext.

Quote from: prissi on June 16, 2010, 11:51:15 AM
Replacing translator::translate() by gettext() is trivial.
If it is so simple to create a version that uses gettext, why not create a test edition?
And I could write an export to .po files for SimuTranslator. That should not be a problem.

The change from keys like LOCO_INFO to complete short English texts does not have to happen immediately. Keys could be changed whenever code is rewritten or rechecked. New code should use the new system.

Václav

SimuTranslator may not be perfect but still it is better than gettext, I think.

The problem is in gettext's files: it uses two sets of files, po files and mo files. So translations will consume about twice as much disk space as now.

po files - original files with translations - editable (I found these editors: poEdit, gtranslator and KBabel)
mo files - binary file - not editable; this is file really used by gettext

And by the way, I have only a little experience with gettext, but it is a Linux affair. Including gettext in Simutrans will cause many problems under Windows and MacOS (BeOS and so on), I think.

... it is all for this time ...

One learns from mistakes - but some people are unteachable

jonasbb

gettext is also available under Windows and UNIX like OS (also Mac, Linux, ...)
Quote from: VaclavMacurek on June 16, 2010, 02:06:45 PM
The problem is in gettext's files: it uses two sets of files, po files and mo files. So translations will consume about twice as much disk space as now.
This is not true. Programs use only .mo files, so .po files do not have to be included in the ST binary packages.

Only ST source will be affected by this.

sanna

Quote from: vilvoh on June 16, 2010, 10:50:22 AM
I see your point, I love the idea and I support it, but the question is if after moving on to this method, which is the most reasonable, we could still use Simutranslator as the online translating interface, which is simple and easy to use, for those files. However, this suggestion involves several different aspects: programming, translators and translation managers.

My doubt is that Simutrans is translated into many languages and the amount of translatable text is really huge, so we really need some kind of online tool to manage translations, something better than SVN or a repository where people have to upload every new version of the translations, which in my opinion is quite annoying.
SimuTranslator in its current form could probably not be used, but as I stated, there are already online tools today that you can translate with. There you have the possibility of fuzzy marking, searching through a translation database for similar strings, and anonymous translation (if so desired). I do agree that offline-only po-file editing might not be sufficient for Simutrans' needs. I mentioned launchpad, there is also Rosetta, and iirc there are also tools to run your own gettext translation site. Of course, somebody has to move the translated strings to the distributions, but that is already the case with SimuTranslator.

Quote from: VaclavMacurek on June 16, 2010, 02:06:45 PM
The problem is in gettext's files: it uses two sets of files, po files and mo files. So translations will consume about twice as much disk space as now.

po files - original files with translations - editable (I found these editors: poEdit, gtranslator and KBabel)
mo files - binary file - not editable; this is file really used by gettext

Translators work with po-files, mo-files are compiled versions of these po-files that are distributed with the game. The po-files need to be stored in svn, but the mo-files are compiled as part of the distribution process; the po-files are not included with the distributions. So not really a doubling of space.

Quote from: VaclavMacurek on June 16, 2010, 02:06:45 PM
And by the way, I have only a little experience with gettext, but it is a Linux affair. Including gettext in Simutrans will cause many problems under Windows and MacOS (BeOS and so on), I think.
gettext is today successfully used by many cross-platform applications, without hassle for either the users of the application or the translators.

Václav

Quote from: jonasbb on June 16, 2010, 02:40:23 PM
gettext is also available under Windows and UNIX like OS (also Mac, Linux, ...)
I know it. I only have bad experience with installation of it.

Quote
Programs use only .mo files. So .po files do not have to be included in the ST binary packages.
Please don't be angry with me for the following words:
and would you tell me how you would add translations of add-ons?
Because I don't think I am the only one who uses translations of downloaded add-ons.

So I think gettext could be used for translations of the base texts, but for translations of objects the current way is better.

sanna

Quote from: VaclavMacurek on June 16, 2010, 04:24:57 PM
and would you tell me how you would add translations of add-ons?
Because I don't think I am the only one who uses translations of downloaded add-ons.
Well, add-ons could use their own namespace (or textdomain, to talk in gettext lingo). Compiling mo-files is not hard, and they could be distributed with the add-ons.
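
A small Python sketch of the textdomain idea, under loud assumptions: the domain names, the dictionaries and both function names are hypothetical, and the English base string for the wagon is invented for the example. Real gettext would load each domain from its own mo-file instead.

```python
# Hypothetical Czech catalogs, one per textdomain.
DOMAINS = {
    "simutrans": {},  # base program texts, left empty in this sketch
    "addon-bmee": {"CSD Bmee (2nd class)": "CSD Bmee (2. trida)"},
}


def translate_in_domain(domain, message, domains=DOMAINS):
    """Look a string up in one named domain, else return it unchanged."""
    return domains.get(domain, {}).get(message, message)


def translate_any(message, search_order, domains=DOMAINS):
    """Try several domains in order, e.g. add-on first, then pakset."""
    for domain in search_order:
        catalog = domains.get(domain, {})
        if message in catalog:
            return catalog[message]
    return message


print(translate_in_domain("addon-bmee", "CSD Bmee (2nd class)"))
print(translate_any("CSD Bmee (2nd class)", ["simutrans", "addon-bmee"]))
print(translate_any("Save game", ["simutrans", "addon-bmee"]))
```

Each add-on ships its own catalog, and a string missing from every domain simply stays in the base language.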

A counter-question: How are translations of add-ons managed today? There are not many at SimuTranslator afaict. Do the add-ons come with their own tab-files? Then coming with their own mo-files is not that different....

EDIT: Oh, and there is absolutely no need to fear anger.. we are trying to find the best way for Simutrans *smile*

Václav

Quote from: sanna on June 16, 2010, 05:19:47 PM
A counter-question: How are translations of add-ons managed today? There are not many at SimuTranslator afaict. Do the add-ons come with their own tab-files? Then coming with their own mo-files is not that different....

There are two ways (for my native language, and for the pak that the vehicles and other objects are made for):
1. adding the text to cz.tab in the pak128/text dir
2. creating a file named (for example) cz_vehicle.vagon.bmee.tab and copying it into the pak128/text dir

The tab file for way #2 contains the needed translations:

CSD_Amee
CSD Amee (1. trida)
CSD_Bmee
CSD Bmee (2. trida)
CD_Bmee
CD Bmee (2. trida)
CD_Bmee_balkan
CD Bmee balkan (2. trida)
ZSR_Bmee
ZSR Bmee (2. trida)
ZSSK_Bmee
ZSSK Bmee (2. trida)
ZSSK_Bmeer_Blonski
ZSSK Bmeer Blonski (2. trida)


PS: You can download these passenger wagons, and more, from the Czech language board and test them.

prissi

Simutranslator is very convenient for pak files: you can just upload the dat files and it will extract strings and images without any further work. tab files are easily changeable by anyone without command line tools.

gettext is not available for BeOS, only Haiku. (But that might be ok.) But as far as I looked, all mo files must be in a single directory, while translations currently come from up to four locations. (But there are also tools to go from mo to po and back.)

You can improve many things. YOU can. I can barely find time to correct errors and do some needed code cleanup in the UI. Since this is a private effort, converting things in order to throw out working code that is very nicely tailored to our use is not my top priority. What is more, gettext is another extra library to install and maintain if you are not on Linux. Actually, you also need to supply libiconv, which is another 1MB dll not needed currently.

[well, and on the statement about using gettext, I was also speculating that nowadays one would have used JAVA ... ]

Frank

Quote from: sanna on June 16, 2010, 10:26:52 AM
...
This system which is used by a great many open source projects works on a different principle. Instead of using specific keys; one language is chosen (this is usually English) as the base language.
....

Exporting untranslated objects in English is not a problem. This is already used for the TileCutter translation.

Changing the search is likely to present no major problem.

The use of a base language presupposes that all objects in all sets have a translation in the base language.

SimuTranslator had a file import feature, but there were always problems with the automatic detection of the correct character encoding, so this feature is disabled.

And as for the translation of addons, these first have to be entered into SimuTranslator.

An export selection is the next step. A link to addons.simutrans.com is also conceivable. For example, SimuTranslator could pick the text files for the selected addon and pack them into a zip file together with the pak from addons.simutrans.com.

And that is the key advantage we have with SimuTranslator. We can extend its functionality for our needs. With third-party programs, our own adaptations are likely to mean increased maintenance, since the adjustments must be redone with each update.

prissi

Entering different plural forms, as well as advanced printf functionality, is present neither in simutranslator nor in the standard clib ...

Václav

Please excuse the following: I think that Simutrans needs (mostly) improved base texts for easier* translation, rather than moving the translations to a new system.

* some texts contain many links to other texts (included from special sources inside the game) and this makes translation quite difficult --> so I would like to ask for the texts to be made simpler

Dwachs

Why not? Improvement is always welcome :)

Do you have some specific examples in mind?
Parsley, sage, rosemary, and maggikraut.

Václav


Dwachs

... and what exactly should be improved there?

Václav

Hard to say which (or what), when the name in the program is for example 8extern and the translation is %s, %s Pole%s.

The problem may be on my side, because I did not find here the meanings of the formatting metasigns (%s, %i, %d and so on...) or what happens when one occurs more than once.

A minor problem is the sign for m3. I tried copying that sign from elsewhere and typing it with its ASCII code (179 or 0179), and the result is the same: │ or ł. So I had to use a translation based on the conversion between basic and common physical units (000 litres / litrů instead of m3).

This may be an idea for the GUI overhaul: the game needs other fonts, for the reason written above and also because some letters should have a different size in lowercase and uppercase form (ž and Ž and similar).

Dwachs

Quote from: VaclavMacurek on July 14, 2010, 11:12:50 AM
Hard to say which (or what), when the name in the program is for example 8extern and the translation is %s, %s Pole%s.
Maybe documentation is missing here. The meaning of the %x formatting symbols depends on the context. Please refer to the English/German translation and guess where the string is used (or ask on the forum). The %x symbols are placeholders for other content: usually some other text (for %s) or a number (for %d, %i) is inserted. Text could be city, industry or good names etc.

As to the station names: The strings to be translated look like

[0..9A..Z](extern|suburb|center)

i.e. the strings to translate are 0extern .. 9extern, Aextern .. Zextern etc.

These are used for generating town halt names:

  • *center .. for stations within city limits (+-2)
  • *suburb .. for stations near cities
  • *extern .. for stations outside of the city

The translation can have two or three occurrences of %s:

0suburb
%s school %s %s
1suburb
%s garden rose %s

Upon creating a new halt name the %s are replaced as:

  • the first %s becomes the name of the city
  • if there are three %s's, then the second one becomes the direction from the city center
  • the last %s becomes the halt type, i.e. the translation of "H", "BF", "Dock", "Airport"

The above mentioned strings would then become, for example

Praha school north Airport
Berlin garden rose station


Please feel free to put this piece of documentation on the wiki or elsewhere.

Václav

Thanks. And now I have one minor idea that could make translations (mainly of stations) better.

Please provide (for example) ost1 and ost2 instead of only ost: ost1 can mean east and ost2 can mean eastern. I know that some languages may have only one word for this, but it could help very much.

Is it possible?

Dwachs

Quote from: VaclavMacurek on July 18, 2010, 12:57:47 PM
Thanks. And now I have one minor idea that could make translations (mainly of stations) better.

Please provide (for example) ost1 and ost2 instead of only ost: ost1 can mean east and ost2 can mean eastern. I know that some languages may have only one word for this, but it could help very much.
Did I understand you right: your proposal is to add different possibilities to translate 'ost', just like 1extern, 2extern etc.?

Václav

