German in the code

Started by felo, March 10, 2010, 05:48:22 PM

jamespetts

Of course, if the Standard developers wanted to translate the code, and be able to use parts from Iron Byte, they could just adopt Hajo's translations ;-)

Spike

Quote from: Memzeron on February 10, 2012, 10:45:40 AM
At the moment my code compiles quicker, that saves each developer some minutes for each full build already.

Before Dwachs calls me a liar again (and last time I could prove that I told the truth), I want to be more precise: compile times came down with make/gcc, which is what I'm using. It is quite possible that MSVC behaves differently, and I can't check that. So I can only say "compiles faster with gcc".

Spike

#72
Quote from: jamespetts on February 10, 2012, 10:53:24 AM
Of course, if the Standard developers wanted to translate the code, and be able to use parts from Iron Byte, they could just adopt Hajo's translations ;)

I've tried to intermingle translations and other code changes, so the usual SVN diff tools will not allow them to get clean patches from my commits. Having been a developer for many years, I know pretty well what makes the life of a developer difficult. They would have to take everything, and I know that Prissi doesn't want my UI layout changes, and they won't get the translations without the layout changes. I'm pissed and I told them that I'll not play nice anymore after they have offended me. But I'll now wait a while and see what they do, and then reconsider my plans.

Edit:

Bleh. What sort of discussion is this? Dwachs isn't online anymore, that Omikron guy logged off after throwing that message at me, and Prissi is nowhere to be seen. Now I can again wait for hours to get a reply from the important people. I hate that sort of waiting in uncertainty. It kills me.

Dwachs

Why push others to answer your posts? There is no need to add anything here. Thank you for your clear words that revealed your intentions.

Quote
I shall make patches and wait days for them to be included?
I still have to see your second patch. The first one was included 20 minutes after you published it.
Parsley, sage, rosemary, and maggikraut.

Ashley

#74
I am locking this topic, it serves no productive purpose.


Please, everyone, sort out these private issues in private. There is no need to rip the community apart over this.


Edit: It seems that this discussion may be useful after all. Please continue.
Use Firefox? Interested in IPv6? Try SixOrNot the IPv6 status indicator for Firefox.
Why not try playing Simutrans online? See the Game Servers board for details.

whoami

#75
I have written a Perl script to run an automatic translation of the C/C++ source code identifiers, using a simple list file for old->new identifiers. This works quite well for the actual source, checking for any conflicts beforehand, which is possible because the whole relevant code is accessible. With some changes to the translation table, it would be usable by the forked projects, too. I have not looked at Hajo's translations, but they could be integrated as well.

The problem is not so much with the code, but with the comments:

They are not treated specially yet, but I intend to change this. The current state leads to the ugly effect that single words in German comments are translated (rather than feeding whole paragraphs to some Babelfish successor  8) ). However, I cannot simply omit stuff in comments, because they often refer to identifiers, some even contain commented-out code lines, and they are used by diff. This will not be solvable in an optimal way. (If there are nested /**/ comments, they make handling even more difficult - I would rather remove nesting by separate patches.)

I already try to support .diff files, but their context part does not reliably tell which line is a comment, so they depend on translated comments. The easiest way would therefore be to translate comments in all cases, but disregard them for conflict checks. Otherwise, I would have to emulate patch as well and compare to the original file, which I probably will not try at all.

Of course, my translation list is far from complete. Currently, 31,571 string replacements will be suggested (including comments). The current impediment is the set of false identifier conflicts caused by comments and, to a smaller extent, code already using English identifiers that are the same as natural translations of the German ones.

Not to forget: I have not seen a missing text translation (caused by my changes) in the game yet, but if that became a problem, separate handling of strings would be mandated.

(edited to improve transfer of information to the reader)

Combuijs

Bob Marley: No woman, no cry

Programmer: No user, no bugs



sdog

Quote from: whoami on May 30, 2012, 06:01:45 PM
The problem is not so much with the code, but with the comments:

They are not treated specially yet, but I intend to change this. The current state leads to the ugly effect that single words in German comments are translated (rather than feeding whole paragraphs to some Babelfish successor  8) ). However, I cannot simply omit stuff in comments, because they often refer to identifiers, some even contain commented-out code lines, and they are used by diff. This will not be solvable in an optimal way. (If there are nested /**/ comments, they make handling even more difficult - I would rather remove nesting by separate patches.)
You could add a line below comments where you put the translated and untranslated strings in a fixed format. The comments likely have to be translated by hand and patched on their own. This note will give translators a way to know that variables have changed in the meantime.


"[...] code already using English identifiers that are the same as natural translations of the German ones."
I'm afraid I don't understand this.

whoami

Quote from: sdog on May 30, 2012, 06:17:03 PM
You could add a line below comments where you put the translated and untranslated strings in a fixed format. The comments likely have to be translated by hand and patched on their own. This note will give translators a way to know that variables have changed in the meantime.
It will be possible to recognize whole comment blocks and handle them as an entity, but not a random mixture of commented-out code and short remarks.

Quote
"[...] code already using English identifiers that are the same as natural translations of the German ones."
I'm afraid I don't understand this.
Example: "weg" is used, but also "way", same for bild/picture, get_typ/get_type, groesse/size and many more. The laziest^Weasiest solution would be to add a suffix to the translated ones.

The number of replacements that I mentioned is without touching "ribi" (direction bits) and "sp" (player), by the way.

eipi

I already started translating the code comments into English, since a while ago I ran Doxygen on the source code and realized the comments lacked a bit of consistency :)

The patch is not complete yet, but I am still working on it when I have time.
http://dl.dropbox.com/u/53679800/Simutrans/patches/SimuTranslation_bauer.patch

prissi

This looks like a simple text search and replace. Such a script would need to parse at least comments/strings/defines, or some unhappy surprises will come up.

Just as important as such a tool would be agreement on how to actually rename stuff, because this procedure should not be repeated too many times ...

Adding a suffix will not change anything but rather cause more confusion. get_way or get_way_en for instance ?!?

whoami

@Eipi: since you work on the comments, your changes and mine are the ideal complement to each other, yours to be applied first. Do you keep this in sync with the ongoing development?

Dwachs

I would not care that much about German comments, as they are certainly some years old. If the automatic translation 'only' creates a mixture of German / English words in a comment then this is tolerable, imho.

@eipi: I think it would be enough to first start to translate and comment the functions etc in Doxygen style.

whoami

#83
@Prissi: My script is only built around s/\b$o\b/$trl{$o}/g (replace exact word match, but handle e.g. #include differently), but as I check for conflicts between all sets of strings involved, no conflict should arise. The conflict check is global and therefore also complains where there is no real conflict, because the identifiers would be in non-overlapping scopes. So there are false alerts and unnecessary work, but no conflict is supposed to show in the result.
EDIT: another way: first rename the few conflicting identifiers.
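
To make the word-boundary replacement concrete, here is a minimal sketch in Python rather than Perl. It is not the script from this thread: the table below is a hypothetical excerpt built from name pairs mentioned in the discussion, the function name `translate_line` is made up for illustration, and the special handling of #include is left out. Unlike a loop of one substitution per table entry, it applies all entries in a single pass, so text that has already been replaced is not scanned again.

```python
import re

# Hypothetical excerpt of an old->new identifier table (pairs taken from the thread).
TRANSLATIONS = {
    "groesse": "size",
    "bild": "picture",
    "get_typ": "get_type",
}


def translate_line(line, table):
    """Replace every whole-word occurrence of a table key in one line."""
    # Longest keys first, so a key that contains another key is tried first.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, keys)) + r")\b")
    return pattern.sub(lambda m: table[m.group(0)], line)


print(translate_line("int groesse = bild->get_typ(); // groesse_alt bleibt", TRANSLATIONS))
# prints: int size = picture->get_type(); // groesse_alt bleibt
```

Because `_` counts as a word character, `groesse_alt` is left alone while the stand-alone `groesse` is renamed. The same rule is why words inside comments get translated one by one.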

Quote from: prissi on May 30, 2012, 07:06:58 PM
Adding a suffix will not change anything but rather cause more confusion. get_way or get_way_en for instance ?!?
Something like that, yes, or use a different word (less preferable).

Dwachs

Quote from: whoami on May 30, 2012, 06:01:45 PM
checking for any conflicts beforehand, which is possible because the whole relevant code is accessible.
What do you mean by conflicts here?

And why is use of 'weg' and 'way' difficult to handle? You do not need to translate way after all, it should not be matched by the translation table. Adding suffixes sounds like a bad idea to solve this.

whoami

Well, I do not even pretend that I parse C++, and therefore I have no information what is a constant, class, variable, member function, enum item etc. I only have a bidirectional mapping of names, so I have to change all occurrences of one old name in all files to the new name, which must not exist before this change. Thereby, syntax and semantics of the code stay the same. In other languages, you can do funny things with eval, creating code on the fly, and I guess that it is possible to access and manipulate the symbol tables in C/C++ too, but ST does not try that, right?
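
The requirement that a new name "must not exist before this change" can be sketched as a global check like the one below. This is only an illustration in Python under simple assumptions, not the actual script: `find_conflicts` and the sample input are hypothetical, and every identifier-shaped word counts, including words in comments and strings, which is exactly where false alarms come from.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z_]\w*")


def find_conflicts(sources, table):
    """Return a list of problems that would make a global rename unsafe.

    sources: dict mapping a file name to its text.
    table:   dict mapping an old identifier to its new name.
    """
    # Collect every identifier-shaped word from all files (no scoping, no parsing).
    existing = set()
    for text in sources.values():
        existing.update(IDENTIFIER.findall(text))

    problems = []
    seen_targets = {}
    for old, new in sorted(table.items()):
        # Two old names must not collapse into the same new name.
        if new in seen_targets:
            problems.append(f"{old} and {seen_targets[new]} both map to {new}")
        seen_targets[new] = old
        # The new name must not occur anywhere before the change.
        if new in existing:
            problems.append(f"{new} already exists (target of {old})")
    return problems


sources = {"a.cc": "weg_t *weg = get_way(); int way = 0;"}
print(find_conflicts(sources, {"weg": "way", "groesse": "size"}))
# prints: ['way already exists (target of weg)']
```

Since the check has no notion of scope, it would also flag a `way` that lives in an unrelated class, which matches the "false alerts and unnecessary work" described earlier.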

VS

Umm, the way I understand that is that you should be able to tell if a match is an identifier, or part of a comment or string.

My projects... Tools for messing with Simutrans graphics. Graphic archive - templates and some other stuff for painters. Development logs for most recent information on what is going on. And of course pak128!

whoami

It is easily possible (though not implemented yet) to handle comments and strings in a different way, but there still are some identifiers with the same name as the best (natural) translation for some German identifiers, and those can be real, harmful conflicts. Without the complete conflicts check, I had a changed version that compiled completely, but crashed on starting, due to variables popping up with the same name as the ones that are actually meant.
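
Telling identifiers apart from comments and strings does not need a full C++ parser. A rough sketch in Python, assuming ordinary // and /* */ comments and simple quoted literals only: raw string literals, digit separators and line continuations are not handled, and the names `split_segments` and `translate_code_only` are invented for this example.

```python
import re

COMMENT = r"//[^\n]*|/\*.*?\*/"
STRING = r'"(?:\\.|[^"\\\n])*"' + "|" + r"'(?:\\.|[^'\\\n])*'"
SEGMENT = re.compile(f"(?P<comment>{COMMENT})|(?P<string>{STRING})", re.DOTALL)


def split_segments(source):
    """Yield (kind, text) pairs, kind being 'code', 'comment' or 'string'."""
    pos = 0
    for match in SEGMENT.finditer(source):
        if match.start() > pos:
            yield "code", source[pos:match.start()]
        yield match.lastgroup, match.group(0)
        pos = match.end()
    if pos < len(source):
        yield "code", source[pos:]


def translate_code_only(source, table):
    """Rename whole words in code segments; leave comments and strings as they are."""
    word = re.compile(r"\b(?:" + "|".join(map(re.escape, table)) + r")\b")
    out = []
    for kind, text in split_segments(source):
        if kind == "code":
            text = word.sub(lambda m: table[m.group(0)], text)
        out.append(text)
    return "".join(out)


print(translate_code_only('weg = "weg"; // weg ist hier\n/* weg */ weg++;', {"weg": "way"}))
```

Here only the two occurrences of `weg` in real code become `way`. The same segment labels could instead be used to translate comments while ignoring them in the conflict check.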

Ters

If German is difficult to understand, then German translated to English by a computer program will be beyond any hope of understanding. Especially with the abbreviations and special terms used in the code.

And as written above, some of the comments are outdated. I noticed a day or two ago that the one for way_obj_besch_t wasn't correct.

whoami

My plan is to translate only identifiers, and all of them at the same time in a consistent manner (although it is possible to split it into phases, e.g. one for each area or class). I will have to see how far I get with comment handling.

jk271

I suggest translating "spieler_t" to "company_t", not "player_t". Some time ago I read here in the forum about an intention to enable more players (persons) to control one transport company.
Today players can do it too, but they have to share a password.

Such a translation change would avoid confusion in future: implementing a unique password for each player would probably need the keyword "player" as a class name.

Ashley

Quote from: jk271 on May 30, 2012, 08:36:22 PM
I suggest translating "spieler_t" to "company_t", not "player_t". Some time ago I read here in the forum about an intention to enable more players (persons) to control one transport company.
Today players can do it too, but they have to share a password.

Such a translation change would avoid confusion in future: implementing a unique password for each player would probably need the keyword "player" as a class name.

Agreed, we need to be clear on this distinction in future.

prissi

We also had clients in the game ... but player company seems a better description.

whoami

#93
Small update regarding my script (I did not have much time to work on it):
I am now able to handle comments and strings to some extent, so strings can be excluded from translation, although it would be most useful to also translate all the debug messages and XML tags (but that needs some code change to allow both the old and the new class names to be supported in settings.xml and XML savegames). Also, I would rather convert the GUI output strings and the translation files (and reimport them into Simutranslator) instead of still keeping German in them. (And I am able to fix spelling errors, too. :) )

I introduced a separate preparation phase, where I first change some existing English names (~4000 replacements, usually adding "_" to the name) to make room for the real stuff (~50,000 changes from a mapping table with ~800 entries, still increasing).
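
The two phases could look roughly like this. It is a minimal Python sketch under the assumptions stated in this post (existing English names get a trailing "_" to make room), not the actual Perl code; the helper names `rename_words` and `phase1_table` are invented, and the sketch does not verify that the underscored name is itself unused.

```python
import re


def rename_words(text, table):
    """Whole-word replacement of every key in table, in a single pass."""
    if not table:
        return text
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, table)) + r")\b")
    return pattern.sub(lambda m: table[m.group(0)], text)


def phase1_table(text, translations):
    """Build the 'make room' table: names already in use that collide with
    a translation target get a trailing underscore."""
    existing = set(re.findall(r"[A-Za-z_]\w*", text))
    return {new: new + "_" for new in translations.values() if new in existing}


source = "int way = 1; int weg = 2; return way + weg;"
translations = {"weg": "way"}

room = phase1_table(source, translations)   # {'way': 'way_'}
step1 = rename_words(source, room)          # phase 1: move existing names aside
step2 = rename_words(step1, translations)   # phase 2: apply the real translation
print(step2)
# prints: int way_ = 1; int way = 2; return way_ + way;
```

Running phase 2 alone on this input would merge the two variables into one name, which is the kind of clash the preparation phase is meant to prevent.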

It seems that translating all the comments will be the best way, because not many are changed at all, and the changes concern mostly identifiers. Oh, and the filenames and folder names could be translated automatically, too (of course changing the #include lines).

I attach the changed simworld.h as a small example (even this is not completely done).
EDIT: I just noticed an error in this file, but this alone will not compile anyway.

sdog

Quote
Oh, and the filenames and folder names could be translated automatically, too (of course changing the #include lines).

How would SVN cope with that, and are the German names of files also obstacles to understanding the program, enough to warrant the reduction in continuity?

whoami

My SVN client has a rename function, but it might only be there to ease the process of "delete under old name, create under new name". Also, I often get SVN hiccups here when I use that function.

Ters

Last I checked, SVN's rename functionality is essentially based on deleting the file, and creating a new file as a fork of the deleted file. That's a way of doing it that is not unique to SVN. I haven't tried it with SVN, but merging changes to a moved file from a branch that hasn't moved might not work automatically.

Dwachs

Imho, first the translation part should be sorted out. Moving & Renaming files is a rather trivial part compared to this.

VS

Technical note: The "renamed" files can have a property telling where they came from. I know that Tortoise can use this information when browsing logs, no idea about other situations... Anyway, it would be good not to miss this :)


whoami

Quote from: Dwachs on June 05, 2012, 06:41:27 AM
Moving & Renaming files is a rather trivial part compared to this.
But I think that this needs to be automated, too, if one wants to keep .diff files in a working state. (I do not know how many patches are maintained separately from the trunk.) And let's not forget about ST-Experimental, ST-3D and Iron Bite (I didn't talk to anyone apart from this thread).

Markohs

st-3d won't be much of a problem I guess, just merging the repository with trunk should work, I've done it various times already and it kept track of added/deleted files, and all the changes were fairly easy to merge, with almost no manual work.

isidoro

@whoami: your work is wonderful! The file, to my eyes, is much clearer. I don't know if changing the names of the files is worth the effort, though. You are always one click away from looking inside... The comments would be much better (better still if an expert in the code can update them and give them uniformity or a standard format)


Dwachs

@isidoro: The translated header file looks as messy as the untranslated one :P But simworld.h is really bloated with functionality, which could be split into smaller chunks (management of global lists, all the event-related stuff, the main game loop, all these terraforming and world-generation functions ...)

That said, I am looking forward to using the translation script ... and debugging the outcome :)
Quote
better still if an expert in the code can update them and give them uniformity or a standard format
It took me two evenings to do this for dataobj/umgebung.h, up-to-date comments would be nice but are time-consuming to generate... Better do this on the fly.

whoami

#103
I guess it would be useful to share the current state of the little tool and the tables, no matter how messy and incomplete the whole thing still is (the translation table started as a quick setup to be able to evaluate the whole approach, and therefore needs to be restructured). WARNING: RUN THIS ONLY ON A COPY OF THE FILES! They are expected in .\trunk-copy (I advise using "clean"ed source code; the IDE, if any, must not keep the files open). The script needs Perl >=5.8, for Windows e.g. the free "Community Edition" from ActiveState: http://www.activestate.com/activeperl

The file tr.cmd is a frontend to save typing on Windows, the actual file translate.pl should run on any platform (EDIT: but the file names will be shown with \'s). You might need to adapt the paths in it and then call:
tr phase1 check
tr phase1 sim
tr phase1 translate
tr phase2 check
tr phase2 sim
tr phase2 translate
The sim and check calls are optional and do not change the files. But if "check" shows warnings, sim/translate will fail. EDIT: A lot of output is printed on the screen - you had better minimize the cmd.exe window or switch off debug output in order to achieve acceptable performance. If additional translations have been entered in the text tables, a delta update can be run with the same commands, but a check is not possible after translation, and phase 1 must not run after phase 2 changes have happened.
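
How the three modes relate can be sketched as follows. This is a hypothetical Python illustration working on in-memory text, not a description of translate.pl: the function names `check` and `run` and the warning wording are invented, and only the stated behaviour is modelled (check and sim change nothing, and sim/translate refuse to run while check reports warnings).

```python
import re


def check(files, table):
    """Warn about every target name that already occurs in some file."""
    existing = set()
    for text in files.values():
        existing.update(re.findall(r"[A-Za-z_]\w*", text))
    return [f"{new} already in use" for new in sorted(set(table.values())) if new in existing]


def run(mode, files, table):
    """Run 'check', 'sim' or 'translate'; only 'translate' returns changed text."""
    warnings = check(files, table)
    if mode == "check":
        return warnings, files
    if warnings:
        raise RuntimeError("check reported warnings: " + "; ".join(warnings))
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, table)) + r")\b")
    result = {}
    for name, text in files.items():
        new_text, count = pattern.subn(lambda m: table[m.group(0)], text)
        print(f"{name}: {count} replacement(s)")
        # 'sim' only reports what would happen and keeps the original text.
        result[name] = new_text if mode == "translate" else text
    return warnings, result


files = {"a.h": "int groesse; groesse++;"}
print(run("sim", files, {"groesse": "size"})[1])
print(run("translate", files, {"groesse": "size"})[1])
```

After a translate run the new names are present in the files, so the same check would flag every target name. That is one plausible reading of why a check is not possible after translation.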

I do not have access to a (commercial) refactoring tool, which could do the whole job or at least help a lot. For parsing C++ (in Perl), I did not find much (as opposed to plain C), but if I use GCC's dump function, it would still be much easier (and probably faster at runtime) than writing a C++ parser and dependency tracker on my own.
But this kind of understanding of the code would only save the unnecessary phase1 changes (due to false conflicts) and allow for better texts in some cases.

EDIT: the current version translates single words in comments, but not in strings of any kind.

Dwachs

What is the point in renaming get_count to get_count_ first?

You should also adjust the usage-comment, the command is check not checkall.

Now I do not know how to proceed. Translating all the code at once does not seem sensible. I do not want to end up with names with trailing underscores.