German in the code

Started by felo, March 10, 2010, 05:48:22 PM

felo

Excuse my English

[EN]
It would be advisable for the comments, function names and variable names to be in English. German keeps many people out of the code; the dings, ribi, etc. are already driving me half mad.
A search and replace would help a lot, at least for the names of functions and variables.

IgorEliezer

Quote from: felo on March 10, 2010, 05:48:22 PM
Excuse my English

No need for excuses, your post is understandable. I did some edits so people won't have doubts about what you meant. ;)

prissi

ribi is not German either. It is just an artificial word (from Richtungs-Bits; the English equivalent would be "dibi", from direction bits, so no improvement in English). But you know, renaming everything would make nearly all patches useless ...

Whenever some code gets renovated, we rename some variables. But since the team is so small, any additional work is not warmly embraced. Also, many of the active developers were German; the only notable contributions from non-German regions came from kierongreen, Timothy, Knightly and z9999. It also did not stop jamespetts from branching off into the Experimental branch. Incidentally, this is not too different from TTDPatch, so apparently German programmers have a liking for transportation games ...

You are welcome to make a patch that translates everything, or to add documentation. I do that whenever we touch something old, but many parts are stable and so I leave them alone.

IMHO the biggest problem with a big project is simply that it is big. Even though Simutrans is quite strongly modularized thanks to heavy use of OOP, most things that can be achieved by a simple change have already been done. For advanced work you need to dig into the code for a month or so to get an idea, IMHO. Perhaps ask Knightly or Dwachs about it, since they joined the team more recently.

jamespetts

I found Google Translate very helpful in deciphering German variable names. I understand the problems with translating the names, but perhaps there should at least be a concerted effort to put English translations into the code comments? That's what I do whenever I translate something for Experimental.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

Amelek

The amount of work required to translate the code from German to English seems unimaginable. For instance, I attach a patch that changes "welt" and "Welt" to "world" and "World".

http://mallorn.ii.uj.edu.pl/~amelek/WeltToWorld.zip

877 KB unpacked patch file...

And obviously, some comments got messed up.

wlindley

Maybe a "Rosetta Stone" or "abbreviations/translation table" file, instead of trying to rewrite everything.
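One way to picture the "Rosetta Stone" idea is a single lookup table mapping the German words and abbreviations mentioned in this thread to English glosses, without renaming anything in the code. The sketch below is only an illustration: the `glossary_entry` struct and the `gloss()` function are hypothetical, and the entries are limited to terms explained in this thread plus plain dictionary translations.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <string_view>

// Hypothetical glossary entry: a German (or abbreviated) term and an English gloss.
struct glossary_entry {
    std::string_view german;
    std::string_view english;
};

// Illustrative entries only, taken from terms that come up in this discussion.
constexpr glossary_entry glossary[] = {
    {"welt", "world"},
    {"ding", "thing (object)"},
    {"ribi", "direction bits (from Richtungs-Bits)"},
    {"wkz", "tool (from Werkzeug)"},
    {"koord", "coordinate"},
    {"grund", "ground"},
    {"boden", "ground, floor"},
    {"weg", "way"},
    {"strasse", "street, road"},
    {"einstellungen", "settings"},
    {"umgebung", "environment"},
    {"fabrik", "factory"},
    {"bild", "image, picture"},
    {"gib", "get (old accessor prefix gib_)"},
};

// Case-insensitive lookup; returns an empty string_view for unknown terms.
std::string_view gloss(std::string_view term) {
    std::string lower(term);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    for (const auto& entry : glossary) {
        if (entry.german == std::string_view(lower)) {
            return entry.english;
        }
    }
    return {};
}
```

For example, `gloss("Welt")` yields `"world"`. A table like this leaves the identifiers untouched, so it does not break any pending patches; the trade-off is that readers still have to look names up instead of reading them directly.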

felo

[EN]
It doesn't seem that complex to me.
In Visual Studio you can use the 'Replace in Files' tool, select the appropriate options, and replace all occurrences of a word in all files of the solution in one go.

As for the prevalence of German programmers, I think it is partly due to this very problem. Because of the size of the project, the code is very difficult to understand and many people give up.

I lose much of my time translating and finding where things are. No doubt you manage to get something done in the end, but by then I have learned quite a few words of German.

In addition, there are almost no comments. A comment at the beginning of each file explaining what it is would help a lot.

prissi

Simutrans is not that badly commented. Try OpenTTD for a change ... (maybe they have improved in the meantime, but the last time I checked it was pages and pages without any comments at all.)

A comment in each file saying what it does would be nice, but I have yet to see a real-world project of similar size where this has been done and those headers are still correct. Simutrans stems from a hobby project of Hajo (German), who was teaching himself C++ by writing a program.

And you are right, there are tools that could do that. But they usually fail, because simple searching and replacing does not always do the trick, or a file gets omitted. And it breaks all patches, which makes a lot of additional work.

Moreover, the function of an object does not depend on its name. If you have an idea of what dingliste_t does, I doubt you would have a different idea of what thinglist_t or mononolisto_t would do. I even had a mathematics teacher who deliberately chose weird names for variables, so that we would not attach wrong ideas to them.

If you are considering working on the code, just ask. This is usually 99% more helpful than searching through the code, because we can give answers like: "Drawing is done in simview.cc, which calls display_xxx from grund.cc and simplan.cc." Even with extremely good comments, figuring this out would have taken you longer than just asking.

felo

#8
[EN]
Well, I'll ask.

About the patches: I don't know how they work.

kierongreen

I've certainly found language to be a fairly minor obstacle when it comes to contributing to the code. For someone new to the code, even knowing whereabouts a particular change might have to be made could take several hours to work out, due to the sheer amount of code there. In comparison, working out what a comment means (I only have rudimentary German, so I might rely on Google Translate for some things) really doesn't take too long. The common variable and function names you pick up very, very quickly!

Spike

I had a slow migration to English names and comments in mind when the project started to be more than my personal toy. But as others pointed out, that was and still is quite difficult in some areas.

But I think we should still try to get there ... all new code should be in English, and all new comments, too. Likewise, changed code should be converted to English where possible, and changed comments, too. This way the code should become more international step by step.

I'm sorry for all the German in the code. It's been my only project where I used German for identifiers and comments, and just this one became popular enough that others wanted to participate ... bad luck :( In the beginning this was really just a personal toy project of mine.

Having said that, the one who wants to try a full translation of code and comments has my full support!


neroden

Quote from: Tubehead on March 11, 2010, 09:08:30 AM
I had a slow migration to English names and comments in mind when the project started to be more than my personal toy. But as others pointed out, that was and still is quite difficult in some areas.

But I think we should still try to get there ... all new code should be in English, and all new comments, too. Likewise, changed code should be converted to English where possible, and changed comments, too. This way the code should become more international step by step.

As a comment on priorities, I think full-length German words (Welt, Ding, Fabrik, Stahl, Weg, keine, etc.) aren't that much of a problem as they can be looked up in a dictionary or using Google Translate.

German *abbreviations* are confusing ("wkz", "ribi", "koord"); mixed language is confusing ("has_diagonal_bild" should surely be has_diagonal_picture or hat_bild_diagonale?); and abbreviations in general are very confusing ("get_wtyp"). If anyone plans a piecemeal change, I suggest focusing on these.

prissi

Well, get_ was gib_ before. So this is the result of automatically changing the code, as requested earlier.

Spike

Someone, I think Neroden, undertook a change to split gui_koord from koord. Maybe this new type can be named gui_coord, to be closer to English in the spelling?

Also, koord could be renamed to coord (or maybe point or point_t) after the split.

Nothing serious, just suggestions to move to a more English codebase if things are in the works anyway.

wkz_ could become tool_.

TurfIt

Splitting the German discussion back out of the one-way roads implementation thread...
Quote from: isidoro on January 07, 2012, 12:31:36 PM
Quote from: TurfIt on January 07, 2012, 02:59:50 AM
If you mean the German comments, sure. If you mean renaming variables, to what end? IMO they're fine as is.
I still haven't sorted out the half-assed renaming of einstellungen to settings... Now, instead of just einstellungen and umgebung to keep straight, we get a third one (settings) to remember.
But I surely have a better chance of understanding a Chinese person walking down the street than einstellungen or umgebung. Let's be serious... If a certain translation was not fully completed, that doesn't mean translation is not needed, or that it hasn't been asked for several times by different people.

If we want a truly open development project, German is a serious issue. It's another matter if we want to keep it to a smaller set of developers, but that is not a good strategy, IMHO. You shouldn't be afraid of opening it up. What goes into trunk and what doesn't is still the head developers' decision. And if somebody comes along and doesn't agree, all they have to do is fork. Time will tell who was right.
Reply #9 above by kierongreen, I think, hits the nail on the head. The German in the code is insignificant compared to learning the code itself. If one's language skills are so bad that it's an obstacle, then I wonder whether that carries over to their programming language skills too, in which case it's probably best if they didn't try contributing...

Also, I'm actually finding the German identifiers helpful! I can do a translation (I highly recommend dict.cc for this purpose) and pick whichever of the many synonyms *I* think fits the usage best. Yes, sometimes I pick wrong and have to go back and change my pick, but it's still easy. If instead everything were in English, I'd be stuck with the word the original programmer chose. Perhaps that word is not the best fit for the usage; with so many non-native English speakers contributing, the odds of the 'wrong' English word being chosen are much higher.

Spike

I'm used to Java development environments, and they make renaming variables quite easy. Place cursor on variable name, hit rename hotkey, enter new name. The IDE will replace all occurrences and usages of the variable with the new name. And only the variable, no other text containing the string.

I would expect that C++ IDEs can do that too?

prissi

It depends, as not all files are in a project. And makeobj does some dirty tricks that may make a C++ IDE think certain files are unused, as they are only bound together at link time.

jamespetts

Quote from: prissi on January 08, 2012, 09:20:07 PM
It depends, as not all files are in a project. And makeobj does some dirty tricks that may make a C++ IDE think certain files are unused, as they are only bound together at link time.

Can't one just (1) include those files in the project where possible (as "external resources" if they should not be compiled); or (2), if that is not possible, make a list of all the variable names in these external files and replace them with a straight search-and-replace on those files (or manually, if necessary), using the standard method for everything not in those files?

TurfIt

Quote from: Hajo on January 08, 2012, 09:03:20 PM
variable name, hit rename hotkey, enter new name. The IDE will replace all occurrences and usages of the variable with the new name. And only the variable, no other text containing the string.
Which is rather the problem.

    const weg_t * weg = gr->get_weg(get_waytype());

needs to end up

    const way_t * way = gr->get_way(get_waytype());

else everything will quickly become incomprehensible.

Also, all the abbreviations (and possibly inconsistent uses thereof) need to be changed too.
What is 'gr' above? Usually a grund_t. But I've seen bd used for a grund_t too, rather than for a boden_t.
And str is usually for strasse_t, but is used in some places for a weg_t. And these are the easy cases, where the first parts of the English and German words overlap so the abbreviation still fits. What about the rest??

There's already a complaint above about the results of search/replace... Best not to compound things by confusing everybody further just because some are too lazy to consult a dictionary. IMHO, of course.
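To make the search/replace problem concrete: a plain substring replace of `weg` also rewrites any identifier that merely contains those letters, while renaming only the variable (as an IDE refactoring does) leaves `weg_t` and `get_weg` untouched. A minimal sketch of a middle way, assuming a hand-maintained rename map (the `rename_identifiers` function is hypothetical, not part of any Simutrans tooling): split the source into C++ identifier tokens and replace only whole identifiers listed in the map. It does not solve the harder issues raised above, such as inconsistent abbreviations like `gr`/`bd`, and it does not skip string literals or comments.

```cpp
#include <cctype>
#include <string>
#include <unordered_map>

// Rename whole C++ identifiers according to `renames`; everything else
// (whitespace, operators, punctuation) is copied through unchanged.
// Limitation of this sketch: string literals and comments are NOT skipped.
std::string rename_identifiers(const std::string& src,
                               const std::unordered_map<std::string, std::string>& renames) {
    auto is_ident_start = [](unsigned char c) { return std::isalpha(c) || c == '_'; };
    auto is_ident_char  = [](unsigned char c) { return std::isalnum(c) || c == '_'; };

    std::string out;
    out.reserve(src.size());
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (is_ident_start(c)) {
            // Consume one complete identifier token, then look it up.
            std::size_t j = i;
            while (j < src.size() && is_ident_char(static_cast<unsigned char>(src[j]))) ++j;
            std::string token = src.substr(i, j - i);
            auto it = renames.find(token);
            out += (it != renames.end()) ? it->second : token;
            i = j;
        } else if (std::isdigit(c)) {
            // Copy numeric literals (e.g. 0x1f, 1e5) whole, so their letters
            // are never mistaken for identifiers.
            std::size_t j = i;
            while (j < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[j])) || src[j] == '.')) ++j;
            out.append(src, i, j - i);
            i = j;
        } else {
            out += src[i];
            ++i;
        }
    }
    return out;
}
```

With the map `{"weg_t" → "way_t", "weg" → "way", "get_weg" → "get_way"}`, TurfIt's line `const weg_t * weg = gr->get_weg(get_waytype());` becomes `const way_t * way = gr->get_way(get_waytype());`, while `get_waytype` is left alone because it is a different token. The map itself still has to be written by someone who knows what each name means.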

isidoro

The difference between the get_weg line and the get_way line is that I understand the latter and not the former, and if I know programming I can try to guess what is being done. Otherwise, I have to look it up in a German dictionary, pray that it is not a typo or an obscure abbreviation, or a declension, or a verbal particle, or..., and try to guess.

And then the next line. And when you translate some more, you forget the first ones. It is a real pain. And I speak from experience, as a speaker of a language outside the northern (Germanic) family. And the author of the first post is a Spanish speaker, as you can see.

Not to mention comments. You have to be very, very lucky to make sense of what Google Translate spits out in those cases.

English, like it or not, is the de facto lingua franca of computer science, as Latin first and then French used to be in other fields, and who knows what next.

I can understand this not being done because of the amount of work, the complications, etc., but not because it is not wanted...

missingpiece

The vocabulary of things should be relatively small in any given subject area; verbs in particular, you do not need many. Does learning the name actually add that much on top of learning what an object and its methods do? Also, wouldn't you think that German proper names (which you learn just like any other name, such as Isidoro) nicely set the program's things apart from C language keywords?

I am eager to know. Please share your experience! Perhaps non-native English speakers have just gotten so used to learning English that learning the name of a thing in a different language only feels different, but maybe is not more work? I am not coding much these days, but I am supervising non-English-speaking coders. I would really like to know what it is like.

An example from my experience: I started learning Arabic some time ago, and I found that learning the letters did not add much complication on top of learning all the new words and grammar, nor did it particularly slow that down.

Spike

If no one ever starts translating the identifiers, the problem will stay forever. I think a worse intermediate state is acceptable if the end result is an improvement.

I'd also suggest unifying identifiers, i.e. giving variables that hold the same kind of element the same name. It was written above that both "gr" and "bd" are frequently used for ground_t pointers, so one should be chosen and used consistently (gr IMO, since that is closer to the English word ground).


Quote from: missingpiece on January 09, 2012, 02:52:08 PM
I am eager to know. Please share your experience! Perhaps non-native English speakers have just gotten so used to learning English that learning the name of a thing in a different language only feels different, but maybe is not more work? I am not coding much these days, but I am supervising non-English-speaking coders. I would really like to know what it is like.

I once tried to understand a program with Spanish comments and identifiers, and it was _very_ difficult to figure out what it does, and how. Thus I suggest a move to English identifiers in Simutrans. If it means anything, I have never used German identifiers again in any of my other published projects, because I saw what trouble they caused my non-German helpers in Simutrans. I consider it an interesting experiment, but since one never knows which project will be successful and which won't, it's better to use English in _all_ projects.

isidoro

@missingpiece: from my personal experience, the answer is yes, no doubt. You may not see that, since you know German and Portuguese, and supervising has nothing to do with understanding code and being able to follow and read it efficiently.

I speak English and Spanish, more or less. I understand 90% of French and even speak some, and 99% of written Italian and written Portuguese (though I can't speak them). I'm learning Chinese; I love those characters and can read some. But German is Sanskrit to me, or Japanese, just the same.

And what happens if I'm in ten projects: one in German, two in Japanese, four in Greek and three in Russian? I'd rather change jobs and look for one at the United Nations, don't you think?

In computer science and many other scientific fields, English is the key, like it or not. And I don't mean I particularly like it... ;)


@Hajo: I like your new message "In search of slowness...".  Very nice.


missingpiece

Thanks for the explanation. And yes, I agree that English is ... the obvious and easy choice for international projects. German used to be the language of science, but those times are gone.

And on a side note: mixing grund and boden definitely confuses native German coders, too.

prissi

For me, the most important thing is consistency. If get_plume() does not return a plume(_t) or a pointer to one, it is very confusing. In that regard einstellungen_t is a bad example: welt uses get_settings() to return an einstellungen_t. That is, IMHO, harder to figure out (without German) than get_einstellungen() returning an einstellungen_t, which you can understand without any German.

So if you want to change weg_t to way_t, you also have to change get_weg() to get_way() and strasse_t to street_t, or else you will add more confusion. And doing this is a good way to break almost every patch, so this has to be done quite near a release, with almost no big changes pending (like the routing system at the moment).

missingpiece

Quote from: prissi on January 10, 2012, 10:06:41 AM
this has to be done quite near a release

I hardly dare ask: is there a release schedule? I mean an "ambitious target date" that people are working towards? Maybe it is not a date but a set of must-have features the community has decided upon, plus whatever optional extras are ready in time.

If so... you could already announce that the release after that will be the translation release, and get every dev to focus on code translation then, and only that. The head developer could make such an announcement without being hated for it.

The reward can potentially be threefold:

  • the current discussion would stop and people could focus
  • the painful development time for the translation release could hopefully be kept relatively short, maybe four weeks
  • the newly achieved comprehensibility of the code may attract more contributions (if the theories in this thread are generally agreed on)

morbidintel

I would like to join the dev team, though the foreign language does pose a challenge.
But I believe attitude and enthusiasm are more important, because I know I'll be willing to figure out the code.

The change doesn't have to be immediate; it could be gradual and done bit by bit (not literally!).
Though I'm new and still studying programming (for games), I'm still willing to help out however I can! (:

Dwachs

I would be more than happy to commit any patches that translate something.
Parsley, sage, rosemary, and maggikraut.

prissi

If you change things, do it right. Just avoid things like

    strasse_t *str = get_way();

If it is not done right, it won't be of any use.

And it needs to be done AFTER a release, as such patches have always broken stuff in the past. If one goes against engineers' rule No. 1 (never touch a working system), then do so when it causes the least potential damage.

TurfIt

Quote from: prissi on January 10, 2012, 09:38:04 PM
If one goes against engineers' rule No. 1 (never touch a working system), then do so when it causes the least potential damage.
Indeed. Especially when the cost/benefit ratio is so poor IMO.

Doing so around a release is of no help. I have tons of work-in-progress patches that would totally break after a mass translation, and which will be ready when they're ready, not tied to any release. I can't imagine working up enough desire to go back and fix them all once they've been broken by such a change.

But if we're so hell-bent on eliminating the German, do it once: one massive translation patch that kills it all. No point spreading this unnecessary pain out...


missingpiece

The free-floating patches in development that TurfIt brings up are one thing. What about James's Experimental branch?! How far off is it? Will they be able to svn-merge the changes from trunk? Even then, that will not catch lines in Experimental's own extended code where a (by then renamed) variable is used inside a function.

Could renaming be an automated task? Maybe something the preprocessor could be used for (ouch, I'm talking about things I know nothing more about than their name... :-[ )?

Spike

Quote from: TurfIt on January 10, 2012, 11:02:25 PM
Indeed. Especially when the cost/benefit ratio is so poor IMO.

IMO having all code in English and with consistent identifiers will help further development quite a bit. From a software engineer's point of view it is a rather big benefit.

jamespetts

In Experimental, I can run a search/replace to catch remaining items, although it would help me if (1) there were not too many things changed all in the same commit; and (2) the commit comments stated clearly exactly what had been changed to exactly what.

In the long term, if it makes it easier for people to contribute, Anglicising the code will be of benefit to Experimental, as it will encourage contributors. There are only so many hours in the day, and many of those I must spend working and sleeping!

missingpiece

Quote from: Hajo on January 11, 2012, 09:49:55 PM
IMO having all code in English and with consistent identifiers will help further development quite a bit.
Would it be an idea for prissi, or whoever feels like being the "marketing" guy, to post a sticky announcement here in the forum, and possibly on the Simutrans project page on SourceForge, that after the next release (112, I presume) development work will focus on translating the remaining German identifiers for release 113? That could gauge the community's appreciation.

prissi

Sorry, I am not the right person to ask. I am very reluctant to change something I have worked 8 years to keep working. Even when Simutrans was closed source there were contributors. So *my* experience is that it is a lot of work that confuses all longer-term contributors and causes new errors.

I once developed a nicely working patch for OpenTTD. During its development, they submitted cargo packets (OK, I could deal with that) and then changed the code to C++, including moving stuff into classes and so on. It broke my 3 months of work in almost every line. Of course I did not work on it any longer.

Such action, however well meant, could just as well drive people away. It drove me away from contributing to OpenTTD. But this is just MY personal opinion. If somebody does the work, OK. But I am not looking forward to integrating such a patch. For me it is work where you can only make new errors, with no new features.

Even automatically reformatting the code in a consistent way is something I would like to do; in other projects I have even spotted the odd error that way. But I will not do it, as it would break every patch out there (and might also remove Simutrans as the base for Experimental).

That said, when I work on something and add new comments or identifiers, I do all kinds of reformatting, change things to English and the like. But that part would have been broken anyway.