Some advice for a young developer who wants to contribute

Started by WillysMD, August 26, 2014, 09:52:43 PM

WillysMD

Hi everyone,

First, let me introduce myself.
I'm a 17-year-old French student who is very enthusiastic about computer science and programming. I learned my first programming language (C) at 12.
I also enjoy this game a lot and I'd like to contribute to its development, with ideas but also with code.
I have good coding experience in PHP (using Symfony) and also some in Java.
Besides this, I know the basics of C and Perl and advanced features of Python (using Django) and C++.

So here's my point: I was wondering how I could contribute and, especially, how to get a better understanding of the source code and its mechanics (it's actually quite hard to understand a good million lines you didn't write).

Also, I have another question.
It's said that "Simutrans is preferably compiled as 32 bit binary!" so I was wondering whether it's wise to compile a server as a 64-bit binary?
I'm trying to compile Simutrans for client and server, and both will be running on 64-bit machines.

Thanks for your help
(and for your understanding of my poor English syntax)

WiLLyS

kierongreen

It doesn't make any difference to code development whether you compile as 64-bit or 32-bit. The best way to understand the source code is to find some feature that looks like it might be easy to change and then work out how to do it. It's only by looking through the code that you'll start to figure out how each bit fits together.

DrSuperGood

Quote
It's said that "Simutrans is preferably compiled as 32 bit binary!" so I was wondering whether it's wise to compile a server as a 64-bit binary?
I'm trying to compile Simutrans for client and server, and both will be running on 64-bit machines.
Simutrans is optimized for 32 bit. Specifically, 64-bit data types are avoided, with most types being at most 32 bits long. The few exceptions include the fixed-point arithmetic used by factories and the like, which uses 64-bit integers to hold intermediate products to avoid introducing error. For the server there should be no problem at all, apart from a chance of slightly worse performance (probably trivial in most cases).

For the clients it is a big problem. Since Simutrans uses software-rendered graphics (no direct GPU acceleration such as that used by commercial games like Diablo III or on consoles like the Wii U), the graphics routines were deemed performance-critical sections of code. To maximize performance, some clever people hand-wrote x86 (32-bit) assembly for these sections, which is used in preference to the standard C/C++ code. Although there is good compile logic in place, meaning a 64-bit build will still compile, these optimizations cannot be used or are not optimal there, because x86 assembly does not carry over to x86-64 very well (the assembly itself should technically be compatible, but the result is not, specifically because some addresses end up misaligned). As such, 64-bit compiled clients, or clients running on platforms other than x86 (32 bit), will perform considerably worse.

This is best seen with Experimental.
Server -> 64-bit compiled, as that is what James's server uses.
Client -> both available.
Me -> the 64-bit build performs noticeably worse than the 32-bit one on the server map, achieving lower frame rates on average. Both work, but 32 just works better. I use Windows.

Quote
So here's my point: I was wondering how I could contribute and, especially, how to get a better understanding of the source code and its mechanics (it's actually quite hard to understand a good million lines you didn't write).
Writing code is only a small part of being a programmer. Software engineering is a lot more complex, and as a computer scientist you would spend most courses learning about this rather than actual programming. I would go as far as to say that computer science is about 1/4 programming and 3/4 software engineering. Sadly, it is quite obvious that some of the Simutrans developers (or contributors) lacked software engineering skills.

As an example, both inputs and outputs of a factory share the same underlying data class (same methods and variables) yet behave differently. Adding something like a demand buffer to inputs by modifying the class used for factory inputs also adds a demand buffer to outputs, where it is never used. Because of the behavioural differences, inputs and outputs should have been given separate classes, even if those classes mostly extend a base class. Most of the manipulation of inputs and outputs is not even done in their own class but in the factory class, forcing most members to be public. In this situation they might as well have been declared as "plain old data" rather than a class, since they are not taking advantage of any form of object-oriented programming.
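
The split described above could look roughly like the following. This is only a minimal sketch: every class and member name here is hypothetical and none of it is taken from the actual Simutrans sources. It assumes amounts are non-negative 32-bit integers.

```cpp
#include <cstdint>

// All names here are hypothetical; this is not the actual Simutrans code.

// State and behaviour shared by every goods slot of a factory.
class ware_slot_t {
public:
    constexpr explicit ware_slot_t(std::int32_t capacity)
        : stored_(0), capacity_(capacity) {}

    constexpr std::int32_t stored() const { return stored_; }
    constexpr std::int32_t free_space() const { return capacity_ - stored_; }

protected:
    std::int32_t stored_;
    std::int32_t capacity_;
};

// Inputs are the only slots that track demand, so the buffer lives here.
class input_slot_t : public ware_slot_t {
public:
    constexpr input_slot_t(std::int32_t capacity, std::int32_t demand)
        : ware_slot_t(capacity), demand_(demand) {}

    constexpr std::int32_t demand() const { return demand_; }

    // Accept a delivery (amount >= 0 assumed); returns the amount actually taken.
    std::int32_t deliver(std::int32_t amount) {
        const std::int32_t taken = amount < free_space() ? amount : free_space();
        stored_ += taken;
        demand_ -= taken < demand_ ? taken : demand_;  // demand never drops below 0
        return taken;
    }

private:
    std::int32_t demand_;
};

// Outputs only accumulate production; they carry no demand buffer at all.
class output_slot_t : public ware_slot_t {
public:
    constexpr explicit output_slot_t(std::int32_t capacity)
        : ware_slot_t(capacity) {}

    // Store produced goods (amount >= 0 assumed); returns the amount that fitted.
    std::int32_t produce(std::int32_t amount) {
        const std::int32_t added = amount < free_space() ? amount : free_space();
        stored_ += added;
        return added;
    }
};
```

With this layout the members can stay non-public, and a change to the input side (such as the demand buffer) no longer adds unused state to outputs.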

Ters

Quote from: DrSuperGood on August 26, 2014, 10:57:50 PM
As such 64 bit compiled clients or clients running on platforms other than x86 (32 bit) will perform considerably worse.

I don't think a 32-bit client on x86-64 will perform much worse than on a true 32-bit x86. In any case, the latter is probably not possible to buy anymore outside specialist markets. Maybe you didn't intend to suggest otherwise, but I just write this to clear up any potential misunderstandings.

Quote from: DrSuperGood on August 26, 2014, 10:57:50 PM
Writing code is only a small part of being a programmer. Software engineering is a lot more complex, and as a computer scientist you would spend most courses learning about this rather than actual programming. I would go as far as to say that computer science is about 1/4 programming and 3/4 software engineering. Sadly, it is quite obvious that some of the Simutrans developers (or contributors) lacked software engineering skills.

As an example, both inputs and outputs of a factory share the same underlying data class (same methods and variables) yet behave differently. Adding something like a demand buffer to inputs by modifying the class used for factory inputs also adds a demand buffer to outputs, where it is never used. Because of the behavioural differences, inputs and outputs should have been given separate classes, even if those classes mostly extend a base class. Most of the manipulation of inputs and outputs is not even done in their own class but in the factory class, forcing most members to be public. In this situation they might as well have been declared as "plain old data" rather than a class, since they are not taking advantage of any form of object-oriented programming.

Don't be too harsh. The developers wanted, and still want, a game to play, not to spend time creating a software design masterpiece. Bad software engineering happens all over the place, and games in general are not known for being best in class either. Maybe factory inputs and outputs were a lot more similar originally.

As a side note, the terms computer science and software engineering, or at least what they normally translate into, have different meanings around the world. In my education, programming was sorted under software engineering, while computer science included the hardware side as well.

Markohs

Keep in mind that a 64-bit executable is always going to be slower than a 32-bit one for a simple reason: memory pointers are 64 bits and not 32, so, just to start, you'll have to move twice the data from memory to the CPU. Since most of the code uses just 32-bit values, it's not really worth switching to 64 bits; there is no other advantage in doing so.

Plus, as already said in this thread, there are many optimizations in the code and data structures that assume 32 bit, and forcing a 64-bit build will neutralize these optimizations. The only advantage it can have is allowing bigger maps in Simutrans, in case memory usage approaches 2 GB, but at those sizes Simutrans does not perform too well anyway.

It's been some time since I looked at current chip specifications, and it might be that even loading a 32-bit value from memory just fetches 64 bits anyway and keeps the unused part in the CPU cache. But even in this case, an array of pointers (there are many in Simutrans) will be twice the size in 64-bit as in 32-bit, making the cache perform worse.
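
To put rough numbers on the pointer-array argument, here is a small sketch. The function names are made up for illustration, the 64-byte cache line is an assumption (typical for current x86 chips), and the 32-bit index table is just a common alternative for comparison, not something Simutrans is claimed to use.

```cpp
#include <cstddef>
#include <cstdint>

// Bytes occupied by a table of `count` raw pointers on the current target:
// 4 bytes per entry in a 32-bit build, 8 bytes per entry in a 64-bit build.
constexpr std::size_t pointer_table_bytes(std::size_t count) {
    return count * sizeof(void *);
}

// Bytes occupied if the same table stored 32-bit indices into a pool instead.
// This size is the same in 32-bit and 64-bit builds.
constexpr std::size_t index_table_bytes(std::size_t count) {
    return count * sizeof(std::uint32_t);
}

// How many pointers fit in one cache line (64 bytes assumed by default).
constexpr std::size_t pointers_per_cache_line(std::size_t line_bytes = 64) {
    return line_bytes / sizeof(void *);
}
```

Built as 32-bit, `pointers_per_cache_line()` gives 16; built as 64-bit, it gives 8, so walking the same table touches twice as many cache lines.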

As for contributing: I'm not very involved in Simutrans development right now. I'll maybe come back to contribute code when I get excited about something I want to do, but iirc there were many open aspects of the code that needed improvement, and features that could be implemented. prissi/dwachs will maybe point you there as soon as they can. But I have some suggestions that you might find useful:

- Maybe you can explore the scripting system in Simutrans (Squirrel) to make an interactive "first steps" tutorial for new players, like "ok, build a road from here to here, ok, buy a vehicle, ok, set the schedule, ok, go, watch the finance window...". Just an idea. This is not so easy, because you might have to implement (or ask someone to implement) some new UI functions, like highlighting GUI buttons or marking the map in a special way, and so on.
- You can try to fix the bugs posted in the forum yourself (I think there are still some open).
- You can check the list of extension requests that are floating around this forum and try to work on one of your choice.

Dwachs

Do not worry about 32bit vs 64bit in the beginning.

To start developing, my advice would be: from your playing experience, find points in the program where something could be improved. Then try to do that.

That is actually where I started programming: I was annoyed that the depot window forgot the last selected line, so I tried to improve that.
Parsley, sage, rosemary, and maggikraut.

DrSuperGood

Quote
I don't think a 32-bit client on x86-64 will perform much worse than on a true 32-bit x86. In any case, the latter is probably not possible to buy anymore outside specialist markets. Maybe you didn't intend to suggest otherwise, but I just write this to clear up any potential misunderstandings.
Slight misunderstanding. Most x86-64 platforms also support plain old x86 in the form of compatibility mode. As such, performance is pretty much identical, as it is running the x86 build and using the optimized assembly. The problem is the x86-64 compiled client (which can only run on systems that support x86-64), since that build will not use the assembly.

Quote
In my education, programming sorted under software engineering, while computer science included the hardware side as well.
Both are necessary for writing good programs. You cannot write good software unless you know how processors work (you can end up doing silly stuff that they are not efficient at executing) and neither can you write good software if you do not know all the different software structuring techniques (you end up writing a mess of code no one understands and there is a good chance an error is inside it).

Quote
Keep in mind that a 64-bit executable is always going to be slower than a 32-bit one for a simple reason: memory pointers are 64 bits and not 32, so, just to start, you'll have to move twice the data from memory to the CPU. Since most of the code uses just 32-bit values, it's not really worth switching to 64 bits; there is no other advantage in doing so.
No, memory pointers are not 64 bits. As you correctly point out, fetching data would be twice as slow, so to combat this they came to a compromise at around 40-48 bits. This is why 64-bit OSes have a memory limit that is huge but nowhere near as crazily large as 64-bit pointers would support.


Quote
It's been some time since I looked at current chip specifications, and it might be that even loading a 32-bit value from memory just fetches 64 bits anyway and keeps the unused part in the CPU cache. But even in this case, an array of pointers (there are many in Simutrans) will be twice the size in 64-bit as in 32-bit, making the cache perform worse.
They are not double the size for that reason.

People are forgetting the speed benefits x86-64 brings. Specifically, there are more registers available, meaning fewer stack operations are required, and the registers are twice the size, so a 64-bit operation needs only a single instruction instead of several. This means code such as that used for factories will actually perform better in x86-64, since it relies on a lot of 64-bit types to perform correct fixed-point multiplication, and the extra registers are bound to reduce stack accesses.
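
As an illustration of why fixed-point multiplication wants a 64-bit intermediate, here is a minimal sketch. The Q16.16 format and the function names are assumptions made for this example; the thread does not say which fixed-point layout the factory code actually uses.

```cpp
#include <cstdint>

// Hypothetical Q16.16 format: 16 integer bits, 16 fractional bits.
constexpr int FRAC_BITS = 16;

// Convert a whole number to fixed point (must fit in 16 signed bits).
constexpr std::int32_t to_fixed(std::int32_t whole) {
    return whole * (1 << FRAC_BITS);
}

// Multiply two fixed-point values. The raw product of two 32-bit values
// needs up to 64 bits, so it is widened before being scaled back down.
// On a 32-bit target the widened multiply takes several instructions;
// on x86-64 it fits in a single register.
constexpr std::int32_t fixed_mul(std::int32_t a, std::int32_t b) {
    return static_cast<std::int32_t>(
        (static_cast<std::int64_t>(a) * b) / (std::int64_t{1} << FRAC_BITS));
}
```

For example, multiplying 300 by 100 in this format produces a raw product of about 1.3e14 before scaling, far beyond what a 32-bit integer can hold, even though the final result fits comfortably.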

In theory the graphics could also be optimized for 64 bit, potentially performing 2 pixel operations with a single instruction (half the amount of code run). However, this depends on how rendering is implemented and still needs some mega-smart assembly guru to write it properly. It may also need an extended x86-64 instruction set which, as I recall, provides the necessary operations, and that would raise the system requirements.

Currently my biggest question is "How do I contribute?". A lot of the documentation is old and references the SourceForge SVN, which is purely pakset now as far as I can tell. I would really like to push JIT2 towards some kind of incorporation into the game (many people gave positive feedback on it), but I am struggling to see how to start. Even getting it to compile on my Visual C++ install takes several modifications to the build properties, which I am sure is not a good idea.

Ters

Quote from: DrSuperGood on September 01, 2014, 05:36:35 PM
No, memory pointers are not 64 bits.

Try sizeof(void *) and see for yourself. My 64-bit systems always return 8 (bytes, so 64 bits). That the address bus is only 48 bits wide is a different matter.
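
A minimal sketch of both points follows. The helper names are made up for this example, and the canonical-address check assumes the common 48-bit virtual addressing of x86-64, where bits 63 to 47 of a valid address must all be equal. Pointers are stored in full 64 bits even though only 48 of them are significant.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// Width of a stored pointer in bits on the current target:
// 32 in a 32-bit build, 64 in a typical x86-64 build.
constexpr std::size_t pointer_bits() {
    return sizeof(void *) * CHAR_BIT;
}

// True if a 64-bit address is "canonical" under 48-bit virtual addressing,
// i.e. its top 17 bits (63..47) are all zeros or all ones.
constexpr bool is_canonical_48(std::uint64_t address) {
    return (address >> 47) == 0 || (address >> 47) == 0x1FFFFu;
}
```

So the narrower address width limits which 64-bit values are valid addresses, but it does not shrink what sizeof(void *) reports or what ends up in memory and cache.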

Quote from: DrSuperGood on September 01, 2014, 05:36:35 PM
In theory the graphics could also be 64 bit optimized, potentially performing 2 pixel operations with a single instruction (half the amount of code run) however this depends on how rendering is implemented and still needs some mega smart assembly guru to write properly.

Since Simutrans graphics are 16-bit, it already does two pixels in one operation in many cases. And with SSE, you can even do 8 pixels in one instruction, even on 32-bit processors. (Modern GCC will do this for you if you define USE_C.)
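
The "two pixels in one operation" trick can be sketched as follows. This is a generic illustration and not code from the Simutrans renderer: it assumes an RGB565 pixel layout, and the function names are invented for the example. It averages two pairs of 16-bit pixels using a single 32-bit mask, shift and add.

```cpp
#include <cstdint>

// RGB565 mask with the lowest bit of each colour channel cleared (0xF7DE),
// repeated for the two pixels packed into one 32-bit word.
constexpr std::uint32_t HALF_MASK = 0xF7DEF7DEu;

// Pack two 16-bit pixels into one 32-bit word (first pixel in the low half).
constexpr std::uint32_t pack2(std::uint16_t first, std::uint16_t second) {
    return (static_cast<std::uint32_t>(second) << 16) | first;
}

// 50% blend of two pixel pairs at once. Clearing each channel's lowest bit
// before shifting stops bits from leaking into the neighbouring channel
// or into the neighbouring pixel.
constexpr std::uint32_t blend50_x2(std::uint32_t a, std::uint32_t b) {
    return ((a & HALF_MASK) >> 1) + ((b & HALF_MASK) >> 1);
}
```

The same idea extends to wider registers: a 64-bit word holds four such pixels, and a 128-bit SSE register holds eight.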

DrSuperGood

Quote
Try sizeof(void *) and see for yourself. My 64-bit systems always return 8 (bytes, so 64 bits). That the address bus is only 48 bits wide is a different matter.
It is also things like the translation lookaside buffer, page management system, etc. that all do not use 64-bit addresses. Compilers might choose 8 bytes for convenience, and because reading 6 bytes is not any faster than reading 8 bytes. I do not think using 64 bit will impact performance that much, as some stuff is slower while other stuff becomes faster.

It would be nice if a build-from-scratch user manual were written. A lot of the existing documentation sends you to other sites or implicitly assumes you know what to do. Simple things, like the code that pulls the revision number before the build, do not work out of the box, and I had to write the header file manually to get the thing to build, which clearly is not intended. I do not know whether it is trying to pull from an old repository or whether it is just because I do not have an SVN tool installed (I pulled from Git); either way, there really ought to be a fail-safe, comprehensive step-by-step build guide. As another example, Experimental uses a specific folder in the parent folder for headers and libraries, whereas standard Simutrans expects them to be in the system class path, which is not always straightforward to set up for someone who has never used Visual C++ before. Additionally, a guide on how to write commits would be nice, as I am unsure whether you are using SVN or Git: documentation all over the place refers to SVN, but the code appears to be hosted on Git.

kierongreen

The code is hosted on an SVN server (well actually a GIT server with an SVN frontend I believe?). The public GIT server is a mirror of the SVN - commits don't go to the public GIT server though. Only certain people have write access to the SVN server to commit code. Hence you create patches which are reviewed by one of the devteam who will commit if appropriate.

Ters

Quote from: DrSuperGood on September 01, 2014, 11:17:25 PM
It is also things like the translation lookaside buffer, page management system, etc. that all do not use 64-bit addresses.

Yes, yes, yes, but that's not the point. The point is that sizeof(void *) goes from 4 to 8. There exists no x86 or x64 instruction for reading just 48 bits from RAM, except perhaps for a few 32-bit far jump, call and return instructions, plus very special instructions like LGDT.

Quote from: DrSuperGood on September 01, 2014, 11:17:25 PM
Simple things, like the code that pulls the revision number before the build, do not work out of the box, and I had to write the header file manually to get the thing to build, which clearly is not intended. I do not know whether it is trying to pull from an old repository or whether it is just because I do not have an SVN tool installed (I pulled from Git); either way, there really ought to be a fail-safe, comprehensive step-by-step build guide.

If you pull from git, you've gone into unofficial territory. Although I'm not sure how this is supposed to work with the source code zip archive released with official versions. The revision thing is possibly disabled there.

Markohs

Yep, the bus might not actually use the full 64 bits, but the compiler and cache do, so the cache is affected as if it were 64. I don't doubt the extra registers and instructions make the compiled code more optimal in many situations, but I really doubt it's worth it unless it's a computation-intensive function. Just try it: the 64-bit build is empirically slower.