News:

Congratulations!
 You've won the News Item Lottery! Your prize? Reading this news item! :)

Modern C++?

Started by jamespetts, February 03, 2018, 12:39:25 AM

Previous topic - Next topic

0 Members and 1 Guest are viewing this topic.

jamespetts

Quote from: prissi on February 08, 2018, 12:17:30 AM
nullptr is indeed the best choice, I would agree with that too. Again, needs to be all replaced, and not mixed.

This should be quite straightforward. Would you like me to upload a patch for Standard with a full search/replace for this?
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

Ters

I like NULL, because it screams out to me, both by being in caps and being highlighted (and is shorter to type).

Dwachs

Quote from: Ters on February 08, 2018, 06:46:39 AM
I like NULL, because it screams out to me, both by being in caps and being highlighted (and is shorter to type).
same for me.
Parsley, sage, rosemary, and maggikraut.

prissi

We could just add to simconst.h §define NULL nullptr ...

Ters

I think NULL is already defined in some standard header. It is not good practice to redefine stuff from there.

jamespetts

Is it not better to follow the standard and use nullptr, even if it may seem less easy at first?
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

dodomorandi

Sorry for resurrecting a 9 months old thread. I would like some information about the idea of modernizing the codebase of simutrans to C++11. I noticed that some people would be happy to have better zero-overhead abstraction in simutrans, but others looks like pretty against it.

To make things easier, I make some direct questions:

  • Will it be accepted the introduction of C++11 code? I understand that there are reasons about sticking to C++98 that are independent to code quality, maintainability and portability, and in these cases I cannot (and I don't want) to convince anyone about using C++11
  • How the development should be handled? I generally would open a git PR, but I don't know if with simutrans this is the normal procedure. Probably a C++98->C++11 'port' would create many small patches instead of some bigger ones, in order to ease the code review from other devs, IMHO source control is the best approach, but if you prefer using path files (using the forum) it is not a problem.
  • Would you accept "just the patch", or you would prefer/require the writing of a unit test before the C++98->C++11 transition of something, then the patch itself? That is really useful to grant the correct behaviour of the new code and it will improve simutrans in general. On the other hand, this will slow down the porting, and obviously unit tests in general require maintenance.
  • I have seen discussions about the "performance cost of C++ abstractions". If this is a concern, do you eventually want benchmarks and asm comparisons of the parts that are being ported to the new standard? I think that the generated code speaks much louder than speculations.

Sorry if my words feel a bit harsh, I really don't want to be offensive or aggressive. I am just pragmatic. I would really love to improve simutrans, and I really thing that modernizing the codebase is one important step that will be necessary to improve the game.

DrSuperGood

QuoteI would really love to improve simutrans, and I really thing that modernizing the codebase is one important step that will be necessary to improve the game.
Modernizing the code base would make maintenance easier, but chances are the game will not be improved directly by it.

In the end there is not too much to be gained by moving to C++11 which is why the change has not really happened. Sure one occasionally runs into a more modern C++ language feature that has to be patched away, but generally C++98 does everything one wants to do.

That said there are some more modern C++ features that could be useful. For example the standard file system support should make Unicode file name support less platform specific, bit that needs C++17. There are also options for multi-threading other than PThread so one could finally do away with optional multi threading support by making it mandatory.

prissi

Btw. Simutrans even predates C++98 be some years, although it used GCC features in the beginning, since only GCC (or its DOS port) could handle the large memory of it.

However, changing things just for the sake of change is usually not a good idea. If it fixes a problem, fine. But there are some templates and list objects, which occur in every tile. Having a little overhead is fine, but having 100 Million of extra bytes is not. Same for memory handling. A normal alloc is relatively slow and carriers overhead (between 4 and 255 bytes). Again, if you need many in pre-determined size, then own memory management makes sense. So you will have always some own construct pointing out, and these are often in the most critical part of the code.

Using std::vector versus the simutrans template does not affect function very much, and could be easily done. But both are more or less bug free, so personally I do not care which to use. I am rather worried about how and exchange would clash with larger patches and things like Experimental versus standard and so on.

But we are a very small group of contributors. SO if C++11 would enable more contributors, and it is on non-critical part of the code. The nightly build server uses a stable Debian, which uses GCC 4.9.x so this should also work with C++11 (unless some garbage collection is wanted).

Ters

There pretty much isn't a thing as rewriting the code into C++11. With later versions, replacing deprecated, or even completely removed, features is more of a thing in general, but can not remember Simutrans using such things in the first place.

Replacing Pthread with C++ threading would eliminate the Pthread dependency when using Microsoft's compilers, but last I checked, C++ threading support varies between Mingw64 builds. (For some odd reason, my cross-compiler uses an incomplete implementation based on native Win32, while the Mingw64 I have for Windows has a full Pthread-based implementation. It can't statically link Pthread, though, but that is a problem regardless of whether Simutrans uses the threading part of the standard C++ library or not.)

Furthermore, there have been indications that Simutrans is popular among people who do not have the most up-to-date computers. What might amount to insignificant overhead on a new computer may not be so on a 10 year old computer.

What might make more of an impact in terms of maintainability, at least in the eyes of new developers, is removing non-standard things for which there exists equally performant standard things, such as many of the container templates. It is important to remember that the standard library contain generic solutions to common problems. There are situations where they are not suited, but that requires profiling. Unfortunately, profiling on different systems and/or compiled with different compilers, including different versions, may yield different answers.

And to really bring the game to the next level, in particular to be able to leverage hardware accelerations, or do multiplayer in a less hackish way, one needs, in my opinion, to completely shatter the core of the game and rebuild it. It is not a matter of switching programming language version, but switching out the architecture and some design principles. Unfortunately, that is not something I have the time or energy for beside work.

prissi

If one really want to invest time, then getting Simutrans on Android is probably the best use of that ... THe GUI is changing now into a fully scalable version with thumb support, so it will be better suited for it today than simutrans was in the past.

dodomorandi

Ok, maybe I needed to be clearer on my point: I said that C++11 will make the game better does not mean that it will make it faster. Using "modern" capabilities allows to create better and safer abstraction, much more safer and compiler friendly than using "old approaches".

A really silly example: simutrans is full of manual allocations. And if you compile using a leak sanitizer you will notice that just opening and closing the game creates leaks. I totally know that leaks are not UB, but on the other hand it is not a really good practice to leak memory around. The part related to the password manager does not use memset_s (which is C11/C++11) not volatile to clear data, creating a possible information leak.

What I am saying is that a code hard to mantain is hard to evolve. The removal of the pointers (using lvalue reference, rvalue reference and std::unique_ptr, for instance) is a great improvement by itself. And I am not saying that this is true because I am saying that, but based on the experience of many many C++ experienced developers (half of the seminars from CppCon are about good practices and code modernization).

Said that, I ask for some more specific replies. I will not be offended if you say "no, no C++11 here". But I would like to understand if I can contribute in simutrans modernizing the code during my spare time.

Dwachs


Quote from: dodomorandi on October 30, 2018, 08:55:46 AM
The part related to the password manager does not use memset_s (which is C11/C++11) not volatile to clear data, creating a possible information leak.
Could you provide a patch to address this?
Parsley, sage, rosemary, and maggikraut.

dodomorandi

Quote from: Dwachs on October 30, 2018, 10:23:54 AMCould you provide a patch to address this?
Yes, it can be done using std::fill using static_cast<char volatile *> for both begin and end arguments. However, my point is this one of the many cases in which the usage of C++11 could make things simpler, more idiomatic and easier to maintain.

If you tell me that we can go with C++11, the patch is just replacing memset with memset_s. Depending on what is decided about the modernization discussion, the patch will be different. In any case I will be happy to send it even if the decision is to stick to C++98. I think that I have got a couple of other minor issues, but this is another story  ;)

P.S.: there was a typo:
Quote from: dodomorandi on October 30, 2018, 08:55:46 AMnot volatile
was meant to be "nor volatile"

prissi

Replacing pointer by overlaid types will be a big problem if you (like in Simutrans) allocate 100 million of objects. How would you handle an objectlist without pointers?

volatile has nothing to do with security. It just means the compiler should load something from the memory and not the cache, but the compiler is free to ignore this. Also, for a non multithreading program like simutrans (the display and few map routines aside), volatile is useless in dialogues and gui handling.

Also std::fill versus memset versus memset_s? All have the same functionality. Also memset_s and std:fill relies on passing the size (end), so I fail to see the improvement versus memset: If you feed the wrong size to memset, then you can do the same to memset_s ... (Also, while the Debian compiler of the nightlies is C++11 compliant, the libstdc++ is not fully that. One really has to check what is supported.)

memset( a, lengthof(a), 0 ) versus std::fill( static_cast<char volatile *>a, static_cast<char volatile *>a+lengthof(a), 0 ) is just typing more letters to me. But maybe I misunderstood this, because I am not a computer scientist and my self taught IT education is 20 year out of date.

Opening and closing a game does not create leaks, at least we ocassionally check with valgrind. Loading a pakset creates leaks, due to the way the pak system is implemented, but that cannot be helped without a major redesign. And it is not an issue, since paks are only loaded once.

Could you be more specific, where the issue with the password_frame.cc is (or which file are you referring to?) Simutrans does not store passwords on the client side, so I am not sure, if there is a vulnerability. Of course, if you send enough garbage to a Simutrans server or client, you can crash it, because the core engine was never built for abuse (because it was a single user core). But this is totally independent from C++11. However, since Simutrans is cross platform and less than 20 clients and servers are online at any time, and servers are usually hand compiled with different compiler versions and architectures, I doubt you can run a sucessful exploit (versus just crashing the program).

Having said that, security improvements are welcome. This had never been foremost, since the networkcode came very late to Simutrans.

Finally, a personal comment: I am supporting Simutrans for about 14 years now. I had seen in that time some trends come and go. Granted, a few new stuff (like automatic for loops and working templates from C++98) are quite welcome. But just coughing "::" all over the place to replace working code is a job, I happily delegate to other. Depending on your IDE, too much C++ can make debugging harder, since a lot of stuff is hidden when using virtual object and automagic conversion and the like. (The pak loader is a good example of code almost undebugable with MSVC ... )

dodomorandi

Quote from: prissi on October 30, 2018, 02:22:27 PMReplacing pointer by overlaid types will be a big problem if you (like in Simutrans) allocate 100 million of objects. How would you handle an objectlist without pointers?
Once you have abstractions, you will never need to handle raw pointers. It is obvious that at the end of the abstraction layers a pointer to something is handled, but there is a huge gap between doing manual memory management and using safe abstractions.

Quote from: prissi on October 30, 2018, 02:22:27 PMvolatile has nothing to do with security
I did not say what I was pointing at because it was pretty superfluous in this discussion, but let's discuss it. Line 16 of network/pwd_hash.h, the method clear() use the MEMZERO macro to set hash to zero bytes. MEMZERO is indirectly defined as a call to memset, which can be optimized away by the compiler if no reads of the memory are performed after the set operation. memset_s avoids this behaviour, and prior to C++11/C11 it is possible to cast a pointer to a type to a pointer of a volatile type in order to unallow the optimization.

Quote from: prissi on October 30, 2018, 02:22:27 PMIt just means the compiler should load something from the memory and not the cache, but the compiler is free to ignore this
No, volatile means that the compiler cannot assume that the value is constant between two subsequent accesses, and cannot ignore it (otherwise it would be impossible to use memory mapped devices)

Quote from: prissi on October 30, 2018, 02:22:27 PMfor a non multithreading program like simutrans
Multithreading is completely unrelated to volatile. http://isvolatileusefulwiththreads.in/C++/ volatile access to memory does not mean atomic access.

Quote from: prissi on October 30, 2018, 02:22:27 PMAlso std::fill versus memset versus memset_s? All have the same functionality
memset and memset_s can only handle raw memory, std::fill can be used with any iterable object, without the needs of contiguous memory. And at the same time it is optimized to use memset in case of trivially copy assignable contiguous data. In the specific case of pwd_hash it is the same as using memset, until you change the type of the hash variable.

Quote from: prissi on October 30, 2018, 02:22:27 PMSimutrans does not store passwords on the client side, so I am not sure, if there is a vulnerability
Maybe not. I end up to that piece of code by chance, I noticed the clean method, the fact that a a password hash was stored, I just made the assumption that that memory wanted to be zeroed, without any chance of let the memset being optimized away.

Quote from: prissi on October 30, 2018, 02:22:27 PMwhile the Debian compiler of the nightlies is C++11 compliant, the libstdc++ is not fully that
https://packages.debian.org/stretch/libstdc++6 The stable release of Debian fully support C++11: libstdc++ is 6.3, and AFAIK it also fully supports C++14. But maybe I may be wrong, can you provide a link that shows that GCC 6.3 + libstdc++ 6.3 are not able to compile valid C++11 code?

Quote from: prissi on October 30, 2018, 02:22:27 PMOpening and closing a game does not create leaks
Yes, it does.

Direct leak of 39000 byte(s) in 975 object(s) allocated from:
    #0 0x7f0884e57d29 in operator new(unsigned long) /build/gcc/src/gcc/libsanitizer/asan/asan_new_delete.cc:90
    #1 0x55e8eba51d06 in obj_desc_t::operator new(unsigned long) descriptor/reader/../obj_desc.h:29
    #2 0x55e8eba50d1e in image_t::copy_image(image_t const&) descriptor/image.cc:54
    #3 0x55e8eba515a2 in image_t::copy_rotate(short) const descriptor/image.cc:96
    #4 0x55e8eba53f9e in create_alpha_tile descriptor/ground_desc.cc:104
    #5 0x55e8eba58be8 in ground_desc_t::init_ground_textures(karte_t*) descriptor/ground_desc.cc:814
    #6 0x55e8ec31ee1b in karte_t::karte_t() /home/user/src/simutrans/simworld.cc:2053
    #7 0x55e8ec281d34 in simu_main(int, char**) /home/user/src/simutrans/simmain.cc:1238
    #8 0x55e8ec2ae462 in sysmain(int, char**) /home/user/src/simutrans/simsys.cc:1036
    #9 0x55e8ec3c777e in main /home/user/src/simutrans/simsys_s2.cc:786
    #10 0x7f08842f5222 in __libc_start_main (/usr/lib/libc.so.6+0x24222)

Direct leak of 9364 byte(s) in 157 object(s) allocated from:
    #0 0x7f0884e56019 in __interceptor_malloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cc:86
    #1 0x55e8ec284b04 in xmalloc(unsigned long) /home/user/src/simutrans/simmem.cc:156
    #2 0x55e8ebb3310c in recode dataobj/translator.cc:138
    #3 0x55e8ebb35c1d in load_language_file_body dataobj/translator.cc:335
    #4 0x55e8ebb3618e in translator::load_language_file(_IO_FILE*) dataobj/translator.cc:379
    #5 0x55e8ebb373f1 in translator::load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) dataobj/translator.cc:446
    #6 0x55e8ec27ffb8 in simu_main(int, char**) /home/user/src/simutrans/simmain.cc:1030
    #7 0x55e8ec2ae462 in sysmain(int, char**) /home/user/src/simutrans/simsys.cc:1036
    #8 0x55e8ec3c777e in main /home/user/src/simutrans/simsys_s2.cc:786
    #9 0x7f08842f5222 in __libc_start_main (/usr/lib/libc.so.6+0x24222)

Direct leak of 672 byte(s) in 28 object(s) allocated from:
    #0 0x7f0884e57d29 in operator new(unsigned long) /build/gcc/src/gcc/libsanitizer/asan/asan_new_delete.cc:90
    #1 0x55e8ec1abde8 in stadt_t::cityrules_init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /home/user/src/simutrans/simcity.cc:393
    #2 0x55e8ec280208 in simu_main(int, char**) /home/user/src/simutrans/simmain.cc:1060
    #3 0x55e8ec2ae462 in sysmain(int, char**) /home/user/src/simutrans/simsys.cc:1036
    #4 0x55e8ec3c777e in main /home/user/src/simutrans/simsys_s2.cc:786
    #5 0x7f08842f5222 in __libc_start_main (/usr/lib/libc.so.6+0x24222)

Direct leak of 474 byte(s) in 47 object(s) allocated from:
    #0 0x7f0884d9ef01 in __interceptor_strdup /build/gcc/src/gcc/libsanitizer/asan/asan_interceptors.cc:405
    #1 0x55e8ec289bd6 in tool_t::read_menu(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /home/user/src/simutrans/simmenu.cc:721
    #2 0x55e8ec280d89 in simu_main(int, char**) /home/user/src/simutrans/simmain.cc:1104
    #3 0x55e8ec2ae462 in sysmain(int, char**) /home/user/src/simutrans/simsys.cc:1036
    #4 0x55e8ec3c777e in main /home/user/src/simutrans/simsys_s2.cc:786
    #5 0x7f08842f5222 in __libc_start_main (/usr/lib/libc.so.6+0x24222)

These are some of them. And as I already said, the fact that leaking is not UB (or it only happens once) does not mean it is a good behaviour.

Quote from: prissi on October 30, 2018, 02:22:27 PMtoo much C++ can make debugging harder,
I totally don't agree with this: you will never need to step into stdlib, and using standard functions correctly makes code safer and most of the time faster. And, honestly, saying that a C++ project could have too much C++ looks like a weak argument.

Again, I would like to avoid this kind of discussion. I just asked if the main developers consider acceptable moving from C++98 to C++11 as a requirement. I don't want to force anyone to do anything, the reason I am asking is because it does not make any sense that I create a patch to have idiomatic C++11 code and it is rejected because it is C++11.

Ters

C++ has gotten lots of fancy features since the turn of the millennium, but the fact that they "always" want to implement it partially as templates in the standard library rather than directly as part of the language syntax has caused a lot of long-winded constructs. auto helps somewhat, but not always.

memset_s doesn't really have much of an edge over memset in my opinion. In most cases, size and count is the same and/or size is not passed to the place memset is called in the first place. std::fill has the advantage over both in that you don't need to convert element count to byte count yourself. The only disadvantage is that iterator arithmetic, needed to calculate end, isn't the most understandable thing to a non-experienced C++ programmer. std::fill_n is better at that.

I don't think passwords being stored on the client side is much of an issue. If something is able to suck that information out of memory, it would also be able to intercept the keystrokes, which is probably easier. Grabbing passwords from the server's memory would also be more tempting, as one can get more passwords that way.

Fast, maintainable, secure code is still a fantasy silver bullet, except for the most trivial of cases, although they are by no means mutually exclusive.

jamespetts

The pointers issue is an interesting one. Raw pointers are used extremely heavily in Simutrans. Assuming that there is no performance impact at all of switching to newer abstractions (is this assumption correct? It would be interesting to see it tested), there would be a very large amount of work required to go back and change all the existing code, especially when an awful lot of code is something like:


if(pointer)
{
   pointer->do_a_thing(variable);
}
else
{
   do_something_else();
}


and quite a lot more code uses arrays and pointer arithmetic.

What I had originally suggested in starting this thread was using the new semantics for new additions to the code and retrospectively changing limited parts of the code that are easy to change (e.g. NULL to nullptr), although even the latter was not accepted in the end.

However, we have at least established that C++11 language features can safely be used in Simutrans if necessary and do not violate Simutrans coding standards.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

DrSuperGood

#53
QuoteYes, it can be done using std::fill using static_cast<char volatile *> for both begin and end arguments. However, my point is this one of the many cases in which the usage of C++11 could make things simpler, more idiomatic and easier to maintain.
Casting a non-volatile type to volatile is ignored due to it being complete nonsense. Especially if it is a pointer...

Volatile is a storage modifier. It is needed when a variable is declared. Typecasting non-volatile to volatile will do absolutely nothing as the compiler already knows the variable is non-volatile. Compilers will warn you this if you set the appropriate warning level.
QuoteThe part related to the password manager does not use memset_s (which is C11/C++11) not volatile to clear data, creating a possible information leak.
I am not even sure what you are referring to. All passwords are stored on the server, with the server receiving the password entered by a client to authenticate. If the server is compromised then all passwords are compromised anyway as the password file can be accessed directly. In the case of clients the biggest thing that can happen is that if their system is compromised their password is revealed to the hacker. That said who in their right mind would hack Simutrans passwords...

Technically a lot more is required than simply secure setting to zero. For the password to be safe spectre mitigation is also required but good luck with that... This is because on some processors another application can guess the contents of memory it does not have access to by measuring the time taken for a security exception to be raised representing memory state and even content state due to speculative execution.

Also memset_s is already a secure memory set function as it makes the guarantee that after it is returned the memory will have been set. Compilers cannot optimize it away like they can with normal memset. No need to cast anything to volatile, if it ever made sense to do so to begin with.
QuoteGrabbing passwords from the server's memory would also be more tempting, as one can get more passwords that way.
Or they could just look at the completely unsecure password hash file that accompanies the save. No need to even look at the running servers memory if the server is that compromised...
QuoteThese are some of them. And as I already said, the fact that leaking is not UB (or it only happens once) does not mean it is a good behaviour.
Opening and closing Simutrans does not create memory leaks. This is because the OS discards all virtual memory pages exclusively owned by an application on application closure anyway. Sure, logically there might seem to be a leak but in reality no memory is leaked by the application as the application no longer exists after exiting and all its exclusive memory pages are discarded anyway. This might not always been the case using languages like C and C++ due to some platforms not having a concept of application or virtual memory, but as far as I am aware Simutrans does not support any such platform.

The OS will not automatically free any handles owned by an application on application closure so those can be considered potential leaks. Like wise resources within specific APIs, eg Vulkan handles, are not automatically freed on application closure.

If you are referring to reloading maps then yeh there are possibly a few minor ones. That said the major ones I discovered in Experimental (now fixed) had nothing to do with code shared with Standard.

prissi

memset cannot be optimised away in said function, because acting on its own class data (unless by a broken optimiser). Because most routines that modify their class data do not read them back. If you refer to this compiler bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=8537 than this is that: A compiler bug, and usually we try to ignore those. (Same for the workaround in that thread with volatile; and thanks, I know volatile must not be used for threading, since I had done my share of embedded programming, where you use it for hardware access.) Anyway, DrSupergood gave a good answer, why even if this bug would be present (which is not) it would not make a vulnerability.

I want to see an abstract pointer that does not imply a lot of overhead. std::unique_ptr may either allocate on the stack (but then we would run out of stack quickly) or is via the std memory allocation implementation which is very slow (about 100x slower for objects like trees or tiles which we have plenty) and has a certain overhead (std allocation tend to allocate always multiples of 16 or even 256). Also allocation 1000 Millions or bjects can cause some implementations to run out of memory too (albeit that is nowadays a thing of the past). Allocationg a 1024*1024 map with standard allocator is taking almost minutes versus 3-5 seconds using our tailored routines.

Another example are the lists: In the stl these are double linked. This implies quite an overhead (extra 8 bytes, and twice the effort on unliking), which we do not need.

Going back to C++11. There are neat things like the for loop; but a change of the end iterator during the loop is undefined. We do this on more than one case. So we cannot use simple C++11 code there, even though it would look nicer than the "FOR()" macros.

Don't get me wrong, I use C++11 feature for normal "quick and dirty" programming, and I am not fundamentally opposed against. However, a lot of code in Simutrans has been optimised and reduced to its minimum, because it allows people to play larger maps with more things. It is a little like assembler versus high level languages. You can be usually faster in assembler, but you only do it when needed. So we have tailored structures, which are faster and more compact than more general ones. We have also some normal use of them, because it was more consistent.

One could of course change the non-critical parts with more C++11, and that is slowly done (as shown by the slowly replacing of some other older string types by std::string in Simutrans). (NB: std::string is also "nice" if you trace through it, which lots of contructors going before you finally see the intermediate result you would like to see).

About the leaks you see there, I also go with DrSuperGood. I am programming stuff since 1986, and any OS worth its name, releases all memory at the end of execution unless explicitely allocated as non-releaseable. The only leaks I am concerned are from (re-)loading a game, which happens frequently on a server. Since the standard server can run for two months without a big memory increase, I think there are no big leaks right now.

DrSuperGood

#55
Quote(Same for the workaround in that thread with volatile; and thanks, I know volatile must not be used for threading, since I had done my share of embedded programming, where you use it for hardware access.)
If volatile can be used for multi threading is entirely down to the platform as the standards clearly state it cannot as it does not fit in with the multi threaded memory model. One either needs to use specific atomic types, or use locks and other synchronisation structures to enforce consistency and ordering when dealing with multiple threads.

The main and pretty much only use for volatile is accessing memory mapped peripherals and control registers in low level programming. This is encountered in driver creation as well as embedded controller programming and often the appropriate compiler will make guarantees as to how volatile works, especially when it comes to bitfield manipulation as some platforms have dedicated instructions to do that. Embedded processors might have only a single core and hence there always is a well defined and ordered memory model without the need for synchronization structures or atomic types.

The main advantage to be gained from newer C++ standards would be with C++ native Unicode support and File System support (C++17). This could potentially make the IO code more platform independent and remove many of the reinvented wheel like pieces of code that have been written to process Unicode and file paths.

Ters

Quote from: DrSuperGood on October 31, 2018, 12:53:20 AMThe OS will not automatically free any handles owned by an application on application closure so those can be considered potential leaks.
On Windows at least, handles to OS resources will be closed on process exit. (Which is why terminating processes is quite safe, while terminating threads is not.) I suspect that same is true on any operating system having process isolation, because it is the only way to protect the rest of the system from crashing or otherwise misbehaving programs in the long run. Handles to shared resources from elsewhere is a different matter, although resources on the same machine should preferably handle this just like the OS, while external resources should have a timeout.

Quote from: prissi on October 31, 2018, 05:07:09 AMYou can be usually faster in assembler, but you only do it when needed.
I'm not at all convinced this is true anymore, unless you are targeting very specific processors or even entire machines. C, and by extension the C++, as languages are by design a very thin abstraction layer over assembly, and compilers have become very clever at optimizing. It is however still the case that you can sometimes sacrifice development speed for increased runtime speed by throwing away abstraction layers. One doesn't even have to switch language. (In that regard, I recently wrote some insanely complicated C++ template code to auto-generate some boilerplate raw byte stream manipulation that I until then had written explicitly each time. The compiler did realize exactly which machine instructions I wanted it to end up with, although I will probably have a hard time understanding the template code myself now. Some quirks in C++ might have caused some duplication of code in the binary output, though. Due to the nature of the project, I haven't bothered looking more into that.)

Quote from: prissi on October 31, 2018, 05:07:09 AM(NB: std::string is also "nice" if you trace through it, which lots of contructors going before you finally see the intermediate result you would like to see).
std::vector would be a better example, I think. There is little reason to delve into std::string, but an std::vector can contain elements with custom constructors, assignment operators and move operators that needs looking into. However, even more of an issue is the compiler error you get when messing up some things. Your error usually propagates into some internal implementation types before finally failing, giving you half a page of error messages with little connection to the C++ standard. Nothing C++11 about that though, it has been that way since way before that.

Quote from: jamespetts on October 31, 2018, 12:39:09 AMRaw pointers are used extremely heavily in Simutrans. Assuming that there is no performance impact at all of switching to newer abstractions
If there is none at all, there is a question as to what the abstraction provides in the first place. std::unique_ptr is perhaps the only thing I can think of. References is a nice way signify un-owned references, but they can not be null and they can not be changed once set, which in turn makes them uneasy member variables. (They shine as parameter types in a functional programming approach.) std::shared_ptr has overhead both in speed (probably insignificant) and size (for Simutrans, likely quite significant in many places).

Quote from: jamespetts on October 31, 2018, 12:39:09 AMand quite a lot more code uses arrays and pointer arithmetic
Those things are still fundamental to modern C++, although pointer arithmetic has been generalized to iterator arithmetic. (I'm not sure what arrays have been generalized into, but std::vector belongs to it. Probably std::map as well.)

dodomorandi

Quote from: jamespetts on October 31, 2018, 12:39:09 AM
However, we have at least established that C++11 language features can safely be used in Simutrans if necessary and do not violate Simutrans coding standards.

Ok good. I still have some questions:

  • As I already asked before, what do you think it could be a nice way to handle the development? I think that a diff-and-patch method would be heavyweight in this case, and I think that it would hard for reviewing. As I said, my approach would involve using git and PRs, but I think that it is not doable in the current context, isn't it? What are your suggestions?
  • My idea is to write a unit test for a function I want to change, because I don't want to break any code during the process. Do you have any preferences about the handling of tests? I recently watched a CppCon talk of Phil Nash talking about Catch/2, a very nice testing framework: it is one single header file and it is incredibly easy to setup and use compared to other frameworks. I generally don't like adding a dependency just for testing, but in this case the solution is clean and there is no need to reinvent the wheel.
  • About coding standards: there are many patterns in C++11 that does not exist in C++03/98, rvalue references, variadic templates, noexcept and constexpr for instance. All the features that are from the C++11 standard cannot be obviously covered in the C++03 coding standards. Can I the c++ core guidelines (https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines) to cover these cases?

I see that most people here think that abstractions have costs. My idea is that I cannot assert that some code has zero cost abstraction or the opposite without proofs. My question is: how can I give you these proofs? I have two major concerns:

  • To have a comparison, I need two versions of the same code. Even if, for instance, I create a commit with both C++03 and C++11, do whatever is needed for the comparison and then I make another commit without the old code, the comparison can only be made on a single commit. And this is obviously not a good solution. However, I don't have any better idea in mind. What do you think?
  • There are two main ways to compare the effects of two different versions of the same code: microbenchmarks and assembly comparison. The former is only useful to avoid significative regressions (I don't expect major performance gains for most of the time), the latter is a bit hard to just show. One possible approach could be the following: I create the new version of a function, I took both the old and the new ones, I put them on compiler explorer and on quickbench (obviously creating a proper benchmark test), then I create permalinks and add them in the commit description/patch notes/whatever. What do you think?

jamespetts

I should note that I am a developer of the Extended branch, but not a developer of the Standard trunk, so I cannot comment authoritatively on how best, if at all, to go about seeking to about modernising the code (and it is better to minimise the non-functional code differences between Standard and Extended, so it would not be sensible to modernise the code shared with Standard on the Extended branch except by merging in any modernisations in the equivalent parts of the Standard code).

As to testing different versions of the code for performance loss or gain, you could use profiling: the community edition of recent versions of Visual Studio, which is used for Simutrans development, has in-built profiling, and one might alternatively use profiling builds in GCC. Of course, only one type of change should be tested at once.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

dodomorandi

Sorry I did not mention about profiling. I tend to use TDD, and the only way of doing it correctly is checking perfect code coverage through profiling. Using it for performance comparison is pretty nice too (even if I generally prefer perf), but my main concerns about showing the data are almost the same unfortunately.

I completely agree on your points about the modernization: I would like to make changes to the standard branch that can be directly merged in the extended branch, in order to maximize the results minimizing the efforts.

prissi

If one really want to modernise Simutrans, a complete rewrite is probably easier than starting with the current code. There are many different styles in here, from the more recent C++ like approach to put everything in a separate two line function to the veteran C approach to do everything in one function and all variations in between. If you aim for a code, as you probably imagine, then this means more or less touching everything. In the end, I think, such a major change needs to be actively supported by somebody who know the code well.

About testing units. We have had these, but since Simutrans is so diverse (15+ pak sets) automated testing is probably only possible for the most simple cases. Most of the easy bugs have been caught in the last 20 years ... Honestly, I have no formal education in informatics, and when I finally was thrown at Simutrans, I had to learn a lot very fast. (No prior C++ experience, only C, Pascal, Modula2 ... ) By the way, Simutrans does not use exceptions, because they clash with pthreads. And they have a lot of overhead (we profiled a few years back).

I have also to admit, that personally after every such big patches in the past I had to find out things again, which I know before. So the older the coder, the more conservative probably. The privilege of youth is to ignore me therefore ...

We do not use git because it did not exist when simutrans had already in an svn. (Albeit we lost the first one, as one can see from the current archive.) Moreover, svn was easier for beginners (on a command line). Gitless is of course something trying to mend that. Finally, the linear revision makes it easier to know if things have been fixed or not. If something is fixed in r8601, then it is fixed also in revisions larger 8601. Since we rely on bug reporting by normal players, this is helpful.

Having said that, personally, as an everyday programmer (not for money, but for the tools I need to work in semiconductors), I would rather wish for a basic portable GUI standard library as part of C++ instead some abstract features which makes the straight-forward craft of programming into an intellectual abstract exercise. Because most of the time, when I program something, I spend on the GUI (despite being very fluent in Win32 and frameworks), and that is not even portable. So for me C++2x should include a portable minimalistic GUI standard libary, instead garbage collectors. That would save a lot of time. Yes, and for loops where the end can be modified without being "undefined". For a language, having undefined is really bad, because such code will be used by someone for sure (unless a big compiler warning).

Sorry for drifting quite offtopic.

dodomorandi

You made many good points, I will try to reply to them one by one.

Quote from: prissi on October 31, 2018, 12:42:04 PMa complete rewrite is probably easier than starting with the current code
Cannot say that you are right, but I think that is true as well. Unfortunately, this would create a complete new project, splitting resources instead of merging them together. The idea of modern refactorization was related to the idea of making changes easy to understand from someone who already knows the code.

Quote from: prissi on October 31, 2018, 12:42:04 PMuch a major change needs to be actively supported by somebody who know the code well
This is the reason I would like to follow a sort of leaves-to-branches modernization, in order to deep into the project step-by-step and, at the same time, perform real improvements.

Quote from: prissi on October 31, 2018, 12:42:04 PMAbout testing units. We have had these
That's good to know! I am sorry I totally missed that.

Quote from: prissi on October 31, 2018, 12:42:04 PMBy the way, Simutrans does not use exceptions, because they clash with pthreads.
I am quite happy about this. I am able to write containers with strong exception guarantees, but I hate doing that, it is a huge pain. Moreover, it is not just a matter of pthreads: throwing an exception in a std::thread is a perfect footgun, and writing perfect-forwarding noexcept declarations is still a lot of boilerplate.

Quote from: prissi on October 31, 2018, 12:42:04 PMI had to find out things again
I would really like to avoid that. What I would like to obtain is something like "oh, this function used to take a char *, now it takes this st::string_view... fine". (Just to make an example of abstracting away things)

Quote from: prissi on October 31, 2018, 12:42:04 PMWe do not use git because it did not exist when simutrans had already in an svn.
Yes, I know that  :D I just don't know how you generally handle contributions in which there are many patches instead of a single, big one.

Quote from: prissi on October 31, 2018, 12:42:04 PMI would rather wish for a basic portable GUI standard library as part of C++ instead some abstract features which makes the straight-forward craft of programming into an intellectual abstract exercise
The fact is that creating a brand new GUI requires a lot continuous development. On the other hand, performing modernization is much more step-by-step. I can decide to modernize one function, writing a unit test and sending a patch for that, everything (maybe) after dinner. It is not possible for me to do that for bigger features. I know that probably the total time is much bigger for the modernization, but I can manage the time in a better way  ;)

Quote from: prissi on October 31, 2018, 12:42:04 PMSo for me C++2x should include a portable minimalistic GUI standard libary
There is a paper that, unfortunately, have been rejected (for now). There are lots of things that would be nice in C++ (i.e. heterogeneous computing), but remember that there are already a lot of useful things, but if we stick to the old standards we are leaving unused lots of efforts made by the community  :)

Quote from: prissi on October 31, 2018, 12:42:04 PMYes, and for loops where the end can be modified without being "undefined". For a language, having undefined is really bad, because such code will be used by someone for sure (unless a big compiler warning).
I have to disagree with this. Undefined behaviour must exists, otherwise you cannot do what you want. Ok, it really sound strange, but follow me with the following examples. You can dangle pointers, but deferencing them is UB. If you define the behaviour of deferencing dangling pointer, it means that the compiler must put additional code to check whether that pointer is valid, and add other code to unwind the stack or whatever. Another example is overflowing signed integers, which is UB. You can easily check that performing a multiplication between ints is much more efficient that doing that between unsigned ints. And this is only because overflowing ints is UB and overflowing unsigned ints is perfectly defined behaviour.
Said that, give a look to Rust when you have some spare time, I think that you will find a wonderful language.

jamespetts

There is, I think, certainly a worthwhile intermediate point between, on the one hand, doing nothing to the code, and, on the other, rewriting it from scratch (the latter of which is not realistic given current coding resources). That intermediate position is modernising in the existing code those specific things that are relatively easy to modernise without major rewrites (e.g. NULL to nullptr), in addition to writing any new code or parts of existing code that is having a major rewrite for other reasons (e.g. to fix a bug or add a feature) using more modern features and practices.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

prissi

I can live with undefined behaviour in C, since this is like undefined behaviour is assembler (it is still a very thin layer, especially with some nice asembler like MC68k ) But C++ should not allow such things, i.e. offer a save_int with a defined behaviour on overflow and so on. Especially fundamental parts like a for loop should be deterministic in any situation, either the iteration is already determined at start or it is flexible. Or such features are useless, because who know what a function does added later in a different module, and one has to use own code.

Dereferencing unassigned pointers is also something, which is ok for C but should be defined for C++. Even MSVC assigns all pointer in debug mode to an undefined address, and having a defined behaviour (even as compile switch or pragma) would be very helpful for a high level language. Undefined behaviour is extremely dangerous, because it tends to work for some time until everybody forgot it was there. (Like volatile worked most of the time with threads, but we have been there.)

Anyway, these are just my rambling on where I would like to modernise C++. But it seems there are rather tons of stuff added to libraries than making the code more deterministic by design.

About testing, that was fully removed around 2005, when more than one pak became the norm. It was just loading a standard game and perform a few standard tasks. These things are done by anyone who uses a nightly, and hence never ever caught a bug. (And thus this was not maintained at all when I took over, so it was removed.)

Simutrans has also never strongly enforced a coding style, and automatic refactoring tends to make the code always more unreadable in certain places, where human deviated from the norm for a reason. I tried this several times, but the result was never very nice, and also would break existing patches. (However, apart from GUI patch, we lost most of them anyway recently).

But I think we agreed that careful modernisation is a good thing. So maybe rather start the coding then ;)

Ters

I think C++ was designed to add no overhead compared to C unless you explicitly invoked it. Otherwise, it would not be as attractive in a time where CPU cycles and bytes of RAM counted more than today. That is why pointers are just as dangerous. It is also made to interface directly with legacy C code, which could explain why it has pointers at all. As a safer alternative, C++ has references as a safer alternative to pointers, although they are not perfect replacements as already explained.

I'm not sure why safe_int isn't part of the standard, but it certainly is made so that it is possible to create such a thing that integrates nicely with the rest of the code and can be used just like an int.

It is also a good question why accessing unassigned variables at all is legal, although it is probably due to C compatibility. I don't know how hard it is for compilers to detect such things, or remember when they even started giving warnings. Many aspects of C, such as headers, are to simplify compilers.

C++ exceptions seem almost fundamentally broken, or at least only half-done. Their main problem is that they can't pass through code not compiled according to the same C++ ABI, such as pure C code and modules compiled with other C++ compilers.

dodomorandi

Quote from: Ters on October 31, 2018, 06:09:27 PMC++ exceptions seem almost fundamentally broken, or at least only half-done. Their main problem is that they can't pass through code not compiled according to the same C++ ABI, such as pure C code and modules compiled with other C++ compilers.
C++ ABI is (unfortunately) unspecified by the standard. It is not a matter of exceptions, it is the whole thing :( In x86_64 the Itanium ABI is a de-facto standard on Linux, but MSVC uses its own ABI...

Ters

Quote from: dodomorandi on October 31, 2018, 10:21:49 PM
C++ ABI is (unfortunately) unspecified by the standard. It is not a matter of exceptions, it is the whole thing :( In x86_64 the Itanium ABI is a de-facto standard on Linux, but MSVC uses its own ABI...
Yes, but code flow in general can be routed through the platform's "C" ABI. When it can't, at least directly, you (mostly) notice at compile time or link time. Exceptions just explode at runtime, perhaps silently if you're unlucky.

DrSuperGood

Many of the newer standard C++ features were actually designed to reduce overhead. For example the loosening of constant expressions allows the potential for more compile time optimizations.

If one profiles Simutrans most of the time spent by the game running will be spent on a few areas of the code. You could increase the running time of the rest of the code by 10 or even 100 times and it will make unnoticeable difference to performance. All one must do is make sure that any changes made do not degrade the performance where it matters.

TrainMith

Please do not suggest putting "using namespace std;", even as a quick fix.  Such a statement is only used for extremely quick feature R&D for a small program, or for beginners in the first few programs during their C++ language learning.  AVOID IT.
The proper way would be to insert "using std::whatever_it_is;" which would allow that name and only that name to be brought in from the standard. 

Worth mentioning is that "volatile" was introduced as the glue between such things as interrupt handlers and programs, and before threading was ever really introduced.  As you might well be aware, the processor would switch to the handler, change the contents of a memory location, and return back to the program which would need the updated value.  Hence, a compiler should _not_ be ignoring the volatile keyword, but instead forced to retrieve the current value of the memory location from actual memory.  Unless of course the current standard wording is tolerating the possibility of the updated value being retrievable from cache with a separate copy on write to the actual memory. 

It's quite interesting that while there is a complaint about using standard allocation, ie. new/delete, in some parts of the code, that at no time was an customized allocator mentioned. Granted, originally C++98 allocators were not allowed to have state, but that was rectified in C++11.  Furthermore, C++17 now has other allocators under the std::pmr namespace, with some controversy still.  My hope is that allocators get ironed out totally by C++2000.
Even with all of this, custom allocators can be made and used to help alleviate some of the issues we might have with new/delete in those particular sections of code in question, with more desirable allocation schemes.

It was mentioned about the simutrans customized singly-linked list, which it might be worth pointing out that C++11 introduced the std::forward_list for just such reasons of memory overhead.  Also introduced was std::array.  The hashing containers likewise previously mentioned are std::unordered_map and unordered_multimap.  Don't forget the std::set, multiset, unordered_set, and unordered_multiset containers.  I personally am reveling in the ability to have a std::vector<std::unique_ptr<My_Type>> that allows polymorphism and safer memory guarantees.  Extracting an object of My_Type does require a std::move, for those expecting easy copying. 

I'll have to dig up a few CppCon presentations on Youtube, several of which are showing how terse the resulting machine code can be and accompanying improved performance. 
prissi, I think you are right that the standard is for typical situations, and that simutrans sometimes has simutrans-specific situations that can't use standard constructs.  However, we won't be totally sure of what benefits could be gained or lost without trying.  It seems to me that we are tossing around a lot of code in this posting, which might be better to just toss it in some sections already.  I do think you are correct that the core, underlaying portion of karte_t or such should be possibly overhauled.  It might make maintenance a lot easier sooner than we expect.

Two things bother me about the current simutrans code. 
The first is the profuse copying of forward class declarations.  Placing most of these within small header files would seem to be more beneficial.  The increased number of header files might initially seem like a nuisance, but the code maintenance would be increased as well as seeing the actual overall code structure quicker. 
The second is quite a few class inheritances seem to be higher than need be.  The base convoy class really shouldn't know about the overtaking class.  Granted, this would require each child class of convoy to separately inherit from overtaking, but this would seem to make better sense, especially since an aircraft wouldn't really know about overtaking.  In fact, aircraft might benefit from some other strategy of changing routing, totally unrelated to how the cars would overtake, or perhaps how trains might do so in a home-brewed compile. 

prissi

I think you are touching very interesting points. However, the main thing is that Simutrans was mostly made before C++98, before even std:: was convceived. At that time as much inheritance as possible was also considered clever style. Hence the strict separation of class implementation and definition (with the exception of oneliners).

Things were added over time, and fixed as needed. std came, and grew, and the style coding changed (maybe even faster than the contributors ... ) So we have a relatively bug free Simutrans, and hence maintenance of Simutrans is not very high. Of course with outdated coding style.

Of course one could change Simutrans into a way to comply with the current style of C++ implementation. The motivation for that is easily maintainable code. However, what about stable code, which almost needs no maintenance any more? If changing the code style will bring in many new developers and new features, that will be a different thing. But my feeling is that Simutrans (as well as OpenTTD btw) has been past their peak and now sit comfortable in their niche with some other commercial counterparts.

Not only C++ standard have changed, the world too. Simutrans is ancient, it was top notch when there were 486 CPUs and 4 MB DOS main memory still nothing special, and 1024x768 has high resolution graphics. Multitasking was something only Lunix could do ... This is like putting a steam engine from a railroad on the street by adding rubber tires to it. It can drive and over time the speed limits and steering got upgraded as well. But it is still a steam engine made for tracks; so this is how Simutrans fits to modern massive threaded GPU driven environments. You can drive it, but ...

Realistically Simutrans has reached its limits, without major restructuring, as the 3D patch branch showed. Nowadays one would transfer all the drawing to a GPU and would use more threads, which both would require quite a different basic design. If done well, one could get something very close to Machinsky https://store.steampowered.com/app/598960/Mashinky/

So should effort on changing coding style to do the same thing with different compiler input not rather invested more wisely in doing something not yet possible? The future might be rewriting the core from scratch to optimise for GPU and threads and reuse parts from current Simutrans (like the routing) and the right graphics. Then all the "modern" concepts can be used to their fullest potential.