Modern C++?

Ters · November 26, 2018, 07:18:35 PM

Quote from: TrainMith on November 26, 2018, 07:14:25 AM
Please do not suggest putting "using namespace std;", even as a quick fix. Such a statement is only used for extremely quick feature R&D for a small program, or for beginners in the first few programs during their C++ language learning. AVOID IT.
The proper way would be to insert "using std::whatever_it_is;" which would allow that name and only that name to be brought in from the standard.

I totally agree. It defeats the point of namespaces. (One could perhaps argue that the standard C++ library should have been divided into more namespaces, which appears to be what other languages have done, but that is beyond our control.)
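To make the distinction concrete, here is a minimal sketch of a using-declaration scoped to a single function. The helper count_long_names is hypothetical and exists only for this illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, named only for this illustration.
std::size_t count_long_names(const std::vector<std::string>& names)
{
    // Using-declarations: only these two names become visible,
    // and only inside this function body.
    using std::size_t;
    using std::string;

    size_t count = 0;
    for (const string& name : names) {
        if (name.size() > 8) {
            ++count;
        }
    }
    return count;
}
```

Unlike "using namespace std;", nothing else from the standard library is pulled into scope, so a local name such as count cannot collide with std::count.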

Quote from: TrainMith on November 26, 2018, 07:14:25 AM
It's quite interesting that while there is a complaint about using standard allocation, i.e. new/delete, in some parts of the code, at no time was a customized allocator mentioned.

I think it has been mentioned in earlier discussions that were very focused on Simutrans' custom collections. Possibly even by me. I don't know more about them than that they exist, though.

Quote from: TrainMith on November 26, 2018, 07:14:25 AM
I personally am reveling in the ability to have a std::vector<std::unique_ptr<My_Type>> that allows polymorphism and safer memory guarantees.

It is certainly better than std::vector<My_Type*> (as long as this collection owns the pointers and the pointers have a limited life span), but both have the drawback that you have to de-reference two pointers to get at the actual data, which may be significant in performance-critical situations.

Quote from: TrainMith on November 26, 2018, 07:14:25 AM
Granted, this would require each child class of convoy to separately inherit from overtaking, but this would seem to make better sense, especially since an aircraft wouldn't really know about overtaking.

Composition might be even better, although I'm not sure if C++'s multiple inheritance makes that less of an issue than the Java I'm used to.

Quote from: prissi on November 26, 2018, 03:23:56 PM
Simutrans is ancient; it was top notch when 486 CPUs and 4 MB of DOS main memory were still nothing special, and 1024x768 was high-resolution graphics.

I don't think I've even played Transport Tycoon on such a poorly equipped computer! But then I've gotten the impression that the Simutrans crowd lags somewhat behind other gamers.

All in all, I think the biggest problem Simutrans has is the combination of game logic and display logic in karte_t. It makes it pretty much impossible to add GPU rendering as an alternative back-end, as the information needed for that is quite different from the current algorithm that always draws everything to the backbuffer and only then figures out, in display coordinates, where the changes are. And I suspect it is a drawback for multiplayer as well.

I don't see a way to change that gradually, though.

Direct2D has an API that looks more suited for what Simutrans does than OpenGL (or Direct3D for that matter), but I haven't gotten around to checking if it has some tricks to avoid the serious performance problems the OpenGL attempts have had. It is also Windows only, which makes it significantly less interesting for a cross-platform game.

TrainMith · December 05, 2018, 05:13:48 PM

Quote from: Ters on November 26, 2018, 07:18:35 PM
(One could perhaps argue that the standard C++ library should have been divided into more namespaces, which appears to be what other languages have done, but that is beyond our control.)

Yes, but that is why the "using" construct was made: only bring a name into the current namespace if necessary, and as a way to avoid repeating full specifiers (i.e. std::vector this, std::vector that).

Quote from: Ters on November 26, 2018, 07:18:35 PM
Composition might be even better, although I'm not sure if C++'s multiple inheritance makes that less of an issue than the Java I'm used to.

As with any good tool, there is a wrong way, and also a wrong time, to use it. Although I've stayed away from Java, the analog to Java's interface is the pure virtual C++ base class, which has the slight downside of a vtable indirection, but the benefit of being able to use it with multiple inheritance to present different facets for different uses of the derived class with less entanglement.
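As a rough illustration of pure virtual base classes acting as interfaces, here is a minimal sketch. All class and function names (overtaker_i, named_i, road_convoy, aircraft, try_overtake) are hypothetical and are not Simutrans' actual classes.

```cpp
#include <string>

// All names below are hypothetical; they are not Simutrans' actual classes.

// Interface: things that can overtake.
class overtaker_i
{
public:
    virtual bool can_overtake(int other_speed) const = 0;
    virtual ~overtaker_i() {}
};

// Interface: things that can report a name.
class named_i
{
public:
    virtual std::string name() const = 0;
    virtual ~named_i() {}
};

// A road convoy presents both facets through multiple inheritance.
class road_convoy : public named_i, public overtaker_i
{
    int speed;
public:
    explicit road_convoy(int s) : speed(s) {}
    std::string name() const override { return "road convoy"; }
    bool can_overtake(int other_speed) const override { return speed > other_speed; }
};

// An aircraft never needs to know about overtaking.
class aircraft : public named_i
{
public:
    std::string name() const override { return "aircraft"; }
};

// Code that only cares about overtaking sees just that facet.
bool try_overtake(const overtaker_i& who, int other_speed)
{
    return who.can_overtake(other_speed);
}
```

Passing an aircraft to try_overtake would not compile, which is the "less entanglement" part: the overtaking logic cannot accidentally depend on classes that have no such concept.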

I'm constantly vexed by Simutrans' use of forward declarations within files that merely use the class, whenever I try to use vim's [-Ctrl-i or [-I keyword search to find where exactly the class is actually declared within an interface header, and then its accompanying source file. I'm sure it helped when each module was originally coded, but now it wastes coders' time finding these, and about 60% of the compiler's time trying to match these forward declarations. I'm sure this is also a big deterrent for some new coders.

Quote from: Ters on November 26, 2018, 07:18:35 PM
I don't know more about them than that they exist, though.

I've yet to make one myself, mostly because C++98's allocator was poorly designed and documented. Yet they have been improved, and what better time to learn?
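For reference, a C++11-style allocator needs far less boilerplate than its C++98 counterpart. The following is a minimal sketch under that assumption; counting_allocator and demo_allocations are hypothetical names, and the counter exists only to make the allocator's effect observable.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical counting allocator; the name and the counter are illustrative only.
template <typename T>
struct counting_allocator
{
    using value_type = T;

    counting_allocator() = default;
    template <typename U>
    counting_allocator(const counting_allocator<U>&) noexcept {}

    T* allocate(std::size_t n)
    {
        ++allocations();
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }

    void deallocate(T* p, std::size_t) noexcept
    {
        ::operator delete(p);
    }

    // Number of allocate() calls made so far for this element type.
    static std::size_t& allocations()
    {
        static std::size_t count = 0;
        return count;
    }
};

// All instances are interchangeable, so they always compare equal.
template <typename T, typename U>
bool operator==(const counting_allocator<T>&, const counting_allocator<U>&) noexcept { return true; }
template <typename T, typename U>
bool operator!=(const counting_allocator<T>&, const counting_allocator<U>&) noexcept { return false; }

// Returns how many int allocations the vector below performed.
std::size_t demo_allocations()
{
    const std::size_t before = counting_allocator<int>::allocations();
    std::vector<int, counting_allocator<int>> v;
    v.reserve(16);              // one allocation up front
    for (int i = 0; i < 16; ++i) {
        v.push_back(i);         // capacity is sufficient, so no reallocation
    }
    return counting_allocator<int>::allocations() - before;
}
```

Everything else a container needs (rebind, construct, destroy, and so on) is supplied by std::allocator_traits, which is the main improvement over the C++98 design.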

Quote from: Ters on November 26, 2018, 07:18:35 PM
It is certainly better than std::vector<My_Type*> (as long as this collection owns the pointers and the pointers have a limited life span), but both have the drawback that you have to de-reference two pointers to get at the actual data, which may be significant in performance-critical situations.

Your fears may be unfounded:
C++Now 2018: Matt Godbolt "What Else Has My Compiler Done For Me Lately?" https://www.youtube.com/watch?v=nAbCKa0FzjQ
CppCon 2017: Matt Godbolt "What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid" https://www.youtube.com/watch?v=bSkpMdDe4g4
Also CppCon 2016: Jason Turner "Rich Code for Tiny Computers: A Simple Commodore 64 Game in C++17" https://www.youtube.com/watch?v=zBkNBP00wJE
One profound observation was that compilers may be better than people at creating machine code and that, at higher optimization levels, one should not expect the machine code to follow the source code exactly, since the compiler may substitute better instructions.

Interesting point to make: if the dirty bit is being used to mark a graphic to be redrawn, it is being done for two reasons: once for the object needing to move or be placed/removed, and otherwise for redrawing itself because something in front of it moved. Thus, I'd suggest having the dirty bit marked in two places: first for the object actually being moved/placed/removed (i.e. OpenGL happy, perhaps also for marking a Display List, VBO, or VAO as needing to be remade), and the other for the graphics in back to redisplay (i.e. not needing work if using OpenGL).

Ters · December 05, 2018, 06:36:41 PM

Quote from: TrainMith on December 05, 2018, 05:13:48 PM
Yes, but that is why the "using" construct was made: only bring a name into the current namespace if necessary, and as a way to avoid repeating full specifiers (i.e. std::vector this, std::vector that).

"using" is not the solution. It is equally the problem. What I'm talking about is that "using" can either import a single symbol, or the entire standard C++ library. Other languages, like Java and Python, have divided their standard library into multiple parts so that you can import, for instance, everything related to file access or GUI with a single line. In C++, the fact that you also need to use #include allows you to filter down somewhat, but since included files often include other files, you get more than you explicitly ask for.

On the other hand, there is debate on whether importing symbols in bulk is a good idea in Java, even though the bulks are far smaller than in C++.

Quote from: TrainMith on December 05, 2018, 05:13:48 PM
Your fears may be unfounded

Compilers can't work around language rules and physical realities. Although I haven't read the C++ standard itself (it is apparently not freely available), it is my impression from other sources that C++ dictates that std::vector<int*>, and by extension std::vector<std::unique_ptr<int>>, basically boils down to int**. Getting hold of the actual int value requires two memory accesses. std::vector<int> boils down to int*, where only one memory access is needed to get the int value. std::vector<std::unique_ptr<int>> is of course a strange thing to have, but the same holds true for other types than int.

There are some potential benefits of having a vector of some kind of reference to something, rather than just a vector of something. What ultimately is best depends on the size of that something, the access patterns, and whether execution speed is more important than development speed.
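The difference in layout can be sketched as follows. The function names are hypothetical, and the comments describe the access pattern as written in the source, not what any particular optimizer ends up emitting.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// One memory access per element: the ints live directly in the vector's buffer.
int sum_values(const std::vector<int>& values)
{
    int result = 0;
    for (int v : values) {
        result += v;
    }
    return result;
}

// Two memory accesses per element: first the pointer stored in the buffer,
// then the int it points to, which may live anywhere on the heap.
int sum_pointees(const std::vector<std::unique_ptr<int>>& pointers)
{
    int result = 0;
    for (const std::unique_ptr<int>& p : pointers) {
        result += *p;
    }
    return result;
}

// Checks that the elements of a std::vector<int> are adjacent in memory.
bool is_contiguous(const std::vector<int>& values)
{
    for (std::size_t i = 0; i + 1 < values.size(); ++i) {
        if (&values[i] + 1 != &values[i + 1]) {
            return false;
        }
    }
    return true;
}
```

Both sums give the same result; what differs is how many cache lines are likely to be touched on the way, which is why the best choice depends on element size and access patterns.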

TrainMith · December 07, 2018, 02:09:55 PM

Apparently the forum has a problem with code blocks, and decided to delete all of my commentary. My apologies for the original seemingly empty post. Upon further forum testing, code blocks have problems with line returns and some other anomaly. I also had to add an extra line return after every code line. Anyway, back to my commentary!

Quote from: Ters on December 05, 2018, 06:36:41 PM
it is my impression from other sources that C++ dictates that std::vector<int*>, and by extension std::vector<std::unique_ptr<int>>

Intuitively I had thought the same, although I was more optimistic because of both the gcc and clang compilers, which optimize the following code back into an int array access:

Code Select

#include <vector>
#include <memory>
int check_it(const std::vector<std::unique_ptr<int>> &vupi)
{
    int result(0);
    for(size_t i(0); i < vupi.size(); ++i)
        result += *vupi[i];
    return result;
}

MS VS2017 apparently isn't as clever, and does the expected double indirection. The optimization apparently is specific to int, and maybe to any built-in type, since the following code results in the double indirection we both were expecting:

Code Select


#include <vector>
#include <memory>
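class A
{
    public:
        // Non-const, so that C::get() below is allowed to modify its member.
        virtual int get() = 0;
        virtual ~A() {}
};
class B : public A
{
        int b;
    public:
        B(int bb) : b(bb) {}
        // override makes the compiler check that this really overrides A::get().
        int get() override { return b; }
};
class C : public A
{
        int c;
    public:
        C(int cc) : c(cc) {}
        int get() override { return c++; }
};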
int check_it(const std::vector<std::unique_ptr<A>> &vupi)
{
    int result(0);
    for(size_t i(0); i < vupi.size(); ++i)
        result += vupi[i]->get();
    return result;
}

Tested this at cpp.godbolt.com by the methods mentioned within the videos I linked earlier.

Ters · December 07, 2018, 05:12:02 PM

I don't see how std::vector<std::unique_ptr<int>> can be turned into int[] without breaking certain things. In particular:

Code Select


int *a = vec[1].get();
vec.erase(vec.begin());
int *b = vec[0].get();
assert(a == b);

Only if the compiler can be certain that it sees all usages of vec can it do such a substitution.
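The fragment above can be turned into a small self-contained check. This is only a sketch; the function name pointee_survives_erase is hypothetical.

```cpp
#include <memory>
#include <vector>

// Returns true if the int owned by the second element keeps its address
// after the first element has been erased from the vector.
bool pointee_survives_erase()
{
    std::vector<std::unique_ptr<int>> vec;
    vec.push_back(std::unique_ptr<int>(new int(10)));
    vec.push_back(std::unique_ptr<int>(new int(20)));

    int* a = vec[1].get();
    vec.erase(vec.begin());   // the unique_ptr objects shift down; the ints stay put
    int* b = vec[0].get();

    return a == b && *b == 20;
}
```

The unique_ptr objects inside the vector are moved to new positions, but the heap-allocated ints they own are not, which is the guarantee a flat int[] layout could not provide.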

isidoro · December 08, 2018, 12:45:27 AM

Not a good example. Instead of moving the content of the int[], the compiler can make the vec start now at int[1]...

Ters · December 08, 2018, 09:01:53 AM

Quote from: isidoro on December 08, 2018, 12:45:27 AM
Not a good example. Instead of moving the content of the int[], the compiler can make the vec start now at int[1]...

That is std::deque, isn't it? But the point is that insert and remove can cause the elements of the vector to be moved to a different memory address. What if I did an insert at index 0 with a million new elements instead of one remove? Are you suggesting that the compiler can just allocate a million elements ahead of the start just in case? Remember that the compiler will not see all operations being done on the vector, as the vector is accessed from multiple compilation units. You'd need some serious link-time optimizations to improve this.

isidoro · December 08, 2018, 08:51:08 PM

I only said that yours wasn't a good example, not that there wasn't the problem you mention.

The best thing may be to look at the assembly code from compilers that do such an optimization, if there are any...

Ters · December 08, 2018, 11:22:00 PM

The best thing to do is pick good algorithms and data structures, using the collective experience built up over decades, then measure the performance. Trying to read the optimized assembly output from a compiler will drive most people, even most developers, half crazy. One can try small examples, but there are so many things that come into play, that making examples covering them all is very hard. GCC has some intermediate form that can be dumped for analysis, but it is very verbose, so it is also not really usable on actual production code.

TrainMith · December 09, 2018, 09:01:00 AM

Quote from: Ters on December 07, 2018, 05:12:02 PM
I don't see how std::vector<std::unique_ptr<int>> can be turned into int[] without breaking certain things.

Ter, it didn't make int[], rather it was int*[]. The compiler optimized for a single pointer dereference from the vector since there was an iteration over all the elements, and then did a per element dereference from the unique_ptr. Your example wouldn't cause an assertion error, btw, because of C++11 move semantics introduced into the standard containers. Pointing into the vector would still cause a problem due to elements being relocated.
Ter, even the creator of Compiler Explorer mentioned that it is best for analyzing small portions of code, and to see how they relate using different compilers. It even highlights between the original code and the assembly produced, and it has buttons for reducing the verbosity (even name mangling). I'm thinking you'd like it a lot more than you think.

Quote from: prissi on November 26, 2018, 03:23:56 PM
So should effort on changing coding style to do the same thing with different compiler input not rather be invested more wisely in doing something not yet possible? The future might be rewriting the core from scratch to optimise for GPU and threads and reuse parts from current Simutrans (like the routing) and the right graphics. Then all the "modern" concepts can be used to their fullest potential.

Quote from: Ters on November 26, 2018, 07:18:35 PM
I think the biggest problem Simutrans has is the combination of game logic and display logic in karte_t. It makes it pretty much impossible to add GPU rendering as an alternative back-end

I'm in agreement with Ter here; karte_t needs to be reworked. Perhaps game_entity_t would be a good idea for the per-game aspect, keeping karte_t for the karte it is supposed to be. Also we could have a lobby_t, allowing us to select the game_entity_t for play.

Ters · December 09, 2018, 09:53:31 AM

Quote from: TrainMith on December 09, 2018, 09:01:00 AM
Ter, it didn't make int[], rather it was int*[]. The compiler optimized for a single pointer dereference from the vector since there was an iteration over all the elements, and then did a per element dereference from the unique_ptr.

You were the one who wrote that the compiler turned it into an int array, when I claimed that it could not do better than an array of pointers to integers, which in turn had to be accessed via a pointer (in all but the most trivial of cases).

isidoro · December 09, 2018, 11:53:58 PM

Quote from: Ters on December 08, 2018, 11:22:00 PM
The best thing to do is pick good algorithms and data structures, using the collective experience built up over decades, then measure the performance. Trying to read the optimized assembly output from a compiler will drive most people, even most developers, half crazy. One can try small examples, but there are so many things that come into play, that making examples covering them all is very hard. GCC has some intermediate form that can be dumped for analysis, but it is very verbose, so it is also not really usable on actual production code.

The best thing to do to solve a mystery about how some thing or other is done by a compiler is to look at the assembly generated by it. It's not that difficult. Give it a try.

Ters · December 10, 2018, 06:53:43 AM

Quote from: isidoro on December 09, 2018, 11:53:58 PM
The best thing to do to solve a mystery about how some thing or other is done by a compiler

I'm not here to solve mysteries, except for why Simutrans is as it is.

Quote from: isidoro on December 09, 2018, 11:53:58 PM
Give it a try.

I don't have to try. I've read more than enough x86 assembly, and some Java equivalent, to risk my sanity (if I had any to begin with).

TrainMith · December 10, 2018, 06:12:31 PM

Quote from: Ters on December 09, 2018, 09:53:31 AM
You were the one who wrote that the compiler turned it into an int array, when I claimed that it could not do better than an array of pointers to integers, which in turn had to be accessed via a pointer (in all but the most trivial of cases).

Ter, forgive me for misreading the post to which I was originally replying; we must have been thinking of the same array of pointers to integers. I had only meant to imply that the vector access might be optimized into a plain array access (especially in a for-loop), not that the contained pointer would be simplified away.

As for other aspects of modern C++, using constexpr for both constants and functions returning compile-time constants would help somewhat with both compile time and run time.
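A minimal sketch of what that looks like in C++11 follows. The constant tile_size, the function tiles_to_pixels and the array row_buffer are hypothetical and not taken from the Simutrans sources.

```cpp
// Hypothetical constant and helper; not taken from the Simutrans sources.
constexpr int tile_size = 64;

constexpr int tiles_to_pixels(int tiles)
{
    return tiles * tile_size;
}

// Evaluated by the compiler; a mismatch fails the build instead of the game.
static_assert(tiles_to_pixels(4) == 256, "tiles_to_pixels must be usable at compile time");

// Usable as an array bound because the call is a constant expression.
int row_buffer[tiles_to_pixels(2)];
```

The same function can still be called with run-time arguments, so it replaces both a macro and an ordinary helper function.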

isidoro · December 11, 2018, 12:15:33 AM

Quote from: Ters on December 10, 2018, 06:53:43 AM
I'm not here to solve mysteries, except for why Simutrans is as it is.

Good for you!

TrainMith · December 14, 2018, 10:08:32 PM

Quote from: prissi on November 26, 2018, 03:23:56 PM
Of course one could change Simutrans into a way to comply with the current style of C++ implementation. The motivation for that is easily maintainable code. However, what about stable code, which almost needs no maintenance any more?

One huge motivation should be making it easier to attract new programmers and to get them familiar with the code quickly. You yourself (I seem to recall reading your own admission) took close to a full year to understand the Simutrans code. Ter, Dwachs, and others most likely took a similar amount of time. I myself yearned to begin modifying it but felt a huge and rapid frustration, most especially at the over-abundant and spurious forward class declarations; these assuredly increase the compile time, foil any attempt at quickly finding actual definitions using tools such as vi's keyword search, and ultimately cloud the understanding of the code. With that reorganized clarity, we might see some optimizations that would otherwise have been missed.
I consider this reorganization of forward declarations a minor and bug-free task, since no function bodies should be touched or introduced. One could almost say "a novice could do it", if it were not for having to hunt for the actual class declaration locations and properly distill the header file hierarchy. I just checked my local git repository against the origin for the latest changes, so I might see for myself how well things go during the holiday here.

Quote from: prissi on November 26, 2018, 03:23:56 PM
The future might be rewriting the core from scratch to optimise for GPU and threads and reuse parts from current Simutrans (like the routing) and the right graphics. Then all the "modern" concepts can be used to their fullest potential.

I'm in total agreement, although let us consider an inheritance situation with the rendering, so that those without a modern GPU could still select this pre-GPU style implemented as a derived class which could, upon selection, copy relevant portions to the other derived class used for GPUs. Best example of this would be the game Homeworld (circa 2000), which could even during the game change between software, OpenGL, and DirectX engines.

Quote from: Ters on November 26, 2018, 07:18:35 PM
the biggest problem Simutrans has is the combination of game logic and display logic in karte_t. It makes it pretty much impossible to add GPU rendering as an alternative back-end, as the information needed for that is quite different from the current algorithm that always draws everything to the backbuffer and only then figures out, in display coordinates, where the changes are.

I think there might be a less obvious way: change direct rendering into function calls, and distinguish two situations in which objects are marked dirty (one for "object needs update" and another for "object needs repainting"). The first would catch things that matter even for OpenGL, such as a terrain height being modified, which requires the typically static terrain rendering object to be updated (likewise with city buildings and trees). The second would catch the situations where the typical software renderer needs to redraw pixels, for example in front of movable vehicles or such.
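The two kinds of dirtiness could be sketched roughly like this. The names dirty_flag, drawable, mark_changed and mark_exposed are hypothetical and do not exist in Simutrans.

```cpp
// Hypothetical sketch; none of these names exist in Simutrans.
enum dirty_flag : unsigned
{
    DIRTY_NONE    = 0,
    DIRTY_UPDATE  = 1u << 0, // the object itself changed: rebuild cached geometry (e.g. a VBO)
    DIRTY_REPAINT = 1u << 1  // pixels must be redrawn, e.g. because something in front moved
};

class drawable
{
    unsigned dirty = DIRTY_NONE;
public:
    // The object was moved, placed, removed or reshaped.
    void mark_changed() { dirty |= DIRTY_UPDATE | DIRTY_REPAINT; }
    // Only the pixels covering the object were disturbed.
    void mark_exposed() { dirty |= DIRTY_REPAINT; }

    // A GPU back-end with a depth buffer would mostly care about this flag.
    bool needs_update() const { return (dirty & DIRTY_UPDATE) != 0; }
    // A software back-end would look at this one.
    bool needs_repaint() const { return (dirty & DIRTY_REPAINT) != 0; }

    void clear() { dirty = DIRTY_NONE; }
};
```

With this split, a tree that is merely passed by a vehicle is repainted by the software renderer but causes no work for a back-end that keeps geometry on the GPU.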

Further back to topic, the Compiler Explorer can be downloaded to locally be run on a machine, so the lack of libraries on the web version wouldn't be a problem for simutrans. The web version, though, could be used for small and concise measurements for optimization.

Ters · December 14, 2018, 10:20:47 PM

Quote from: TrainMith on December 14, 2018, 10:08:32 PM
I think there might be a less obvious way, by changing direct rendering to instead be function calls

If you are thinking about replacing all the draw_img calls with calls to OpenGL, that has already been tried, and it doesn't work. Since Simutrans paints more or less back-to-front, the amount of data being sent to the GPU (state changes and vertex data) exceeded that being sent by just rendering on the CPU, at least once you zoomed out a bit. In order to effectively use the GPU, one would need to build vertex buffers from the mostly static parts of the map (terrain, buildings, roads and trees) and only rebuild them when these things are changed. Only the vehicles would have to be regenerated (more or less) every frame and merged into the static parts using z-buffering (which could eliminate some existing drawing issues).

Unfortunately, there is no easy way to detect when static parts change, as there are so many places that access karte_t internals directly. It is also no longer trivial to put all the images into video memory all at once, since the limit of 64k images has been removed.

DrSuperGood · December 15, 2018, 02:09:55 AM

Quote
If you are thinking about replacing all the draw_img calls with calls to OpenGL, that has already been tried, and it doesn't work.

I suspect he is talking about replacing them with graphic engine calls (a component that would need to be written) which internally add graphic commands to queues. Adding to queues is very low overhead as the API calls are user space. After everything is done for a frame it then asynchronously executes the command queues to produce an image. This would add minor display latency, but far less than network latency so will likely be invisible.
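Such a queue could be sketched as below. This is only an illustration of the idea; draw_command, render_queue and their members are hypothetical names and are neither Simutrans nor OpenGL API.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch; none of these names are Simutrans or OpenGL API.
struct draw_command
{
    int image_id;
    int x;
    int y;
};

class render_queue
{
    std::vector<draw_command> commands;
public:
    // Cheap: just appends to a buffer in user space.
    void draw_img(int image_id, int x, int y)
    {
        commands.push_back(draw_command{image_id, x, y});
    }

    // Size of the queued data, for comparison against the backbuffer size.
    std::size_t size_in_bytes() const
    {
        return commands.size() * sizeof(draw_command);
    }

    // Hands every queued command to the back-end in submission order,
    // then empties the queue for the next frame.
    template <typename Backend>
    void flush(Backend&& backend)
    {
        for (const draw_command& cmd : commands) {
            backend(cmd);
        }
        commands.clear();
    }
};
```

The size_in_bytes member is included because the amount of queued data per frame is exactly the quantity that decides whether this approach beats drawing straight to the backbuffer.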

The existing dirty area system can still work for accelerated graphics. One must keep a copy of the output before sending it to the display buffer (which might be double or triple buffered). Yes an extra full screen copy is overhead, but something modern GPUs should effortlessly handle.

Quote
It is also no longer trivial to put all the images into video memory all at once, since the limit of 64k images has been removed.

That could be fixed by using a sensible asset loading system. Instead of loading every asset at game start, it only loads game data with assets being streamed in from files as required. This can have standard caching practices applied to it to keep only recently used images inside GPU memory. Even when not using GPU acceleration, the result will still be a massive reduction in memory usage in most cases, especially large diverse paksets.
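One standard caching practice is least-recently-used eviction. The following is a minimal sketch of the bookkeeping only, with no actual image loading; image_cache and its members are hypothetical names, not Simutrans code.

```cpp
#include <cstddef>
#include <list>
#include <unordered_map>

// Hypothetical sketch of a least-recently-used image cache; not Simutrans code.
class image_cache
{
    std::size_t capacity;
    std::list<int> order;  // image ids, most recently used at the front
    std::unordered_map<int, std::list<int>::iterator> position;
public:
    explicit image_cache(std::size_t max_images) : capacity(max_images) {}

    // Returns true on a cache hit, false if the image had to be (re)loaded.
    bool touch(int image_id)
    {
        auto found = position.find(image_id);
        if (found != position.end()) {
            // Hit: move the id to the front without invalidating iterators.
            order.splice(order.begin(), order, found->second);
            return true;
        }
        if (capacity == 0) {
            return false;  // nothing can be cached at all
        }
        if (order.size() == capacity) {
            position.erase(order.back());  // evict the least recently used image
            order.pop_back();
        }
        order.push_front(image_id);
        position[image_id] = order.begin();
        return false;
    }

    bool contains(int image_id) const { return position.count(image_id) != 0; }
};
```

A real implementation would load the image from the pak file on a miss and release the GPU texture on eviction; those steps are left out here because they depend on details not discussed in this thread.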

Ters · December 15, 2018, 10:53:04 AM

Quote from: DrSuperGood on December 15, 2018, 02:09:55 AM
I suspect he is talking about replacing them with graphic engine calls (a component that would need to be written) which internally add graphic commands to queues. Adding to queues is very low overhead as the API calls are user space. After everything is done for a frame it then asynchronously executes the command queues to produce an image. This would add minor display latency, but far less than network latency so will likely be invisible.

Well, that is what I tried years ago. The problem is that the queue ended up becoming bigger in terms of bytes than the backbuffer once you zoomed out. And the queue has to be flushed if you change global states, like texture bindings. I avoided having to change texture bindings by the fact that pak64 had small and few enough images to all fit into a single large texture. (And by putting as much state as possible into the vertex data, which made it so big in the first place.) Pak sets with bigger and/or more images would simply not fit on my graphics card. A John Carmack could probably pull off some tricks, but we have no such developer.

TrainMith · December 17, 2018, 05:16:36 PM

Quote from: DrSuperGood on December 15, 2018, 02:09:55 AM
I suspect he is talking about replacing them with graphic engine calls (a component that would need to be written) which internally add graphic commands to queues.

The calls would be to Simutrans functions, most likely part of a class inheritance tree associated with a particular rendering engine. These would either handle the work immediately (possibly as Immediate Mode calls, obviously specific to OpenGL, or by just executing the few instructions Simutrans uses now, which are currently scattered throughout karte_t and elsewhere) or, as you said, queue the information for delayed building of something like VBOs or Simutrans' software-only solutions.

Quote from: Ters on December 15, 2018, 10:53:04 AM
The problem is that the queue ended up becoming bigger in terms of bytes than the backbuffer once you zoomed out. And the queue has to be flushed if you changed global states, like texture bindings.

This sounds like the exact issues that games such as flight simulators and such had to contend with, and have several solutions or hacks already available; CLOD, BSP's, etc. I read in another post that you had split the area into 32x32 squares? Also, correct me if I am wrong, but I thought that it was possible to get more textured objects into the scene by drawing some with a certain texture onto the color buffer, changing the textures, and then drawing more objects, so long as the buffer wasn't cleared? I think this came right out of the OpenGL Red Book.

Quote from: Ters on December 15, 2018, 10:53:04 AM
Well, that is what I tried years ago.

Ter, it appears that the Simutrans forum didn't retain the diff file that you posted, and I'd hate to see your hard efforts go to waste. I think you might have gotten much further along but just hit a roadblock that time didn't permit you to clear. You wouldn't happen to have the diff, or be able to make a newer one against the more current Simutrans version?

Ters · December 17, 2018, 06:11:50 PM

Quote from: TrainMith on December 17, 2018, 05:16:36 PM
I read in another post that you had split the area into 32x32 squares?

No, that was what I wanted to do for my second attempt, but karte_t was too hard to work with, so I never really got started coding.

Quote from: TrainMith on December 17, 2018, 05:16:36 PM
Also, correct me if I am wrong, but I thought that it was possible to get more textured objects into the scene by drawing some with a certain texture onto the color buffer, changing the textures, and then drawing more objects, so long as the buffer wasn't cleared?

Yes, but as I'm trying to say, Simutrans currently draws things back-to-front (almost). You might therefore get texture changes multiple times per tile! At least now that the hope of being able to put all of Simutrans' images into one texture is gone (unless we reduce the number of images back to 64k for pak64 and even less for pak128). There are solutions, but they are not trivial. You either need to very cleverly put the right images into the right texture, or you need to stop rendering straight from karte_t back-to-front. Replacing the painter's algorithm with something else is not trivial, as there is some cheating in the current graphics.
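The cost of drawing in back-to-front order can be illustrated by counting texture switches. This is a toy sketch; sprite, count_texture_switches and sorted_by_texture are hypothetical names, and texture ids are assumed to be non-negative.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical sketch; texture ids are assumed to be non-negative.
struct sprite
{
    int depth;    // larger means closer to the viewer
    int texture;  // id of the texture holding the sprite's image
};

// Counts how often the bound texture changes when drawing in the given order.
// The very first bind is counted as a switch as well.
std::size_t count_texture_switches(const std::vector<sprite>& draw_order)
{
    std::size_t switches = 0;
    int bound = -1;
    for (const sprite& s : draw_order) {
        if (s.texture != bound) {
            ++switches;
            bound = s.texture;
        }
    }
    return switches;
}

// Groups sprites by texture. This gives up the back-to-front order, so it is
// only correct if overlap is resolved some other way, such as a depth buffer.
std::vector<sprite> sorted_by_texture(std::vector<sprite> sprites)
{
    std::stable_sort(sprites.begin(), sprites.end(),
                     [](const sprite& a, const sprite& b) { return a.texture < b.texture; });
    return sprites;
}
```

Four sprites alternating between two textures need four binds in painter's order but only two when grouped, which shows both the potential gain and why it requires abandoning the painter's algorithm.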

Quote from: TrainMith on December 17, 2018, 05:16:36 PM
Ter, it appears that the Simutrans forum didn't retain the diff file that you posted, and I'd hate to see your hard efforts go to waste. I think you might have gotten much further along but just hit a roadblock that time didn't permit you to clear. You wouldn't happen to have the diff, or be able to make a newer one against the more current Simutrans version?

The first attempt, which was just replacing simgraph16.cc with OpenGL calls, didn't hit a roadblock due to time. The solution was unworkable in principle. Translating the internal rendering calls to OpenGL calls takes about as much time as just drawing straight to the backbuffer using the CPU, and the data involved (vertices and drawing commands) was just as big as the back buffer, if not bigger (frame rate was decimated). This can be proven mathematically.

To get any kind of performance out of it, one needs to keep the vertices in VRAM between frames, and only update them when something actually changes. There is however no easy way of knowing when and where something changes in Simutrans. The only thing reusable from my first, dumb attempt is perhaps the shaders, but they are probably not very well written, nor are they hard to recreate.
