
Interface unification in besch-classes

Started by Dwachs, September 12, 2013, 07:58:17 PM


Dwachs

With this patch the interface of the besch classes is unified: a lot of common properties are moved into base classes: timeline information, waytype, cost, maintenance, etc.
This reduces code duplication to a large extent; many of the common properties were copy-and-pasted from existing ones anyway.
Moreover, it would make exporting these classes to the scripting language much easier: doing it with the existing code would mean mimicking the existing code duplication: code triplication :P

As the new class structure is still a tree without any virtual methods, I expect no performance hit from this patch.

What do you think?
Parsley, sage, rosemary, and maggikraut.

Max-Max

Without having read all of your diff code I would like to comment on virtual functions.

A virtual function usually only takes one extra lookup in a table, a few assembly instructions.
Compare this to how polymorphism is handled C-style: a switch...case or if...else if construction. Comparing the two solutions, the extra virtual lookup is usually faster than the C-style one. So I wouldn't be too afraid to use virtual functions, just do it with some common sense...
- My code doesn't have bugs. It develops random features...


prissi

Did I miss any virtual in this patch? If there were, virtual functions for something like the top speed of ways could be quite a performance reducer, as those are looked up many times during route searches.

But the patch is without them, and indeed reduces code duplication (and will make documentation and understanding easier anyway). So I am with kierongreen.

kierongreen

Dwachs was saying that there aren't any virtual methods in the patch (presumably fearing that they could slow things down?)

Max-Max

I didn't say there were any, I just pointed out that virtual functions don't automatically mean slow code.
This is an old legacy reputation C++ got in the early days; modern compilers generate very efficient code in this area.
I have designed product platforms in C++ with heavy use of virtual functions on microcontrollers running no faster than 70 MHz and with only 20K of RAM.

The real bottlenecks of C++ are RTTI and exceptions (throw...catch), and for embedded systems also float and double, where we use integer maths instead. I guess doubles are no problem for today's PCs with their on-board FPUs.

That was all I wanted to point out... :police:

Ters

I guess cache locality can give switches and ifs a small advantage over virtual functions, but that's pure speculation.

prissi

A virtual function involves loading the program counter from a memory location (i.e. a jump which depends on an external location). On most processors this is similar to a mispredicted branch, because the internal instruction cache can only be filled after the address has been gathered. A normal branch is handled much better by the logic, since the new address is known beforehand.

Max-Max

On a low-performance system, such as a microcontroller, it might have some notable impact if you really look for it. I think today's systems handle this quite fast; a few nanoseconds here or there aren't noticeable in performance...

As I said, this was a bigger problem in the early days of C++, when most compilers "translated" C++ to C code which was then compiled by a C compiler. Most C++ compilers of today can optimize and link to the overridden function directly; not everything is resolved at runtime...

Ters

Quote from: prissi on September 16, 2013, 08:04:21 PM
A virtual function involves loading the program counter from a memory location (i.e. a jump which depends on an external location). On most processors this is similar to a mispredicted branch, because the internal instruction cache can only be filled after the address has been gathered. A normal branch is handled much better by the logic, since the new address is known beforehand.

I guess that is true for switches implemented as jump destination lookup tables as well, which I've heard is typical for switches over a narrow set of values.

Quote from: Max-Max on September 16, 2013, 08:45:22 PM
On a low-performance system, such as a microcontroller, it might have some notable impact if you really look for it. I think today's systems handle this quite fast; a few nanoseconds here or there aren't noticeable in performance...

As I said, this was a bigger problem in the early days of C++, when most compilers "translated" C++ to C code which was then compiled by a C compiler. Most C++ compilers of today can optimize and link to the overridden function directly; not everything is resolved at runtime...

However, if it can be resolved at compile time, you didn't need to make the function virtual in the first place, in that particular case. Otherwise, I was about to agree, but then it struck me: for the compiler to use a static function call rather than a dynamic/virtual function call, it must know, beyond doubt, that the object is of a specific type (or types, assuming they all simply inherit the member function in question without overriding it). The compiler can't know that if the object is passed into scope by reference, unless the scope is only accessible from within the same translation unit (the .cc file). For all it knows, the calling translation unit can contain subclasses it doesn't know about.

This is getting nitpicky, though.

Markohs

I think this way of thinking about code optimization is *BAD*.

As programmers we should design our code by optimizing our algorithms' asymptotic costs in CPU, memory and I/O usage globally, and give the compiler as much information as we can, so that it can rearrange and optimize the resulting assembly code.

Believe me, the compiler designers know the performance of the architecture we are compiling for far better than all of us, and use tools that transform the code so much that we wouldn't recognize the generated assembly if we saw it.

Why worry that a virtual function creates a memory-addressed jump? Can't compilers rearrange the code so it prefetches the address load? I think they can, and they do, and more efficiently than we could. Why are you trying to think like assembly programmers? You are supposed to be writing C++: clean, efficient, easy-to-read and maintainable code.

If you think like an assembly programmer, you are probably trying to force an optimization that may or may not trigger in a particular compiler version, and that can very probably produce slower code in newer compiler versions or other compilers (LLVM is another compiler that will very probably replace gcc in the future; we also have compilers for Mac architectures, and the Visual Studio hated here, which produces faster code than GCC).

You don't have a performance problem using virtual functions *until you can prove it*, and you are probably losing more performance in many other places of the code than here.

Of course we all know about computers and some assembly, and use an obvious general way of thinking that makes some code arrangements obvious; our instinct points us there. But don't overdo it, please.

I like this patch; it makes the code easier to read and maintain, and that's the only thing that matters to me, if no one can prove it's slower.

I like this patch, makes the code easier to read and mantain, and it's the only thing that matters to me, if noone can prove it's slower.

prissi

Virtual can be a non-trivial thing. The function called is not always the obvious one, and there is the virtual destructor issue. (This is, like operator overloading, a matter of taste of course.)

For microcontrollers, virtual functions come with almost no punishment (apart from the extra lookup), since they have small caches and no long pipeline. Rather, it is the sophisticated CISC processors that are at a loss.

About compiler optimisation: let's look at a typical function called very many times during the game (especially during route searching).


// how expensive to go here (for way search)
// author prissi
int automobil_t::get_kosten(const grund_t *gr, const sint32 max_speed, koord from_pos) const
{
	// first favor faster ways
	const weg_t *w = gr->get_weg(road_wt);
	if(!w) {
		return 0xFFFF;
	}

	// max_speed?
	sint32 max_tile_speed = w->get_max_speed();

	// add cost for going (with maximum speed, cost is 1)
	int costs = (max_speed <= max_tile_speed) ? 1 : 4 - (3 * max_tile_speed) / max_speed;

	// assume all traffic is not good ... (otherwise even smoke counts ... )
	costs += (w->get_statistics(WAY_STAT_CONVOIS)  >  ( 2 << (welt->get_settings().get_bits_per_month()-16) ));
	...


Assume the speed limit is not a member of weg_t but is instead returned by a virtual function. Obviously, as soon as we load the pointer to w, the compiler would have to load the virtual function table. w is const, so it is allowed to assume it can do this. But w might be zero, so it has to defer the load until we actually access w for the first time; otherwise it would crash there, even though this is completely legal code.

Hence the first point where it can access the virtual function table is actually the call "w->get_max_speed();".
Depending on the number of virtual functions, it has to fill a cacheline with the virtual function table of w (maybe 32 bytes) and then another when calling the actual function (which again loads a cacheline, writes the pc to the stack (invalidating a cacheline), jumps, returns, plus register clobbering). Without virtual functions, however, this is inlined code from the header file (which compilers are very good at), so it simply reads at *(((sint)*w)+20) (at most one cacheline). Call this several million times (during route search) and it will make a big difference.

Inline versus virtual functions is something it pays to have a short thought about. But as with everything in code: the mathematical and the practical are two very different things, unfortunately.

Markohs

Yes, but you never know whether another part of the function might be partially reordered if the dependencies allow for it, or which implementations of the virtual functions might come in the future or in different compilers.

As you say, inline functions are way faster, by a factor of 4 to 10 if you look at section 5.3.* of http://www.open-std.org/jtc1/sc22/wg21/docs/TR18015.pdf (that resource is 5 years old, so things might have changed since).

I'd say you should just program using what the language gives you, and let the compiler do the optimizing work. If you want to make the routine you pasted faster, there are even better, faster solutions: looking up speed tables and batching all that processing better; basically, not using objects at all. But why would you do that? You'd only do it if the performance there is critical and justifies the extra coding cost.

Max-Max

I think this discussion has really gone off topic and puts Dwachs's good work in the shadow.

I'm all with Markohs here. We shouldn't let our assumptions about compiler optimization steer how we code; we should use what the language offers. We are using different compilers, versions, etc., and they compile and optimize in different ways and produce different assembly.

Let us use what C++ has to offer and steer away from assembly/C-style coding. Dwachs's initiative to move the code towards a more object-oriented design is very much welcomed.

Good work Dwachs :thumbsup:

kierongreen

Large-scale design should use object-oriented features. At this scale, patches like this which remove code duplication and make files easier to read are great.

At the small scale, where performance is critical, assembly/C-style coding is necessary to achieve acceptable performance. However, most speedups come not from optimising code but from optimising algorithms. The huge number of duplicate bounds checks I removed from various parts of simworld was an example of this.