News:

Simutrans.com Portal
Our Simutrans site. You can find everything about Simutrans from here.

Factory input in-transit corruption (RC120 to nightly)

Started by DrSuperGood, September 01, 2014, 05:11:24 AM

Previous topic - Next topic

0 Members and 1 Guest are viewing this topic.

DrSuperGood

A while back someone pointed out that in-transit numbers were being messed up in the graph display window showing stupidly large numbers. It turns out that it was not just the graph display being messed up, but the actual in-transit number itself.

The reason it appears to work correctly despite the bug is that the statistics are logged using 64 bit signed integers while the factory logic uses only a 32 bit signed interpretation of that. All the garbage is placed in the high 32 bits which is why the graph gets stupidly large numbers yet the factory still functions normally.

What is the cause? Line 265 simfab.cc

w.book_stat(add ? ware->menge : -ware->menge, FAB_GOODS_TRANSIT );

What can possibly be wrong with this line? ware->menge is a bitfield of type uint32 which uses 23 bits. This is translated to a uint32 (32 bit unsigned) for processing. Trying to get the negative of a 32 bit unsigned is going to give you a very large positive number. The reason it still worked is because of how two compliment signed numbers behave. The resulting large number overflowed the lower 32 bits producing the correct result (as far as the factory was concerned) while inserting the overflow into the high 32 bits (breaking the graph).

The solution is simply to change it to...

w.book_stat(add ? (sint64)ware->menge : (sint64)(-(sint32)ware->menge), FAB_GOODS_TRANSIT );

It is not sufficient to only change the problematic statement as the selection statement will not have a clear type (if left statement is used for type it will cause the bug again). Since it is converted anyway there is no harm explicitly type casting the left statement to sint64 and should have no overhead.

The reason for the strange typecast behaviour with the right (negative) statement is to try and let it execute optimally on 32 bit platforms. Performing the negative operation on a 32 bit type (single word) would be faster than on a 64 bit type (two words). Otherwise simply "-(sint64)ware->menge" would suffice where it explicitly converts to 64 bits and then negates it (a 64 bit operation). Maybe the compiler is smart enough to out of order execute the type cast knowing it will generate the same numeric value, who knows, but from a logical point of view this would be simpler as it avoids operations on 64 bit types that are unnecessary.

I tested this bug reproducibly on one of fifty's old saves. A steel mill with coal would suffer it exactly at the same time after every load of the save. After applying above fix I can confirm that after fast forwarding several months the in-transit value graph made sense and was not broken.

For people who have been playing recent builds you must be aware that your in-transit history will probably be broken for 11 months from the time you load a nightly or release with this fix applied. The in-transit counter is rebuilt on load so will load correct and hopefully remain so. Any corrupted in-transit values in your factory history will also be loaded and remain there until they time laps.

Oh and the blame for this is on Dwachs commit  b925516cc8982a48638534704bd7df16dd6874fe on the 07/05/2014 where he removed a "transit" variable from the class which the method used before.
QuoteCODE: remove member ware_production_t::transit to avoid double book-keeping
So basically it will have affected builds from the 7th of May onwards.

Dwachs

Parsley, sage, rosemary, and maggikraut.

DrSuperGood

#2
Just for future reference from MSDN they explain how visual C++ evaluates the type of the expression.

Your solution is subjected to the following rules.
Quote
•If either operand is of type unsigned long, the other operand is converted to type unsigned long.
•If preceding condition not met, and if either operand is of type long and the other of type unsigned int, both operands are converted to type unsigned long.
This means it is performing an implicit sint64 -> unit64 -> sint64 cast which should hopefully produce the correct results (re-interpret casts, should be completely free) however it is very messy and ambiguous that it occurs.  I would advise in future explicitly defining the type of the ternary operator statement by assuring both sides are of the same type when possible.

Dwachs

Parsley, sage, rosemary, and maggikraut.