News:

Simutrans Chat Room
Where cool people of Simutrans can meet up.

ptrhashtable failure

Started by Dwachs, February 08, 2016, 03:33:15 PM

Previous topic - Next topic

0 Members and 1 Guest are viewing this topic.

Dwachs

Here is an interesting bug report in the German forum:

http://simutrans-forum.de/forum/index.php?page=Thread&threadID=9146

There is a link to a savegame for pak64.German. Sooner or later the game will crash with an assert in simvehicle.cc:1003. I managed to reproduce the crash with a 32bit build on my 64bit linux pc. The 64bit version did not crash.

I tracked down the error to a misbehavior in ptrhashtable that is used in marker_t to mark non-ground tiles as visited. In the hash-table routines, the method hash::comp is called, which is essentially a subtraction of pointers. The strange thing what happens is: (snippet from gdb session)

(gdb) print hash_t::comp(node.key, key)
$94 = 1431619927
(gdb) print hash_t::comp(key, node.key)
$95 = 35838

The comp method is defined as:

typedef ptrdiff_t diff_type;
[...]
static diff_type comp(key_t key1, key_t key2) // key_t == grund_t*
{
return key1 - key2;
}

The bad thing here is that both 'differences' node.key-key and key-node.key are positive. Thus ruining the logic of the hashtable. The c++ standard says that taking differences of pointers is undefined, if the pointers do not point into the same array, which is the case here...

How to fix this without breaking something else?

Parsley, sage, rosemary, and maggikraut.

Ters

#1
Although the behavior is undefined, it is hard to see that subtracting two pointers will in itself cause this oddity on x86. The C++ standard is probably restrictive on making promises because of more "exotic" platforms that don't have a flat memory model. (Actually, x86 can be one of them, but not in the operating mode Simutrans runs.)

Update:
Curiously, 1431619927 + 35838 = 1431655765. Not so special? Well, that's 55555555 in hexadecimal, or alternating ones and zeros in binary.

Update 2:
The difference between two pointers is not in bytes, but number of elements, and from the assembly, it looks like the compiler is doing something clever when performing the division to avoid actually performing a division. This might possibly fail if there isn't an integer number of elements between the pointers. Since this function appears to only be concerned about some relative difference, simply casting the pointers to intptr_t before doing the subtraction might do the trick. Probably even change diff_type to that as well.

TurfIt

5555555 is interesting/scary.... but, what were the actual values of key and node.key?

I think the code pre r7217 was more correct for ptrhashtables - casting pointers to plain integers before math so the compiler doesn't get any ideas to treat the pointer as anything but a simple number. But, int and long, aka sint32 are too small on 32bit platforms - pointers could be more than 2GB apart, so need a 64bit signed for the difference. For 64bit platform - sint128? or need a custom diff_type to handle.... or a custom memory allocator that ensures the all objects subjected to this pointer subtraction are allocated within range.

Ters

Quote from: TurfIt on February 08, 2016, 08:40:26 PM
But, int and long, aka sint32 are too small on 32bit platforms - pointers could be more than 2GB apart

I don't think the algorithm here really cares which pointer is strictly first. Overflow with wrap-around would affect both a - b and b - a simultaneously, so that one is always negative and the other positive, wouldn't it? And if ptrdiff_t isn't good enough for storing the difference between all pointers, then what is it good for?

prissi

ptrdiff_t is in principle only guaranteed to work in the same array. Maybe this is an effect of valgrind, which puts stuff in different pages, which could in principle have the same page offset (as far as I know).

On the other hand intptr_t should hold any ptr as an int, no matter if 64 or 32 bit. And it is signed, in the right size, so it looks like a good candiate. (It is optional, but I think any implementation of C99 has it now; because int8_t and the like are optional too.)

isidoro

Quote from: Ters on February 08, 2016, 04:35:28 PM
Update 2:
The difference between two pointers is not in bytes, but number of elements, and from the assembly, it looks like the compiler is doing something clever when performing the division to avoid actually performing a division. This might possibly fail if there isn't an integer number of elements between the pointers. Since this function appears to only be concerned about some relative difference, simply casting the pointers to intptr_t before doing the subtraction might do the trick. Probably even change diff_type to that as well.

I think that that is most probably the reason.

       
  • If you multiply 1431619927*3=4294859781=0xFFFE5C05
  • If you multiply 35838*3=107514=0x1A3FA
  • But 0xFFFE5C05 is just -0x1A3FB
That implies that the difference between the two pointers perhaps isn't integral and there's an offset of 1/3, 1/6, etc. of sizeof(key).

It would be very nice if Dwachs could tell us what were the real values of key, node.key, and sizeof(key).  I'd bet that (((float)key)-(float)node.key)/sizeof(key) is positive and the decimal part has something to do with 1/3, 1/6, or the like...

DrSuperGood

The best would be to type cast it to intptr_t and then to the mathematics. Failing that std::size_t could be used for a similar purpose although it may fail on Harvard architecture systems. If it works or not depends on what the hashtable is doing with the key, however it should allow you to get negative numbers in a way that is reasonably platform safe.

If that fails you will need some other more deterministic key source. For example a unique identifier for the objects involved.

Ters

Quote from: prissi on February 09, 2016, 12:02:09 AM
ptrdiff_t is in principle only guaranteed to work in the same array. Maybe this is an effect of valgrind, which puts stuff in different pages, which could in principle have the same page offset (as far as I know).

Yes, but since there is in principle nothing ruling out that you are dealing with a 3.5 GB char array, the data type is probably not the issue. The best reason for avoiding it is simply its name. I don't think valgrind affects anything here. It must adhere to the operating principles of x86, since that what the code it is running expects. Unless you are suggesting that valgrind masks this bug by putting allocations at less random locations, but I'm not sure valgrind would have helped reveal this bug.

Quote from: prissi on February 09, 2016, 12:02:09 AM
(It is optional, but I think any implementation of C99 has it now; because int8_t and the like are optional too.)

I think it is optional because treating a pointer as a simple integer value might be meaningless/ambiguous on some architectures. x86 is actually one of those, but all popular 32-bit x86 operating systems set up a flat memory model (in user space at least), and no (popular) C compiler expose the I/O address space, so that makes it useable.

Dwachs

It has nothing to do with valgrind, I was not running this in valgrind. I think, the problem is indeed due to some divisibility problems you mention above. Here is more data from a gdb session:

(gdb) print node.key
$109 = (const grund_t *) 0xbf74fc8
(gdb) print key
$110 = (const grund_t * const) 0xc046f9c
(gdb) print hash_t::comp(node.key, key)
$111 = 1431619927
(gdb) print hash_t::comp(key, node.key)
$112 = 35838
(gdb) print sizeof(grund_t)
$113 = 24
(gdb) print 0xc046f9c - 0xbf74fc8
$114 = 860116
(gdb) print 860116/24.0
$115 = 35838.166666666664

Now how to fix it without breaking it silently for other systems?
Parsley, sage, rosemary, and maggikraut.

DrSuperGood

You could assign the key objects a deterministic unique identifier and use that as a key instead. Removes all ambiguity at possibly a small cost of performance. This is probably the best solution as it removes all ambiguity across platforms.

One could also change the key type to intptr_t, if available, which should work on most memory models where application address space is a single bank. You could then do comparisons in whatever unit the pointers are in. For bytes it should work perfectly, it might work for other pointer units on strange platforms but I am not sure. The size_t types will also work to some extent on most platforms and is more readily available however intptr_t is more correct.

Dwachs

Using a unique identifier will not work for some uses of the ptrhashtable.

The following code changes make the hash-table work for me:

static diff_type comp(key_t key1, key_t key2)
{
// return key1 - key2; // wont work
return (const char*)key1 - (const char*)key2; //works
// return (ptrdiff_t)key1 - (ptrdiff_t)key2; //works
}

Comments anyone?
Parsley, sage, rosemary, and maggikraut.

isidoro

I think that that is at least not worse than plain key1-key2.

The function is as far as I have seen only used to keep the elements inside each of the bags of hash tables ordered so that searching is even faster than looking for one element after the other till reaching the end of the bag.

I guess that (int const *)key 1-(int const *)key2 should just work as well, since int should be aligned to the word of the processor.

Other option that I can think of (key1>(void *)key2) - (key2>(void *)key1).

TurfIt

#12
From what I can find, (char*)key1-(char*)key2 is the best solution. However, diff_type must also be changed from ptrdiff_t to sint64.

As has been discovered, arithmetic on ptrdiff_t is only valid for offsets within an array, where the objects are all aligned wrt each other. Also, on MinGW32, ptrdiff_t is a sint32. Too small for this purpose due to overflow -> undefined behaviour per the standard. ptrdiff_t seems to have more undefined behaviours than defined!

For intptr_t, everything I can find shows any arithmetic on these is undefined. Everything also says to use this instead of casting to a int/long/etc., but if then you can't do arithmetic, what's the point?  In practice, it's also just a sint32, so I'm sure would work, but officially arithmetic is undefined.

diff_type needs to be sint64 to avoid a possible key collision - not hash collision, but two different keys comparing equal. I think people are missing the fact hash_t::comp() is used for retrieving the objects, not just ordering.

typename hash_t::diff_type diff = hash_t::comp(node.key, key);
if(  diff == 0  ) {
return node.value;
}

If a sint32 diff_type overflows with wrap around, you could end up with diff==0, but node.key != key, and hence return the wrong item!
EDIT: mistyped/thought this - the same code is used for insertions too, so the hashtable would erroneously refuse to insert the item claiming it's already there.

Using the pointer directly as the hash is also rather bad. Simutrans hash tables already suffer from insufficient number of buckets (hard coded 101 bags), with pointers to ints, the lowest 2 bits will always be 0 when they're aligned. Hence 1/4 of the buckets will never be used. Possibly even worse for larger objects. StackOverflow suggests: template<typename Tval>
struct MyTemplatePointerHash1 {
    size_t operator()(const Tval* val) const {
        static const size_t shift = (size_t)log2(1 + sizeof(Tval));
        return (size_t)(val) >> shift;
    }
};

prissi

As mentioned on stackoverflow too, size_t can be smaller than intptr_t (was actually common in the days of MSDOS). intptr_t would have helped, but did not exist. Hence I am for casting to char *, which will always give the right size as the next working thing.

But it is very interesting, that this error never surfaced so far.

DrSuperGood

Quote
For intptr_t, everything I can find shows any arithmetic on these is undefined. Everything also says to use this instead of casting to a int/long/etc., but if then you can't do arithmetic, what's the point?  In practice, it's also just a sint32, so I'm sure would work, but officially arithmetic is undefined.
Arithmetic on them is defined as it is a signed integer. It will act just like any other signed integer when added. Just the pointer meaning of the result is not defined (may not be valid or meaningful pointer).

In practice its size may vary depending on build, as per its definition.
Quote
The following type designates a signed integer type with the property that any valid pointer to void can be converted to this type, then converted back to a pointer to void, and the result will compare equal to the original pointer: intptr_t
One should be able to convert any pointer to type into a pointer to void. As such it will accept any pointer and produce a value.

Quote
As mentioned on stackoverflow too, size_t can be smaller than intptr_t (was actually common in the days of MSDOS).
That is because size_t can represent the maximum possible type/array size which can be produced, and not the position of an object in memory. Especially the case with some microcontrollers which have larger ROM address space than RAM, or that use multiple memory banks.

Ters

Quote from: TurfIt on February 10, 2016, 12:47:25 AM
From what I can find, (char*)key1-(char*)key2 is the best solution. However, diff_type must also be changed from ptrdiff_t to sint64.

As has been discovered, arithmetic on ptrdiff_t is only valid for offsets within an array, where the objects are all aligned wrt each other.

But ptrdiff_t must, on x86, be big enough to work on a near 4 GB array of chars, which should be legal even if rare. And since the result of pointer subtraction is defined to be a ptrdiff_t (or rather ptrdiff_t is defined to be the type returned by a pointer subtraction), changing diff_type to sint64 won't help, because that just means casting a ptrdiff_t to sint64 afterwards. You can cast the pointers to sint64 first, but casting pointers to integers is exactly the thing intptr_t is for.

Also, the algorithm doesn't care if comparison between the pointers 0x00000000 and 0xFFFFFFFF equates to a comparison between 0 and 4294967295 or a comparison between 0 and -1, as long as it is a comparison between two integer values which uniquely maps to the respective pointers. And a unique integer representation of a pointer is exactly what intptr_t is.

Should we encounter platforms lacking intptr_t, we can reevaluate the problem then (there will likely be other problems elsewhere as well). For x86, we can define our own intptr_t like Microsoft has done with its LONG_PTR type for at least two decades.

When something is undefined in C, it is because it is architecture dependent, C being a very thin abstraction above machine code. The behavior of the architecture is well defined, but the code written based on that is of course no longer platform independent.

Quote from: prissi on February 10, 2016, 01:17:23 AM
But it is very interesting, that this error never surfaced so far.

That it was introduced less than two years ago explains part of it. It also only happens if there is a hash collision. And the result of the bug triggering is likely a false negative, which might not cause too much trouble in most cases.

Dwachs

Here is a patch: it uses intptr_t if available, and falls back to const char* otherwise. Could you please test if this compiles? In particular, if it compiles under MSVC.
Parsley, sage, rosemary, and maggikraut.

Ters

You can't do #ifdef intptr_t. It's not a macro (pre-processor symbol). It will never happen. There are preprocessor symbols for detecting the presence of intptr_t, but they are implementation specific.

Dwachs

Then I would change this ifdef to some

#ifdef USE_UINTPTR_T

which is defined per default, and can be undefined in case of compile error. Would this be a practical approach?
Parsley, sage, rosemary, and maggikraut.

DrSuperGood

Quote
which is defined per default, and can be undefined in case of compile error. Would this be a practical approach?
It is probably better to do the opposite approach and have it used by default unless a macro is declared explicitly telling it not to be used.

NO_INTPTR

This will mean that for common builds fewer macros will need to be passed to every compilation unit (cleaner). Windows, Linux and Mac should all have access to the type I would imagine, despite it being optional.

Ters

I don't think there is any platform Simutrans targets or can be foreseen to target, that won't have this type. Although for Haiku, I base this only on the source code for a/the Java runtime for Haiku using intptr_t since 2010. From a simple Google search result, I can't tell if they've defined the type themselves, though. According to MSDN, MSVC has had intptr_t since 2003 at least, although it appears they may have it in a different header file prior to 2015 (stddef.h, where size_t and ptrdiff_t is supposed to live, rather than stdint.h). Which means that while intptr_t seems safe in itself, #include <stdint.h> does not...

Dwachs

committed in r7761. please report any compile failures.
Parsley, sage, rosemary, and maggikraut.

DrSuperGood

MSVC 2015 builds it without errors. Not sure about older versions though.

Ters

Both mingw and 32-bit mingw64 compiles fine, and looking at the assembly, the shift and multiplication is gone, leaving only a simple subtraction. But it is MSVC older than 2015 that are most likely to cause problems.