News:

Simutrans Wiki Manual
The official on-line manual for Simutrans. Read and contribute.

Possible bug in fix for 64-bit hashtables

Started by jamespetts, May 10, 2014, 06:10:28 PM

Previous topic - Next topic

0 Members and 1 Guest are viewing this topic.

jamespetts

I am not sure whether this is a bug or not, but it looks a little odd, so I thought it better to post it to be sure: in this commit which is related to fixing a bug with 64-bit hashtables, there is the following code:


template<class inttype> struct int_diff_type {
typedef int diff_type;
};
template<> struct int_diff_type<sint64> {
typedef sint64 diff_type;
};
template<> struct int_diff_type<uint64> {
typedef sint64 diff_type;
};


Both the uint64 and sint64 versions of diff_type use the sint64 hash - is this intended?
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

Dwachs

Yes, this type is used for comparisons against zero. Maybe it is not the most clever implementation ...
Parsley, sage, rosemary, and maggikraut.

jamespetts

Are there actually any uint64 hashtables in Standard? If so, what do they do?
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

TurfIt

Sure has the look of a typo... Perhaps a code comment explaining the cleverness would be in order.  ;)

Dwachs

Quote from: jamespetts on May 10, 2014, 08:25:22 PM
Are there actually any uint64 hashtables in Standard? If so, what do they do?
The ground_texts in grund.cc, in experimental this is done with uint32 .
Quote from: TurfIt on May 10, 2014, 08:25:53 PM
Sure has the look of a typo... Perhaps a code comment explaining the cleverness would be in order.  ;)
I am not sure about the cleverness part ...
Parsley, sage, rosemary, and maggikraut.

jamespetts

Quote from: Dwachs on May 11, 2014, 09:29:36 AM
The ground_texts in grund.cc, in experimental this is done with uint32 .

Ahh, I see - what is the advantage of uint64 here?

Edit: Incidentally, is this part of the code safe for 64-bit Windows builds:


/**
* Sometimes we need to pass pointers as well as integers through a
* standardized interface (i.e. via a function pointer). This union is used as
* a helper type to avoid cast operations.  This isn't very clean, but if used
* with care it seems better than using "long" and casting to a pointer type.
* In all cases it ensures that no bits are lost.
* @author Hj. Malthaner
*/
union value_t
{
    value_t()                : p(0)   {}
    value_t(long itg)        : i(itg) {}
    value_t(const void* ptr) : p(ptr) {}

    const void* p;
    long i;
};
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

Ters

Quote from: jamespetts on May 11, 2014, 12:38:00 PM
Edit: Incidentally, is this part of the code safe for 64-bit Windows builds:

That particular code isn't in itself unsafe, but it is probably used in ways that aren't safe. But last I checked, Simutrans doesn't support 64-bit, and on Windows, it's pretty pointless anyway. (Windows always has 32-bit compatibility, while it is optional on Linux.)

jamespetts

The reason to use 64-bit clients on Windows is to overcome the memory limit: I know that at least one person is beginning to encounter this issue with a large map in Experimental.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

Ters

As you have just found an example of, and there might be more, 64-bit builds on Windows in limited to 32-bit adresse space anyway. The problem is that the OS doesn't know that, so all kinds of things might happen. And it seems that there has been put some effort into hiding these limitations, so they won't be easy to remove.

On top of that, a 64-bit build will consume more memory, and therefore be somewhat slower (although running 32-bit on a 64-bit processor may have some penalties in itself), than 32-bit builds. Several optimizations also don't work on 64-bit, possibly even having an opposite effect. So ironically, one should have smaller maps on a 64-bit build than a 32-bit build.

jamespetts

Quote from: Ters on May 11, 2014, 03:38:46 PM
As you have just found an example of, and there might be more, 64-bit builds on Windows in limited to 32-bit adresse space anyway. The problem is that the OS doesn't know that, so all kinds of things might happen. And it seems that there has been put some effort into hiding these limitations, so they won't be easy to remove.

I should point out intially that it was TurfIt who found this - I thought that it might be worthwhile to mention here. I don't think that I should have been able to spot this. Might it be worthwhile looking into making the code 64-bit compatible? I suspect that that is rather beyond my ability, as my knowledge of low-level matters is very limited.

QuoteOn top of that, a 64-bit build will consume more memory, and therefore be somewhat slower (although running 32-bit on a 64-bit processor may have some penalties in itself), than 32-bit builds. Several optimizations also don't work on 64-bit, possibly even having an opposite effect. So ironically, one should have smaller maps on a 64-bit build than a 32-bit build.

May I ask which optimisations do not work on 64-bit? I should note that I have not found 64-bit builds to be noticeably slower than 32-bit builds, at least on Windows. Also, Experimental uses the sint64 integer type in a number of places where Standard does not; am I correct in understanding that a 64-bit build will process these values in fewer processor cycles, and therefore faster, than a 32-bit build on a 64-bit machine?
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

Ters

Quote from: jamespetts on May 11, 2014, 06:32:32 PM
May I ask which optimisations do not work on 64-bit? I should note that I have not found 64-bit builds to be noticeably slower than 32-bit builds, at least on Windows. Also, Experimental uses the sint64 integer type in a number of places where Standard does not; am I correct in understanding that a 64-bit build will process these values in fewer processor cycles, and therefore faster, than a 32-bit build on a 64-bit machine?

First of all, the hand optimized assembly will not be used. Prissi claims this is critical to good performance, but unless this is for running Simutrans on older CPUs (which aren't 64-bit anyway), I'm not really convinced.

The second issue, which is more convincing, is that the size of data structures change. Pointers will be twice as big, as will potentially longs.

A 32-bit build targetting 64-bit processors can use 64-bit instructions at a slight penalty (if I remember correctly, otherwise there is no penalty). But I have no idea how that penalty compares to doing the 64-bit operation as two chained 32-bit operations. Targetting only 64-bit processors will however exclude some players, as we noticed when the nightlies by accident no longer supported AMD K6. (I wonder how the IPv6 requirement hits players like that. The lack of Windows support from Microsoft is probably more serious though, although they may not think so themselves.)

jamespetts

May I ask - has anyone demonstrated differences in speed between 32-bit and 64-bit builds by profiling?

How does one compile a 32-bit build targeting a 64-bit processor; are you referring to the /IMAGE_FILE_LARGE_ADDRESS_AWARE switch for the Microsoft compiler (this does work to allow Carl's game to be loaded in a 32-bit build, I have verified, but I have read that this can lead to blue screen crashes on 32-bit systems), or to something else?

As to lack of Windows support, do you mean for ipv6 or 64-bit?
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

prissi

There seems many misconceptions. On Intel, a 32 bit is not slower than a 64 bit. Actually the 64 extension are used by extra bytes indicating wider operants (to put is very general). As such the 32 bit code is short( and more cache friendly). But that is less than 5% and hence not really noteable. On the other hand, 64 bit had twice the nubmer of registers, which means less slow stack operations are needed.

More important is that the size of most structures in main are increased, since the may now even align with 8 byte boundaries (compiler and optimiser setting dependent of course). But even packed structuers will be about 20% larger. That means that a map in 64 bit mode requires about 20% more memory (or 20% less fits the cache). This has a strong impact, which depends on the size of frequent accessed areas.

So the speedup/reduction is very dependent on compiler (the 64 bit compiler probably uses a different optimiser) and size of the map. I think you best bet is stopwatch profiling using the until command and the time command in batch files.

The assembler drawing code could be easily converted into 64 bit. But with multithreading it may indeed by not of most importance.

Btw:
Quote/**
* Sometimes we need to pass pointers as well as integers through a
* standardized interface (i.e. via a function pointer). This union is used as
* a helper type to avoid cast operations.  This isn't very clean, but if used
* with care it seems better than using "long" and casting to a pointer type.
* In all cases it ensures that no bits are lost.
* @author Hj. Malthaner
*/
union value_t
{
    value_t()                : p(0)   {}
    value_t(long itg)        : i(itg) {}
    value_t(const void* ptr) : p(ptr) {}

    const void* p;
    long i;
};
is something which is not guranteed to work. long may be 4 byte and *ptr is 8. (Acutally, if ptr would be a ptr to a function, it could be 8 or 12 bytes even on 32 bit systems. Not to mention that there are 6 byte pointers on 32 bit systems.) I think the only guarantee you have with C++ is that sizeof(coid *)<=sizeof(size_t).

[offtopic]I worked once on a system where size of (float)==(char) and sizeof(*)==1==(char) (but in reality those were 20 bits). But since Simutrans won't work on this 48 kB main memory DSP ...[/offtopic]

jamespetts

Thank you for clarifying that. Can you assist with whether using the large address aware MSVC 32-bit builds can cause blue screen crashes on 32-bit systems?
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

TurfIt

Quote from: Dwachs on May 11, 2014, 09:29:36 AM
The ground_texts in grund.cc, in experimental this is done with uint32 .I am not sure about the cleverness part ...
Do you mean you think of this as just a simple clean straightforward fix? Or that you were being too clever and it's still busted?


Quote from: jamespetts on May 11, 2014, 12:38:00 PM
Ahh, I see - what is the advantage of uint64 here?
The change to uint64 allowed the removal of the 16M tile limit. i.e. Humungous maps are now possible to create, but good luck actually running them.


Quote from: Ters on May 11, 2014, 03:38:46 PM
As you have just found an example of, and there might be more, 64-bit builds on Windows in limited to 32-bit adresse space anyway. The problem is that the OS doesn't know that, so all kinds of things might happen. And it seems that there has been put some effort into hiding these limitations, so they won't be easy to remove.
How are 64-bit builds on Windows limited to 32-bit address space? Or do you mean 64-bit builds of Simutrans (and not other programs) need to be limited to such to avoid nasty effects such as could result from value_t (and all the other hidden others).

I've been trying to convince James that Simutrans is simply not designed for 64-bit compilation, and doing so could quite likely be the source of all the desyncs on the Bridgewater server. Finding a 32-bit Linux server to test with seems an impossible task however. Locally I've run a 32-bit server with no desyncs, but then during that specific time my local 64-bit server wasn't giving desyncs either, which it had been before.

Everytime I turn around, I seem to find another area of Simutrans code that could cause problems when compiled 64-bit. Using MinGW64, Simutrans won't even compile due to the casting of pointers to long. Why MSVC is allowing this unsafe practice I don't know...

Until someone goes through the entire Simutrans code base with a fine comb, I think building it as 64-bit is asking for trouble.


Quote from: Ters on May 11, 2014, 07:02:53 PM
First of all, the hand optimized assembly will not be used. Prissi claims this is critical to good performance, but unless this is for running Simutrans on older CPUs (which aren't 64-bit anyway), I'm not really convinced.
Of course you're the one who finds no benefit to the multithreaded drawing either - you must be running zoomed in on a postage stamp sized window!

The asm block in display_img_nc() is quite useful. Over 75% of the CPU time is typically spent in this routine. I don't see why the asm here is disabled for 64-bit builds, I see nothing wrong. The asm in display_fb_internal() on the otherhand is not good - running afoul alignment... And should you trigger the double whammy of USE_C and ALIGN_COPY like 64-bit builds are, then I swear you can watch the pixels be drawn one by one (GCC will *not* inline memcpy here, hence the function call overhead is huge. Replace memcpy with a simple while loop which the optimizer vectorizes to some nice SSE instructions, and you get an order of magnitude speedup).


Quote from: Ters on May 11, 2014, 07:02:53 PM
The second issue, which is more convincing, is that the size of data structures change. Pointers will be twice as big, as will potentially longs.
Both Linux and OSX are using the LP64 model. Pointers and longs both 64 bits. Windows OTOH is using LLP64 - pointers 64 bit, but longs 32 bit. Hence the potential issues...

Ters

Quote from: jamespetts on May 11, 2014, 07:58:03 PM
How does one compile a 32-bit build targeting a 64-bit processor.

That depends on the compiler. For GCC on my 64-bit Linux box, I used -march=nocona. This causes GCC to make use of MMX, SSE (up to 3) and x64 instructions. In my case, I was also building a 64-bit program, but the instruction set is the same. With Visual Studio, I suspect there is a drop-down box somewhere. I haven't tried out this with Microsoft's compiler, but it is surely documented somewhere.

Quote from: prissi on May 11, 2014, 10:08:06 PM
There seems many misconceptions. On Intel, a 32 bit is not slower than a 64 bit. Actually the 64 extension are used by extra bytes indicating wider operants (to put is very general). As such the 32 bit code is short( and more cache friendly). But that is less than 5% and hence not really noteable. On the other hand, 64 bit had twice the nubmer of registers, which means less slow stack operations are needed.

For some reason, I had gotten the impression instructions defaulted to 64-bit in 64-bit mode, but I see now that this is only true for a small subset.

Quote from: jamespetts on May 11, 2014, 10:35:43 PM
Can you assist with whether using the large address aware MSVC 32-bit builds can cause blue screen crashes on 32-bit systems?

Large address aware does not change the size of pointers. It simply changes the boundary between the user and kernel address space from 2 GB to 3 GB (apparently even 4 GB on 64-bit Windows). I doubt Simutrans cares about that, since that may/will be the case on Linux. Dependencies, like SDL, must however also be large address aware.

Quote from: TurfIt on May 11, 2014, 11:56:48 PM
How are 64-bit builds on Windows limited to 32-bit address space? Or do you mean 64-bit builds of Simutrans (and not other programs) need to be limited to such to avoid nasty effects such as could result from value_t (and all the other hidden others).

You guessed correctly.

Quote from: TurfIt on May 11, 2014, 11:56:48 PM
Of course you're the one who finds no benefit to the multithreaded drawing either - you must be running zoomed in on a postage stamp sized window!

Nope, I run in full HD resoultion, and I never zoom in (intentionally) or, perhaps more importantly, zoom out. My 64-bit build of Simutrans was run in somewhat smaller resolution (1280x1024 I think), but also on what is now a decade old hardware. (I had some speed issues at first, but they disappeared suddenly, so I suspect it was a driver problem.)

Quote from: TurfIt on May 11, 2014, 11:56:48 PM
The asm block in display_img_nc() is quite useful. Over 75% of the CPU time is typically spent in this routine. I don't see why the asm here is disabled for 64-bit builds, I see nothing wrong. The asm in display_fb_internal() on the otherhand is not good - running afoul alignment... And should you trigger the double whammy of USE_C and ALIGN_COPY like 64-bit builds are, then I swear you can watch the pixels be drawn one by one (GCC will *not* inline memcpy here, hence the function call overhead is huge. Replace memcpy with a simple while loop which the optimizer vectorizes to some nice SSE instructions, and you get an order of magnitude speedup).

The asm is disabled in 64-bit because it is 32-bit assembly. When I made my 64-bit Simutrans build, GCC did inline memcpy with some vectorized SSE stuff (as long as the ALIGN_COPY code was used, otherwise I got misaligned access crashes).

prissi

#16
QuoteNope, I run in full HD resoultion, and I never zoom in (intentionally) or, perhaps more importantly, zoom out. My 64-bit build of Simutrans was run in somewhat smaller resolution (1280x1024 I think), but also on what is now a decade old hardware. (I had some speed issues at first, but they disappeared suddenly, so I suspect it was a driver problem.)

Cool, I never had Simutrans run of an DEC alpha, I run it once on a SPARC, but it had only 128 MB main meory, so it was no fun. Otherwise my Athlon 64 (from 2005) was the oldest x64 around, I though.

But back to topic, since James uses MSVC, the inline Assembler will never be used. And MSVC Express 64 building need (as far as I know) the commandline, at least the set them up. Not clicking somewhere.

Ters

I seem to have overestimated the age of my Linux box by a year or so. The CPU is only seven years old, and the GPU eight. (The monitor is older, though, but that doesn't have as much to say.)

Making Simutrans 64-bit compatible isn't an impossible task, but there are several things to look out for that are not immediately obvious. A 64-bit build will have a more modern target architecture than what is normal for Simutrans today. New instruction sets means new possibilites for the compiler, causing different machine code to be generated. This might remove the need for certain hand-crafted optimizations. In a few cases, hand-crafted optimizations come in conflict with the new tricks the compiler can pull, such as the ALIGN_COPY thing. In some cases, one might have to write new optimizations for the new architecture, but that means profiling.

There is also a question how portable Simutrans should be. As it is, it is very tuned for x86 and GCC. Some other architectures and compilers work, but without some or all optimizations. Certain x86 optimizations might actually reduce performance on other architectures. Yet other architectures will be incompatible with Simutrans, due to things like value_t here. Writing clean code will increase portability and readability, but sacrifice whatever performance Simutrans has gained on x86 through the hand-crafted and (hopefully) carefully tuned opitimizations. This loss in performance would probably be most felt by those with the oldest hardware, or dumbest compiler.

jamespetts

There is much to be said for future-proofing Simutrans by working with an eye to having it work well with a range of architectures, such as the ARM platform used in tablet devices and so forth. In five or ten years' time, the x86 architecture may have a much smaller market share than now.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

Ters

Quote from: jamespetts on May 12, 2014, 06:17:25 PM
In five or ten years' time, the x86 architecture may have a much smaller market share than now.

That's what they said five, or maybe even ten, years ago as well.

jamespetts

Quote from: Ters on May 12, 2014, 06:37:24 PM
That's what they said five, or maybe even ten, years ago as well.

And they would have been right - as a proportion of all computing devices, the x86 architecture does have a much smaller market share than 5-10 years ago owing to the proliferation of ARM powered smartphones and tablets.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

Ters

If we consider all computing devices, x86 may never have dominated the market to begin with. And smartphones just replace the phones before them, which were not that different technically. I'm not sure how the number of tablets compare to other "computing devices" (does that include dishwashers as well?). They have at least stopped the spread of PCs to every room. At least until x86 powered tablets with Windows 8, optional keyboard and the ability to run legacy applications potentially cause a covert victory for the PC architecture. The only thing we know is that we don't know how the market will develop.

jamespetts

Ahh, yes, I certainly agree with the last sentence. There is much to be said for keeping our options open and allowing Simutrans to work both on x86 and non-x86 machines in the future.

Edit: There might be much to be said for playing Simutrans on a dishwasher. It would certainly make household chores more entertaining.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.