The International Simutrans Forum

 

Author Topic: Should we make a more-optimized build for x86?  (Read 3510 times)

Offline neroden

  • Devotees (Inactive)
  • *
  • Posts: 831
  • Nathanael Nerode
Should we make a more-optimized build for x86?
« on: April 15, 2010, 03:57:36 PM »
Just for fun I compiled simutrans with gcc 4.3's -march=native and -fno-schedule-insns options.  (On my Pentium 4.)

It's *FAST*.  Noticeably faster, to the 'naked eye'; higher frame rate, much less pause when months and years change, etc.

It might just be -fno-schedule-insns (which apparently speeds up almost everything).  More likely, it's the -march=native, which allows the compiler to use MMX, SSE, SSE2, and all kinds of processor-specific tuning.

If we could figure out which of these things is actually giving the main benefit, and if it's a common chip feature, it might be a good idea to make an official build with that option turned on.  The default for gcc is to build generic "i386" code, which will actually run on an 80386, and as a result can be painfully slow on Pentiums, or even i486s.  But who has pre-Pentium machines these days?

I'm not quite sure what the best way of bisecting among the various GCC architecture options is though; it would probably require some type of profiling for the different builds.
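One rough way to approach the bisection is to build once per combination of the candidate flags and time the same repeatable workload with each build. The sketch below only enumerates the combinations and provides a generic timing helper; the `-O2` baseline and the idea of a fixed benchmark run are assumptions for illustration, not something taken from the simutrans makefile.

```python
import itertools
import subprocess
import time

BASE_FLAGS = ["-O2"]  # assumed baseline, not taken from the simutrans makefile
CANDIDATES = ["-march=native", "-fno-schedule-insns"]  # the two options under test


def flag_sets(base, candidates):
    """Return every combination of candidate flags appended to the base flags."""
    combos = []
    for count in range(len(candidates) + 1):
        for chosen in itertools.combinations(candidates, count):
            combos.append(list(base) + list(chosen))
    return combos


def time_command(argv, runs=3):
    """Run a command several times and return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=True)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    for flags in flag_sets(BASE_FLAGS, CANDIDATES):
        # Hypothetical next step: rebuild with these flags, then pass the
        # command that runs a fixed workload to time_command().
        print("build with:", " ".join(flags))
```

With two candidate flags this gives four builds (neither, each alone, both), which is enough to see whether one flag accounts for the whole difference.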

EDIT: Well, a simple rebuild test tells me it's probably just -fno-schedule-insns making things faster.  So I think that's straightforward enough....
« Last Edit: April 15, 2010, 04:31:08 PM by neroden »

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Devotee
  • *
  • Posts: 18550
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Should we make a more-optimized build for x86?
« Reply #1 on: April 15, 2010, 05:00:45 PM »
Hmm - what does -fno-schedule-insns do?

Offline prissi

  • Developer
  • Administrator
  • *
  • Posts: 9454
  • Languages: De,EN,JP
Re: Should we make a more-optimized build for x86?
« Reply #2 on: April 15, 2010, 08:35:39 PM »
That specific option actually does nothing on gcc 4.x according to the documentation. But I think what is meant is -fno-sched-stalled-insns.

What this does is not so easy to explain. First, GCC does not generate machine code directly. Instead, the parser generates a list of more or less elementary statements (if (), x = y + z; and so on), which are called insns. The optimizer can rearrange or clone these; for instance, if insn 1 is expected to wait for external data, then insn 2 is executed in the meantime.

The above option apparently prevents a lot of register loading and discarding, using the CPU more like a stack processor than a register CPU. This might be better when there is a fast third-level cache and many more operands than could ever fit into the still few registers of x86 CPUs. However, I am completely guessing here.

-march=native does not use much of these (as far as I know, extremely little MMX and almost no SSE). The default is i486, by the way; i386 is not supported with MinGW (since Win32s also cannot run properly on a 386). Compiling for Pentium makes sense, since simutrans does not run well on a 486; even a DX4 is too slow (I tried this once). You can, however, run it on a Pentium 150 or better with 48 MB RAM (tested regularly).

Offline jamespetts gb

Re: Should we make a more-optimized build for x86?
« Reply #3 on: April 15, 2010, 08:44:21 PM »
There could always be different versions optimised for different platforms.

Offline neroden

Re: Should we make a more-optimized build for x86?
« Reply #4 on: April 16, 2010, 06:13:19 AM »
Quote from: jamespetts on April 15, 2010, 05:00:45 PM
"Hmm - what does -fno-schedule-insns do?"

GCC makes two "instruction scheduling" passes (designed to avoid pipeline stalls IIRC), one before register allocation and one after.  This disables the one before register allocation, which has never worked very well on any platform; the interesting thing is that the code rearrangement prior to register allocation often seems to damage the register allocation process, making code worse.
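A toy model of why reordering before register allocation can hurt: the sketch below counts how many values are live at once for two orderings of the same straight-line computation. It is not how GCC measures register pressure; the instruction lists and the `max_live` helper are made up for illustration.

```python
def max_live(order):
    """Return the peak number of values live at once for a straight-line order.

    Each instruction is (dest, [sources]). A value is live from the
    instruction that defines it until the last instruction that reads it.
    """
    last_use = {}
    for index, (_, sources) in enumerate(order):
        for src in sources:
            last_use[src] = index

    live = set()
    peak = 0
    for index, (dest, sources) in enumerate(order):
        # A source read for the last time here frees its register.
        for src in sources:
            if last_use[src] == index:
                live.discard(src)
        live.add(dest)
        peak = max(peak, len(live))
    return peak


# r = ((a + b) + (c + d)) + (e + f), with each load placed next to its use
COMPACT = [
    ("a", []), ("b", []), ("t1", ["a", "b"]),
    ("c", []), ("d", []), ("t2", ["c", "d"]),
    ("s", ["t1", "t2"]),
    ("e", []), ("f", []), ("t3", ["e", "f"]),
    ("r", ["s", "t3"]),
]

# Same computation with all loads hoisted to the top to hide load latency
HOISTED = [
    ("a", []), ("b", []), ("c", []), ("d", []), ("e", []), ("f", []),
    ("t1", ["a", "b"]), ("t2", ["c", "d"]), ("s", ["t1", "t2"]),
    ("t3", ["e", "f"]), ("r", ["s", "t3"]),
]

if __name__ == "__main__":
    print("compact:", max_live(COMPACT))  # 3
    print("hoisted:", max_live(HOISTED))  # 6
```

The hoisted order needs twice as many values live at its peak. On a target with few general-purpose registers, such as 32-bit x86, that kind of increase is what pushes the register allocator into spilling.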

I worked on GCC a while back, and this was a recurring problem.  I actually suggested removing the first pass permanently, since this problem seems to happen *across architectures*, but nobody ever was quite willing to do so.

Offline prissi

Re: Should we make a more-optimized build for x86?
« Reply #5 on: April 16, 2010, 09:59:16 AM »
But according to the documentation this option only exists up to gcc 2.X ...

I would happily include this, as it would cause no harm, but I wonder if this is really the recommended option syntax. The GCC documentation for the optimizer options does not list this.

Offline neroden

Re: Should we make a more-optimized build for x86?
« Reply #6 on: April 17, 2010, 01:02:29 AM »
Quote from: prissi on April 16, 2010, 09:59:16 AM
"But according to the documentation this option only exists up to gcc 2.X ..."

Look up -fschedule-insns.

"-fno-schedule-insns" is the negative form of it.

It's listed in the "positive form" in the manual because -fschedule-insns is disabled by default; the manual lists it as enabled at -O2 and -O3.  So -fno-schedule-insns only does something if you also have -O2 or -O3 active.  With -O0 the pass is already off.
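Rather than relying on the manual, you can ask the compiler itself which optimizer flags are on at a given level: gcc 4.3 and later accept `-Q --help=optimizers`. The sketch below wraps that call and parses the report; the `SAMPLE` text is a hand-written illustration of the report layout, not real output, and the actual states depend on the GCC version and target.

```python
import subprocess


def parse_optimizer_report(text):
    """Map each -f flag in a `gcc -Q --help=optimizers` report to True/False."""
    states = {}
    for line in text.splitlines():
        parts = line.split()
        if (len(parts) >= 2 and parts[0].startswith("-f")
                and parts[-1] in ("[enabled]", "[disabled]")):
            states[parts[0]] = parts[-1] == "[enabled]"
    return states


def query_gcc(opt_level, gcc="gcc"):
    """Run gcc and return the flag states for an optimization level such as '-O2'."""
    result = subprocess.run(
        [gcc, "-Q", opt_level, "--help=optimizers"],
        capture_output=True, text=True, check=True,
    )
    return parse_optimizer_report(result.stdout)


# Hand-written illustration of the report layout (not captured from a real run)
SAMPLE = (
    "The following options control optimizations:\n"
    "  -fschedule-insns            \t\t[disabled]\n"
    "  -fschedule-insns2           \t\t[enabled]\n"
)

if __name__ == "__main__":
    print(parse_optimizer_report(SAMPLE))
    # On a machine with gcc installed:
    # print(query_gcc("-O2").get("-fschedule-insns"))
```

Comparing `query_gcc("-O2")` against the same call with `-fno-schedule-insns` appended to the command line would show directly whether the option changes anything for your compiler and target.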

Offline prissi

Re: Should we make a more-optimized build for x86?
« Reply #7 on: April 17, 2010, 09:54:03 PM »
Thank you, I apparently missed this. (Still strange not to list all available options, though.) It is included now.