
SDL2 performance regression

Started by TurfIt, January 14, 2018, 11:56:03 PM


TurfIt

Recently comparing the performance of the backends in Extended, I noticed SDL2 performing much worse than previously. Switching to Standard instead, the same problem exists.
With dirty tile updates, frame times increase by 33%! But after changing to the direct3d renderer, using dirty tiles decreases frame times by 25%, as expected and as before.

The debugging switch to disable dirty tiles, "-use_hw", is still active. Can anyone reproduce this? Especially on Linux / OSX platforms where switching away from opengl is not possible...



captain crunch

On Linux 3.16, amd64, with SDL2 2.0.2, SDL Driver: x11:

  • "sim -objects pak -use_hw": ca. 38% CPU usage.
  • "sim -objects pak": ca. 27% CPU usage.
HTH

(Edit: added display stats).

Ters

Could it be spectre or meltdown related? If there are a lot of system calls and you've gotten the patch for them, a 30% performance hit has been suggested.

DrSuperGood

Meltdown only affects Intel CPUs from what I read. The patch slightly increases the mode change overhead by forcing the translation lookaside buffer to dump all kernel pages when returning to application mode. AMD apparently does not need to do this, hence they leaked its existence to the world.

Spectre affects all pipelined CPUs, but the patches should either cause practically no performance loss (microcode) or cause performance loss only in specific sensitive tasks (application patches). Basically they have to make sure that processing sensitive data due to pipeline predictions does not leave measurable traces inside the processor pipeline. An example of a microcode fix would be to force an exact cycle delay in response to an illegal memory access, so that the measurable delay of the pipeline is invariant with respect to the address accessed.

Ters

Meltdown affects ARM as well, but I don't think that is relevant in this case. Wikipedia cites a PC World article saying that Spectre patches also cause performance drops, especially on older processors, but up to 14% on what I understand are the newest CPUs.

prissi

Yes, but SDL memory copying is anyway accessing a lot of memory likely not cached, so we are already doing worst case slow memory access here ... so no slowdown due to those I think. Use_HW has never really been a good idea with SDL, and the documentation even says so.

TurfIt

#6
I rather doubt the meltdown/spectre patches are involved - the computer in question is an older win7 unit that hasn't been updated since these came out.

I've now tried on two other systems. Neither shows the large slowdown of the first, but one has SDL2 performing badly, period, and all 3 have GDI terrible.
1) I7-3700k, AMD 7970, Win7. SDL2 ok in directx mode, not opengl unless dirty tiles turned off (-use_hw). SDL1 ok - slightly faster, GDI bad.
2) I7-6700k, Nvidia 980ti, Win10. SDL2 ok (directx 10% faster than gl, but gl still ok). SDL1 same times as SDL2. GDI completely unusable zoomed out - > 100ms frame time!.
3) i5-3210M based laptop, Intel graphics, Win10. SDL2 slow across the board, but same directx and opengl. SDL1 ok (50-100% faster zoomed in, but same when zoomed out). GDI 20% slower than SDL2, but remains usable.

Too many variables. WAG - bad ATI driver update... and Intel drivers are always bad for performance.

Disabling the forced opengl mode 'fixes' the issue on system 1. I'd planned to commit that anyways since Dwach's SDL2 crash fixes seemed to fix the directx crashes too, but was/am waiting for after the release so it can have some wider testing...

GDI as a backend choice should just be removed. It's terrible, and just getting worse. >100ms frame times when zooming all the way out now - lol. SDL1 is doing 17ms displaying the exact same.


Quote from: prissi on January 16, 2018, 03:24:33 AM
Use_HW has never really been a good idea with SDL, and the documentation even says so.
Note: the SDL1 -use_hw switch was hijacked by the SDL2 backend to do something very different for testing reasons, and was never removed. In SDL2 it disables the dirty tile calls to SDL_UpdateTexture() and just calls it once, updating the entire screen. On system 1 with opengl, once there are more than ~80 update calls it is faster to have one call doing the whole screen.
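
To make the trade-off above concrete, here is a minimal sketch of choosing between per-tile uploads and a single full-screen upload. It is not the actual Simutrans backend code: `Rect`, `UploadFn`, `flush_frame` and `kMaxDirtyUploads` are made-up names for illustration, and the cut-over of 80 is just the figure observed on system 1 with opengl, so it will differ per machine and driver.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical rectangle type; with SDL2 this would be SDL_Rect.
struct Rect { int x, y, w, h; };

// Hypothetical upload callback. In an SDL2 backend this is where
// SDL_UpdateTexture(texture, &rect, pixels, pitch) would be called.
using UploadFn = std::function<void(const Rect&)>;

// Assumed cut-over point (the ~80 calls seen on one system with opengl).
// It depends on machine and driver; it is not a universal constant.
constexpr std::size_t kMaxDirtyUploads = 80;

// Uploads either each dirty rectangle or the whole screen and returns
// the number of upload calls issued. force_full mimics what -use_hw
// does in the SDL2 backend: skip dirty tiles, upload everything once.
std::size_t flush_frame(const std::vector<Rect>& dirty, const Rect& screen,
                        bool force_full, const UploadFn& upload)
{
    if (dirty.empty() && !force_full) {
        return 0;  // nothing changed, nothing to upload
    }
    if (force_full || dirty.size() > kMaxDirtyUploads) {
        upload(screen);  // one big call instead of many small ones
        return 1;
    }
    for (const Rect& r : dirty) {
        upload(r);  // one call per dirty rectangle
    }
    return dirty.size();
}
```

With a counting callback, 10 dirty rectangles produce 10 calls, while 200 dirty rectangles collapse into a single full-screen call.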

Quote from: captain crunch on January 15, 2018, 12:59:01 AM
On Linux 3.16, amd64, with SDL2 2.0.2, SDL Driver: x11:
Thanks for trying, but I'm confused... SDL Driver:x11 is a printout from the SDL1 backend... SDL2 doesn't show such??  EDIT: actually it does.
Also SDL2 2.0.2 is quite old. Perhaps try with 2.0.7 current version if possible?
And, unless you're printing out the raw frame timing, you won't see the actual times. The gui displayed times include waiting time.

One way to test without a custom executable is to simply zoom out far enough (while running a high enough resolution) to overload your computer so it can't keep up with the selected fps. That's how I noticed it in the first place - zoomed out on a computer that could previously easily handle 30 fps and ended up at 20 instead...
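
As a rough illustration of why overloading exposes the raw frame time: while the work per frame fits in the frame budget, the game waits out the remainder and the displayed rate stays at the selected fps. Once the work exceeds the budget, the frame rate falls to 1000 / raw time. The helpers below are hypothetical names, a sketch of the arithmetic only and not Simutrans code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Frame budget in milliseconds for a selected frame rate.
double frame_budget_ms(double target_fps)
{
    return 1000.0 / target_fps;
}

// Frame rate reached when each frame needs raw_ms of actual work and
// the game never runs faster than target_fps (it waits out the rest).
double achieved_fps(double raw_ms, double target_fps)
{
    return std::min(target_fps, 1000.0 / raw_ms);
}
```

At 30 fps the budget is about 33.3 ms per frame. Dropping to 20 fps therefore means the raw frame time has grown to about 50 ms, whereas a 17 ms frame still shows as a steady 30 fps.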

DrSuperGood

All modern OS run most of GDI (except a few select features) with software emulation as it is incompatible with newer more efficient display models. GDI used to be a lot faster as it would write directly to output buffers and have many of its draw options hardware accelerated.

Starting with Windows Vista most GDI hardware acceleration support was dropped as it conflicted with the new window management system. Instead GDI draw calls are applied to an internal window buffer using software routines. When it comes time to display the results the windows management system pushes this buffer out as a texture to the graphic sub system to be displayed. This was required because modern window management systems are hardware accelerated and responsible for drawing all windows unlike the GDI approach where each application was responsible for drawing its window when requested. The cost of this was a massive GDI performance regression as it was no longer possible to hardware accelerate GDI calls.

GDI was replaced by Direct2D in Windows Vista and newer OSes, which is pretty much fully hardware accelerated.

Ters

I have used the GDI backend for years now, I think mostly to avoid having to set up SDL as a dependency, and Simutrans runs just fine. Never had any need for any of this threaded rendering stuff either. My computer is getting rather old, but I run Windows 10. But then I never zoom in or out (intentionally).

The lack of hardware acceleration in GDI might mean nothing for Simutrans, as Simutrans only uses a single GDI function every frame: StretchDIBits. That one may or may not still map pretty straight through to hardware, at least when no special effects are applied. The same is true for all other backends. Simutrans doesn't use them for drawing stuff, just for the final upload of pixels to screen. However, since copying these pixels to screen may involve switching to kernel mode, doing a lot of small dr_textur calls may be less efficient these days, since switching to kernel mode has become more expensive with the patches for Meltdown.

It is possible that GDI is slow, or slower than the others, when DPI scaling is enabled. I have never tried out that, as I only have a HD monitor.

jamespetts

Odd - with SDL2 in Extended (release build on Windows, 64-bit) I get a slightly higher framerate (15-16) with -use_hw off than with it on (12-13fps). This is zoomed all the way out in the current Bridgewater-Brunel server game and a 4k monitor (without DPI scaling).
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

TurfIt

A little awkward, that sentence... but if by -use_hw off you mean not present as an argument, then a lower frame rate when using it is the expected behaviour when things are working properly. It disables the dirty tile updates in SDL2, updating the entire screen every frame instead. You might also want to try directx instead and see if that's even faster for you... (assuming you're actually using opengl currently - unfortunately the helpful diagnostic messages got hidden behind some nasty debug macros so they don't show anymore, but as long as you have 'proper' drivers installed, it should be using opengl as opposed to the final software renderer fallback).

Ters

Whatever the cause, it sounds like there is an increased overhead in OpenGL calls. There was a rule of thumb back in the day to do as few calls to APIs like OpenGL and Direct3D with as much data as possible (ideally data that was already in VRAM). That was the terminal bottleneck in my attempt to use OpenGL to hardware accelerate simgraph16.cc.
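
One way to follow that rule of thumb with dirty tiles is to merge neighbouring dirty tiles into larger rectangles before uploading, so fewer calls carry more data each. The sketch below is only an illustration under assumptions: `Rect` and `merge_dirty_runs` are made-up names, the dirty flags are assumed to be a row-major grid, and this is not necessarily how simgraph16.cc or any backend tracks dirty tiles.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical rectangle type in pixels.
struct Rect { int x, y, w, h; };

// Merges horizontally adjacent dirty tiles in each row into one rectangle.
// dirty is a row-major grid of cols * rows flags; tile is the tile edge
// length in pixels. Returns one rectangle per run of dirty tiles.
std::vector<Rect> merge_dirty_runs(const std::vector<bool>& dirty,
                                   int cols, int rows, int tile)
{
    std::vector<Rect> out;
    for (int y = 0; y < rows; ++y) {
        const std::size_t row = static_cast<std::size_t>(y) * cols;
        int x = 0;
        while (x < cols) {
            if (!dirty[row + x]) {
                ++x;  // clean tile, skip it
                continue;
            }
            const int start = x;  // first tile of a dirty run
            while (x < cols && dirty[row + x]) {
                ++x;  // extend the run to the right
            }
            out.push_back(Rect{start * tile, y * tile, (x - start) * tile, tile});
        }
    }
    return out;
}
```

For a row with tiles dirty, dirty, clean, dirty this yields two rectangles instead of three upload calls. Merging across rows as well would cut the count further, at the price of more bookkeeping.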

prissi

First, the backend drawing speed should not depend on the zoom level. As said, it just copies the already drawn bitmap to the buffer. This buffer may be memory mapped or not, and the copy is usually hardware accelerated (because this call is used for all icons etc. internally).

The main problem with OpenGL SDL2 is that it crashes on all four of my laptops and two desktops, with ancient GeForce and built-in Intel and AMD graphics. None of them can run the Steam version. Unless this is fixed, it means that 50% of users cannot use the OpenGL SDL2 rendering.

jamespetts

Quote from: prissi on January 20, 2018, 03:19:47 PM
The main problem with OpenGL SDL2 is that it crashes on all four of my laptops and two desktops, with ancient GeForce and built-in Intel and AMD graphics. None of them can run the Steam version. Unless this is fixed, it means that 50% of users cannot use the OpenGL SDL2 rendering.

I am able to run a native Linux SDL2 build of Extended on an Intel i5 (Skylake) NUC, which uses Intel integrated graphics, albeit I run Linux on that, so the drivers might be different.

prissi

Sorry, I should have said Windows, because for Linux there is no GDI ... But the SDL2 builds which tried OpenGL first crashed for me every time on all my Windows computers.

jamespetts

Hmm - is this a known issue with SDL2?

prissi

I never get many answers to my complaints, since it mostly affects built-in graphics adapters.