Performance - GDI vs SDL ( Windows XP vs OS X 10.8.2)

Started by meme, February 04, 2013, 06:44:53 PM



meme

Hi,
I would like to report a huge performance difference between the OS X and Windows versions:


Hardware configuration of Mac(book Pro):
Core 2 Duo 2,53GHz
8 GB RAM
nVidia Geforce 9600M GT
Screen: 22" external 1920*1080
OS X 10.8.2
----------------------
Configuration of virtual machine:
1 CPU core (that doesn't make a difference, Simutrans is single-threaded, isn't it?)
3GB RAM
256 MB of shared GPU VRAM
Win XP
---------------------


The performance difference is enormous.
Simutrans on OS X is unusable: 4 FPS at Full HD, and the GUI lags by about 5 seconds.
Simutrans on Windows runs fine even though it's in a virtual PC: at Full HD I get 17 FPS!


I tried the native OS X version, which unfortunately worked only with OS X Lion. With that, the FPS was roughly the same as with the Windows GDI version.


I would like to ask, what's causing this speed difference? Is SDL that slow?


Thank you







DirrrtyDirk

Quote from: meme on February 04, 2013, 06:44:53 PM
I would like to ask, what's causing this speed difference? Is SDL that slow?

Speaking as a pure Windows user here, I've never had any performance problems with the SDL version (and that's what I usually play with, since in the old days, the GDI version used to have some rather annoying habits, and I never changed back since...)

Have you tried the Windows SDL version on your virtual PC as well? If SDL itself was the problem, it should perform worse there as well I guess... or is it just the SDL-implementation for Mac that is so slow...?
  
***** PAK128 Dev Team - semi-retired*****

transporter

The only time I've had performance issues is the map refresh for like seasons and stuff. Even then, only with maps larger than 1028x1028. That's with OS X 10.6, 2.8 GHz Duo, a 9400M, and 4 GB RAM

prissi

Actually Simutrans uses multiple cores for displaying (and some stuff like map rotation). Did you try a Mac executable from the nightlies? Those are also SDL, but I have never heard complaints yet.

meme

I have updated SDL from 1.2.14 to 1.2.15 and installed the newest nightly, but it's still the same :(


Edit: On a 2008 iMac (2.66 GHz CPU, 3 GB RAM, HD2600 Pro) Simutrans runs much better. I don't understand why.


Ters

Maybe this is another case of the hypothetical throttling effect. The virtualization, like Wine in the Linux case, adds enough overhead so that the computer doesn't reduce its speed to save power, while the slower computer gets a higher relative CPU load or doesn't do this kind of throttling at all.

meme

That would be weird, because other games are running fine ... (i.e C&C 3 Tiberium Wars)



And one note: WINE = Wine Is Not an Emulator ;) --> It's only a set of Windows libraries and an API translation layer, it isn't a virtual OS :) (Can Simutrans run in Wine? Just as a test :) )



Edit: It does, and better than native version  :-X


Markohs

I was thinking about the CPU throttling effect too. Can you somehow disable CPU throttling to test whether it makes a difference? On PCs you can do this in the BIOS setup.

meme

Hm... I think I could, by deleting the *.kext(s) that control Intel SpeedStep, but I'd rather not do that...

And why would the CPU throttle when other things are running fine? (If I run some intensive single-core task, like my USB TV tuner, Simutrans still runs the same, only with a slightly lower FPS, which is perfectly fine as that is a heavy task for the GPU.)


Ters

I know Wine is not an emulator, but the mapping between the Windows API and the host API will add some overhead, as will the hypervisor when doing virtualization. The uncertainty regarding whether this overhead is significant enough is the reason I used the word hypothetical.

According to the throttling theory, other games might run fine because they either have a higher CPU load, which keeps it from being throttled down, or lower I/O demands, which means they run fine with the lower bus speed in power saving mode. (GPU load might also be a factor.)

meme


if it's too small: http://postimage.org/image/dzx1uqr2j/

Simutrans running in wine.
Simutrans running in native environment http://s11.postimage.org/tm7jevmgz/Sn_mek_obrazovky_2013_02_05_v_19_51_37.png


Markohs

We are just asking you to disable SpeedStep to diagnose whether the problem is there. There might be a bug in Simutrans, which already showed up on Ubuntu machines, that makes it perform poorly on CPUs that scale their speed. But this is not confirmed.

Ters

I seem to remember that CPU load was lower in at least one of the two other cases, but I couldn't find the numbers at a glance. Unless OS X shows CPU usage as a percentage of the throttled CPU speed, 20 % could be high enough to prevent throttling.

How big is the map? The 20 % is about what I get on my single-threaded Simutrans on Windows with about the same CPU speed (but more cores) and display size. So what does Simutrans do the other 60 %? Unoptimized build?

meme

Map is 448^2 (448*448) tiles. 


Another screen: http://postimage.org/image/9qfq5n7fz/ (It's native version)



Edit: Looks like I'll have to talk with Apple ( They're either hiding EIST or it doesn't work at all)
hw.cpufrequency_min: 2530000000
hw.cpufrequency_max: 2530000000
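
The two values above are identical, which is why they give no sign of a reported frequency range. A minimal Python sketch of that comparison follows. The helper name `parse_sysctl` is hypothetical, and the code only parses text in the `key: value` form quoted above; it does not query the system itself.

```python
def parse_sysctl(text):
    """Parse 'key: value' lines (the form printed above) into a dict of ints."""
    values = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            values[key.strip()] = int(value.strip())
    return values


# The two lines quoted in the post above.
SAMPLE = """hw.cpufrequency_min: 2530000000
hw.cpufrequency_max: 2530000000"""

freqs = parse_sysctl(SAMPLE)
lo = freqs["hw.cpufrequency_min"]
hi = freqs["hw.cpufrequency_max"]

# Equal minimum and maximum means no scaling range is being reported.
reports_scaling = lo != hi
print(f"min {lo / 1e9:.2f} GHz, max {hi / 1e9:.2f} GHz, "
      f"scaling reported: {reports_scaling}")
```

Identical values only show what the system reports, not whether the hardware actually throttles.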


Ters

That doesn't sound right. My 2.3 GHz computer does bigger maps with much less CPU than that, though it's a newer and more powerful CPU/computer which might have something to say. Since I build Simutrans myself, it can also utilize newer features than the general releases.

meme

And could you "lend" me your build? I'd try it and we (you) will see the difference, if there is any.


Ters

My build won't work on a Core 2 duo. Besides, it's not the Windows version that you're having problems with.

meme



Markohs

Quote from: Ters on February 05, 2013, 07:22:49 PM
I seem to remember that CPU load was lower in at least one of the two other cases, but I couldn't find the numbers at a glance. Unless OS X shows CPU usage as a percentage of the throttled CPU speed, 20 % could be high enough to prevent throttling.

How big is the map? The 20 % is about what I get on my single-threaded Simutrans on Windows with about the same CPU speed (but more cores) and display size. So what does Simutrans do the other 60 %? Unoptimized build?

I'd say on all systems cpu % is the percentage of time it has been busy in a certain period of time, regardless of its frequency or potential computing power
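
A small Python sketch of that definition, as an illustration only: utilization is busy time divided by elapsed time, so the same amount of work shows up as a higher percentage when the clock is lower. The amount of work and the throttled frequency are hypothetical numbers chosen so that the full-speed case lands on the 20 % mentioned above.

```python
def busy_seconds(work_cycles, frequency_hz):
    """Time the CPU needs to execute a given number of cycles."""
    return work_cycles / frequency_hz


def utilization_percent(busy, interval):
    """CPU % as the share of the interval during which the CPU was busy."""
    return 100.0 * busy / interval


WORK_CYCLES = 506_000_000   # hypothetical work per second of wall-clock time
FULL_SPEED_HZ = 2.53e9      # nominal clock from the first post
THROTTLED_HZ = 1.265e9      # hypothetical: half the nominal clock

full = utilization_percent(busy_seconds(WORK_CYCLES, FULL_SPEED_HZ), 1.0)
throttled = utilization_percent(busy_seconds(WORK_CYCLES, THROTTLED_HZ), 1.0)

print(f"full speed: {full:.0f} %, throttled: {throttled:.0f} %")
```

Under this definition the percentage alone does not reveal which clock speed the CPU was running at.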


prissi

I highly suspect SDL, which might have an issue with 16 bit color depth. That is seldom used nowadays, and support for it has become more and more crappy. Try SDL version 1.2.12 or 1.2.13.

meme

You're right, with 1.2.12 I get 13 FPS instead of 4! :) It's still less than with GDI, but it is playable ;)

Thank you

Link for other Mac users with this problem: http://www.libsdl.org/release/SDL-1.2.12.dmg


meme

I've probably found what's causing bad performance: Feb  7 16:34:38 MBP.local simutrans[53717] <Error>: The function `CGSFlushWindow' is obsolete and will be removed in an upcoming update. Unfortunately, this application, or a library it uses, is using this obsolete function, and is thereby contributing to an overall degradation of system performance. Please use `CGSFlushWindowContentRegion' instead.



- It's caused by using CGSFlushWindow instead of CGSFlushWindowContentRegion


prissi

That should rather be done by the SDL people, as I have exactly zero control over this function :(

meme

But this appears with version 1.2.12, which is the last version of SDL that works normally with Simutrans... 1.2.13 behaves as badly as the latest one.


Ters

I can't really say I see any obvious culprit in the SDL source code for Mac between 1.2.12 and 1.2.13, but then the Quartz code is rather cryptic to me. It could also be a change that isn't specific to Mac.

prissi

I really suspect that 16 bit support was changed, probably from hardware to software support, i.e. emulation. That seems most likely. But I am Mac illiterate, thus this is only a guess.

One may try to compile with 15 bit support and see what happens. This will almost certainly require software emulation. However, slowing things down was not the intent.

Sorrento

Quote from: Ters on February 07, 2013, 07:07:33 PM
I can't really say I see any obvious culprit in the SDL source code for Mac between 1.2.12 and 1.2.13, but then the Quartz code is rather cryptic to me. It could also be a change that isn't specific to Mac.


I'm using 1.2.15 on a 15" rMBP, and I get a low frame rate, roughly 3~4 FPS. It seems to support the Retina display.

I have tried 1.2.12. It was smooth, except that it doesn't support the Retina display: three quarters of the screen was white and empty, and there was no mouse cursor either.

prissi

A larger display (i.e. retina) needs four times the computing power (and in reality even about 10 times). Thus there is not much that can be done but zoom in to have fewer objects on the screen. Does the frame rate increase significantly when zooming in?

Sorrento

Quote from: prissi on March 12, 2013, 04:00:51 PM
A larger display (i.e. retina) needs four times the computing power (and in reality even about 10 times). Thus there is not much that can be done but zoom in to have fewer objects on the screen. Does the frame rate increase significantly when zooming in?

Zooming in had no effect in my case, but the size of the game window in OS X does seem to matter. If I enlarge the window to nearly full screen, the frame rate drops from 10 fps to 4 fps; with the default window size, it drops from 10 fps to 6 fps.

The update from OS X 10.8.2 to OS X 10.8.3 did not help.

I now have to run the game in a virtual PC to get a solid 10 fps at 1680 x 1050 in the virtual machine.

Hopefully SDL or GDI will get better Retina support in the future...

Ters

It might be that it's not computing power, but all the individual pixels having to move from system memory to the graphics card. When running in a virtual PC, it only needs to do the 1680x1050 pixels, which then might be scaled up when compositing on the graphics card. It doesn't explain why the default window size is slower, though, unless the default window size is somehow scaled up to avoid being tiny.
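
As a rough illustration of the pixel counts involved, here is a small Python calculation. It assumes 2 bytes per pixel (the 16 bit color depth discussed earlier in the thread) and models the high-density case simply as the virtual machine's resolution doubled in both dimensions, which is an assumption rather than a measured figure.

```python
BYTES_PER_PIXEL = 2  # 16 bit color, as discussed earlier in the thread


def pixels(width, height):
    """Number of pixels in one frame."""
    return width * height


def frame_bytes(width, height, bytes_per_pixel=BYTES_PER_PIXEL):
    """Size of one uncompressed frame in bytes."""
    return pixels(width, height) * bytes_per_pixel


vm = (1680, 1050)                 # resolution used inside the virtual PC
doubled = (vm[0] * 2, vm[1] * 2)  # hypothetical: twice the density in each dimension

ratio = pixels(*doubled) / pixels(*vm)
print(f"{vm}: {frame_bytes(*vm)} bytes per frame")
print(f"{doubled}: {frame_bytes(*doubled)} bytes per frame ({ratio:.0f}x the pixels)")
```

Doubling both dimensions quadruples the data that has to be copied per frame, whichever part of the pipeline turns out to be the slow one.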

10 fps is still low on a modern computer (less than five years old, maybe a bit more).

Have all (or the one) Mac developer(s) left?

ArthurDenture

I spent some time investigating the performance of Simutrans on Mac, since I'm in a similar situation. I have a 15" Macbook Retina, OS X 10.8.3, running at 1920x1200. I have compiled Simutrans from source using SDL 1.2.15. When I maximize Simutrans, its framerate drops from 25fps to 5fps. Here are the observations that I've made:


- SDL runs in software mode when running windowed.
- SDL_Flip() takes about 150ms to execute. In software mode, that's equivalent to SDL_UpdateRect(screen, 0, 0, 0, 0), which is not fast. That accounts for almost the entire frame drawing time.
- Incidentally, I spent some time getting multithreaded mode to work on Mac. The main barrier is lack of pthread barrier support. But once I found that the multithreaded portions of simutrans were not the bottleneck, I abandoned this work.
- I get decent performance with hardware acceleration when running fullscreen, with the catch that I have to turn off automatic graphics switching in the system preferences, otherwise the game renders wrong. (It seems like only the bottom left quarter of the game is visible. No idea what's going on there.)
- I tried various flags to SDL_SetVideoMode, with no effect. I also tried setting the color depth there to 32 instead of 16, producing trippy colors but no effect on performance.
- I tried disabling USE_HW in dr_flush and dr_textur, such that SDL_UpdateRect would be called on dirty tiles and SDL_Fill would never be called, matching the behavior of other platforms. This performed way worse, taking >1s to render each frame.
- http://sdl.beuc.net/sdl.wiki/FAQ_MacOS_X_Windowed_Mode_is_slow seems *extremely* relevant :-)
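
The measured flip time puts a hard ceiling on the frame rate, which a few lines of Python make explicit. This is only arithmetic on the figures quoted above (150 ms per flip, 25 fps when not maximized); the function names are made up for the example.

```python
def max_fps(frame_ms):
    """Upper bound on frames per second if each frame takes frame_ms."""
    return 1000.0 / frame_ms


def frame_budget_ms(target_fps):
    """Time available per frame to sustain target_fps."""
    return 1000.0 / target_fps


FLIP_MS = 150.0     # measured duration of SDL_Flip() from the list above
TARGET_FPS = 25.0   # frame rate observed before maximizing

print(f"ceiling with a {FLIP_MS:.0f} ms flip: {max_fps(FLIP_MS):.1f} fps")
print(f"budget for {TARGET_FPS:.0f} fps: {frame_budget_ms(TARGET_FPS):.0f} ms per frame")
```

A 150 ms flip alone caps the game below 7 fps, which is in line with the roughly 5 fps observed once the rest of the frame work is added.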


A few resulting questions:
- What's with USE_HW? It's only defined on mac, and it appears that the other platforms just use software rendering (along with only updating dirty tiles instead of calling SDL_Fill). Does SDL software rendering simply perform better on the other platforms? (I certainly get fine performance on linux, though on an admittedly slightly lower-resolution monitor.)
- What's the status of the opengl backend? I was able to get it to compile with a handful of Makefile tweaks (locating the glew library with pkg-config; adding "-framework OpenGL"), and it seemed to work ok. (The news ticker had an awful flicker, but that went away if I forced pbo_able = false. Probably a straightforward bug in that branch of code.) It runs in hardware-accelerated mode, even windowed, and I get 25fps and 25ms idle time with it. Is there a backstory as to why it's not the default? Using OpenGL even for 2D rendering seems to be the right way to get good performance.
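
For readers unfamiliar with the "dirty tiles" approach mentioned in the first question, here is a minimal Python sketch of the bookkeeping. It is not the Simutrans implementation: the class name, the 16 pixel tile size and the example rectangle are all hypothetical. It only shows how a changed region maps to a set of tiles, and how few pixels that can be compared with a full-screen update.

```python
class DirtyTiles:
    """Track which fixed-size screen tiles need to be redrawn."""

    def __init__(self, width, height, tile=16):
        self.width, self.height, self.tile = width, height, tile
        self.dirty = set()  # set of (column, row) tile coordinates

    def mark(self, x, y, w, h):
        """Mark every tile touched by the pixel rectangle (x, y, w, h)."""
        x0, y0 = max(x, 0), max(y, 0)
        x1 = min(x + w, self.width) - 1
        y1 = min(y + h, self.height) - 1
        if x1 < x0 or y1 < y0:
            return  # rectangle is empty or entirely off screen
        for ty in range(y0 // self.tile, y1 // self.tile + 1):
            for tx in range(x0 // self.tile, x1 // self.tile + 1):
                self.dirty.add((tx, ty))

    def rects(self):
        """Pixel rectangles of the dirty tiles, clipped to the screen edge."""
        for tx, ty in sorted(self.dirty):
            px, py = tx * self.tile, ty * self.tile
            yield (px, py,
                   min(self.tile, self.width - px),
                   min(self.tile, self.height - py))

    def dirty_pixels(self):
        return sum(w * h for _, _, w, h in self.rects())


tiles = DirtyTiles(1920, 1080)
tiles.mark(100, 100, 40, 20)  # hypothetical small change, e.g. a moving vehicle

full = tiles.width * tiles.height
print(f"{len(tiles.dirty)} tiles, {tiles.dirty_pixels()} of {full} pixels to update")
```

Each rectangle from `rects()` would then be passed to a partial update call instead of flipping the whole surface. As noted in the list above, this did not help on the Mac, so the sketch says nothing about which approach is faster on a given platform.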

Ters

The OpenGL backend doesn't do 2D rendering. At least it didn't the last time I saw it. Simutrans just uses OpenGL to shuffle the data after rendering it the normal way.

The only advantage I can imagine that the SDL+OpenGL backend has over normal SDL is that the graphics completely bypass the window manager. Apart from that, it is actually a detour, and a bit hackish one at that.

prissi

@ArthurDenture You might want to try the allegro backend. If this works better on the MAC, maybe we should use this as default. But I think the main reason for not using OpenGL is the cross compiling. Not sure though.

ArthurDenture

@prissi Just went and tried that but couldn't get Allegro 4 to compile. https://www.allegro.cc/forums/thread/608825 suggests that Allegro 4 uses deprecated APIs that have been removed on Lion. (Which seems accurate: the compilation error was about an unreferenced variable "useLocalHdwrMem", which apparently is defined by older Quickdraw libraries.)

I'd suspect that cross-compiling the OpenGL backend is as easy as the SDL backend (since it's really the same except for one extra library). I'll be happy to try it if there are docs.

@Ters I guess that's what I meant by 2D rendering. It uses OpenGL just for copying the bitmap created by Simutrans onto the screen. The advantage is indeed purely that the hardware acceleration support is much better. Not sure what makes it a hack. (I mean, simsys_opengl.cc is very hackish in the sense of being a copy-paste from simsys_s.cc, in a manner that means fixes to the latter didn't always get applied to the former. But that can be cleaned up.)

Ters

It's a hack in the sense that it doesn't check capabilities, except perhaps in a single case or two, nor does it handle potential errors in any significant way. But the most hackish part is simply doing all this OpenGL stuff just to blit a bitmap to the screen. It's almost like buying a car each time one needs to go to the grocery store, only to discard it when getting home, when the grocery shop is just as far away as the car shop.

I, and later Markohs, have been looking into making better use of this "car" when we first have it, but the internals of Simutrans and best use of OpenGL do in almost no way agree.

Writing a native backend for Mac is probably the best solution. It might be that native backends are the best solution on all platforms in order to make use of touch devices and gestures. Platform independent libraries seem to neglect input the most. Unfortunately, I haven't seen any updates from the native Mac project in a while.

Bjarni

Quote from: prissi on March 12, 2013, 04:00:51 PM
Does the frame rate increase significantly when zooming in?
I have the same problem on 10.6 and I get 4 fps when using the whole screen as well, no matter the zoom level. Trying SDL 1.2.12 didn't really affect anything either.
Didn't really try fullscreen as every second frame appears to be black, resulting in a flashing screen. I ended up feeling dizzy from just looking long enough to find the quit button. I dismissed any further investigation of this issue due to my own health. For all I know it could have been other software running on my computer at the time, which tried to take control of the screen, and as such not a Simutrans problem.

Quote from: Ters on March 30, 2013, 10:01:08 PM
The OpenGL backend doesn't do 2D rendering. At least it didn't the last time I saw it.
Quote from: Ters on March 31, 2013, 08:04:29 AM
the internals of Simutrans and best use of OpenGL do in almost no way agree.
That depends on how you do it. One option is to make a few huge polygons (max size) and then add the screen as a skin to those. One guy once told me he did that to... *something* (oops, I forgot what software he ported :-[) and it performed with lightning speed. I can try to dig up what he wrote if people are interested, though he didn't put many details into it, just the concept.

OSX itself relies heavily on OpenGL. Any mac compiler has access to it even without adding libraries. Also the mac implementation of OpenGL is really good (unlike SDL which is horrible on mac).

Quote from: Ters on March 31, 2013, 08:04:29 AM
Writing a native backend for Mac is probably the best solution. It might be that native backends are the best solution on all platforms in order to make use of touch devices and gestures. Platform independent libraries seems to neglect input the most. Unfortunately, I haven't seen any updates from the native Mac project in a while.
Something like that is done by Timothy here: http://forum.simutrans.com/index.php?topic=8783.0
It's much faster though I'm not sure performance will be as fast as the OpenGL solution as it looks like it lacks hardware acceleration.

Markohs

We had one version of Simutrans using OpenGL and just writing to a texture each frame. That might have better performance than the current SDL version on Mac; I can assemble the code for you to try if you want.


It uses a mixture of SDL and OpenGL, just replace simsys_opengl.cc for the simsys*.cc that you are using now when compiling, should be fine.


EDIT: Just read the posts above and you already talked about  this, sorry. ;)

Markohs

Quote from: Ters on March 31, 2013, 08:04:29 AM
I, and later Markohs, have been looking into making better use of this "car" when we first have it, but the internals of Simutrans and best use of OpenGL do in almost no way agree.

Yes indeed, but it will be possible to implement it in the future, if we are able to keep the geometry of the objects in vertex buffers synced with the world, plus some management of what's inside the viewport and what isn't, to save memory and bandwidth.

That's why I'm implementing some features in normal Simutrans: to get to know it better, so I can hopefully manage to design how to do this. I lack enough knowledge of the internals of Simutrans atm.

ArthurDenture


There are naturally tradeoffs in using a native renderer: on the one hand, you usually can get good performance and better native look-and-feel. (Though, as Simutrans handles all its dialogs and text input itself, native look-and-feel is not applicable here.) On the other hand, it's yet another backend to maintain. I suppose it could be worthwhile to do a native Quartz version... I don't have any experience writing native Mac software, so I'd not be able to help much with that except by testing it out.

As for OpenGL: one way to sidestep the question will be to port to SDL 2. Their documentation claims that it will use OpenGL behind the scenes for hardware-accelerated 2D rendering whenever appropriate. That way you get the best of both worlds: fast rendering without having to clutter up the code with OpenGL calls that exist only to get a fast rendering context.


I actually spent some time trying this out, with some success :-). You can see the work-in-progress commit over at https://github.com/artdent/simutrans/tree/sdl2. It's still broken in various ways, but perhaps someone else wants to take a look and hack on it as well. It does get excellent performance, so it's probably well worthwhile to pursue this route and attempt to fix the rest of the bugs in my port.


Naturally, it wouldn't be possible to integrate this change until SDL 2 is actually out and has published binaries for the important platforms. And it'd be IMHO better to upgrade the SDL backend in-place rather than attempt to ship it as Yet Another Backend Option.

Ters

Quote from: Bjarni on March 31, 2013, 02:09:17 PM
That depends on how you do it. One option is to make a few huge polygons (max size) and then add the screen as a skin to those. One guy once told me he did that to... *something* (oops, I forgot what software he ported :-[ ) and it performed with lightning speed. I can try to dig up what he wrote if people are interested, though he didn't put many details into it, just the concept.

That's what I did, but defining geometry, setting up projection and transformation matrices, allocating texture buffer, configuring that texture buffer right (no mipmaps, correct filtering, etc), and so on, seems like serious overkill just to tell the system to copy x times y pixels from one place to another.

TurfIt

Quote from: ArthurDenture on March 31, 2013, 06:47:35 PM
You can see the work-in-progress commit over at https://github.com/artdent/simutrans/tree/sdl2. It's still broken in various ways, but perhaps someone else wants to take a look and hack on it as well. It does get excellent performance, so it's probably well worthwhile to pursue this route and attempt to fix the rest of the bugs in my port.

I just tried this on Win7. Unfortunately performance vs SDL1 is rather lacking.











SDL2                     13.6 ms/frame
OpenGL, PBO               9.3 ms/frame
SDL1, ST, -use_hw         9.0 ms/frame
OpenGL                    8.3 ms/frame
GDI, ST                   7.6 ms/frame
SDL1, ST, UpdateRect      7.4 ms/frame
SDL1, ST, UpdateRects     6.5 ms/frame
GDI, MT                   5.4 ms/frame
SDL1, MT                  5.3 ms/frame

ST = single threaded, MT = multi threaded
Ouch. And it is also most definitely busted in many other ways too... as is the OpenGL PBO...
Note: for MT, that's just how long the main thread is spending in sync_step, returning faster just means the main routine gets to sleep sooner, no performance change unless the main thread ends up waking again before the screen copy is finished.
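
To put the timings above in proportion, the following Python snippet expresses each one relative to the fastest single-threaded result (SDL1 with UpdateRects). The numbers are copied from the list above; choosing a single-threaded baseline is a judgment call made for this example, because the MT figures only measure the main thread, as the note explains.

```python
# ms/frame figures as posted above (Win7).
results_ms = {
    "SDL2": 13.6,
    "OpenGL, PBO": 9.3,
    "SDL1, ST, -use_hw": 9.0,
    "OpenGL": 8.3,
    "GDI, ST": 7.6,
    "SDL1, ST, UpdateRect": 7.4,
    "SDL1, ST, UpdateRects": 6.5,
    "GDI, MT": 5.4,
    "SDL1, MT": 5.3,
}

# Baseline: fastest single-threaded run, so like is compared with like.
baseline = results_ms["SDL1, ST, UpdateRects"]
relative = {name: ms / baseline for name, ms in results_ms.items()}

for name, ms in sorted(results_ms.items(), key=lambda kv: kv[1]):
    print(f"{name:<24}{ms:5.1f} ms/frame  x{relative[name]:.2f}")
```

By this measure the SDL2 port takes a little over twice as long per frame as the best single-threaded SDL1 run.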

Edit: Added GDI results for completeness.

Markohs

 Thanks for those numbers TurfIt. :)

When I finish my world limits patch I'll get my hands on the OpenGL backend again, I have new ideas to apply there, let's hope they improve things. It's quite a lot of work, but I think they will be successful. :)