Bad performance regardless of weather and daylight

Started by Dwachs, November 15, 2011, 09:31:37 AM

Dwachs

There are complaints about bad Simutrans performance floating around.

Feel free to post in this thread. Please clarify under which conditions you experience bad performance:

-- pakset
-- screen resolution (did you zoom out?)
-- your savegame

The following things will affect performance negatively:
-- large screen resolution
-- zoom out
-- maps with huge population
-- maps with huge number of passengers
-- 64-bit systems (as the memory area that has to be processed all the time is then larger)
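To illustrate the 64-bit point: any object that stores pointers grows when pointers widen from 4 to 8 bytes, even though the data it carries is unchanged. The struct below is a hypothetical example, not an actual Simutrans class, and the sizes in the comments assume common 32-bit and 64-bit ABIs.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical object layout, NOT a real Simutrans class: a map object
// that links to other objects through two raw pointers.
struct example_obj_t {
    example_obj_t* next;   // 4 bytes in a 32-bit build, 8 bytes in a 64-bit build
    example_obj_t* owner;  // likewise
    std::int16_t x, y;     // 2 bytes each, same on both
    std::int8_t z;         // 1 byte, same on both
};

// Bytes spent on the two pointer members alone.
constexpr std::size_t pointer_bytes = 2 * sizeof(void*);

// Total size including alignment padding. On common ABIs this is 16 bytes
// in a 32-bit build and 24 bytes in a 64-bit build: the payload is the
// same, only the pointers (and the padding they force) grew.
constexpr std::size_t obj_bytes = sizeof(example_obj_t);
```

The -sizes output posted further down in this thread shows the same effect on real classes, e.g. vehikel_t at 120 bytes in the 64-bit build versus 92 and 88 bytes in the two 32-bit builds.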

What you can do to get better performance:
-- do not zoom out; make the viewport smaller
-- turn off pedestrians
-- do not use transparent overlays (station overlay, transparent buildings)
-- post in this thread the exact circumstances of bad performance.
Parsley, sage, rosemary, and maggikraut.

prissi

The frame rate is also an important setting. Please report it too.

ojii

For completeness' sake, here are my specs again:

Intel i7 820M CPU, 4(+4) core@2.3GHz
16 GB DDR3 RAM@1600MHz
Nvidia GeForce GTX 560M 1.5GB GDDR5
Ubuntu (11.10, 64bit)
3.0.0-12-generic Kernel

Using the latest stable simutrans, 110.0.1

With pak128.japan 110.0.1, I tried FPS=10 and FPS=50. If I play at full screen (1920x1080 minus a few pixels for the GNOME bar at the top), the game is incredibly unresponsive. This is on a new map (512x512) with no lines or vehicles, and random wildlife is turned off. Dragging the map (right-click + drag) only sometimes responds to input. The whole UI (clicking the map button, trying to click somewhere on the map, trying to get info about a building) is often unresponsive and hardly usable. When playing in a window about a quarter of my screen size, the game is mostly playable (but still unresponsive in some cases, especially when moving around the map), again with no lines. The main reason I don't want to play in such a small window is that it's a bit too small on a full HD screen, and the UI gets really cluttered.

I'm happy to perform more tests; just tell me which pakset/map settings/savegames I should load and fool around with. I might also try recompiling it with the Allegro backend to see if that helps.

EDIT: My maps usually have fairly big populations, since I prefer playing pax only, so roughly 300k people on a map.

TheUniqueTiger

@ Prissi,

I can also confirm that performance is slow on a larger display. My computer specs are Core2Duo 2.93GHz, 3GB DDR2 RAM, ATI HD5450 (1GB) with Win7. I usually play pak128 on a 2048*2048 map at 25 FPS. With a 1920*1080 window, about 20 relatively small cities, 32 industry chains and 56 vehicles (mostly trains & buses), CPU usage is around 40% and GPU usage around 12-15%. Changing the window size to 1280*768 brings CPU usage down to around 15-20% and GPU usage to around 5-7%. Zooming in increases CPU usage. This means CPU and GPU usage are both directly proportional to the number of tiles shown on screen. Using pak64 with the same settings and window size, scrolling lags heavily and CPU usage is consistently higher than with pak128. No other programs were running at the same time. This clearly indicates that the CPU time goes mainly into displaying/rendering.
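A quick sanity check of the proportionality claim, using only the numbers from the post above (a minimal sketch; `pixels` is just a helper for this illustration): the 1920*1080 window has about 2.1 times as many pixels as the 1280*768 one, while the reported CPU load dropped by a factor of roughly 2 to 2.7.

```cpp
#include <cstdint>

// Pixels the renderer has to fill for a window of w x h.
constexpr std::uint64_t pixels(std::uint64_t w, std::uint64_t h) { return w * h; }

// The two window sizes from the post: 2,073,600 vs 983,040 pixels.
constexpr double area_ratio =
    static_cast<double>(pixels(1920, 1080)) / static_cast<double>(pixels(1280, 768));

// Ratio of the reported CPU loads (40% vs. 15-20%).
constexpr double cpu_ratio_low  = 40.0 / 20.0;  // 2.0
constexpr double cpu_ratio_high = 40.0 / 15.0;  // about 2.67
```

The pixel ratio (exactly 2.109375) falls inside the range of the CPU ratios, which is consistent with rendering cost scaling with the visible area.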

In my opinion, more of the rendering should be offloaded to the GPU so that the CPU is relatively free. If I remember correctly, Simutrans stores all texture images and tile coordinates in RAM, but ideally it should first try to store them in GPU memory and only fall back to RAM if that fails. Only the coordinates of the moving objects should be passed to the GPU for rendering; all the static things should be handled by the GPU alone. Most of the limitations come from CPU spikes, especially while scrolling, and otherwise from consistently high CPU usage. It's quite disappointing that a game which runs well on low-spec computers is ruined simply by a large window. I can't even imagine the pain ojii must be going through, with such a high-end configuration and still such slow performance. Besides, nowadays it is perfectly acceptable to set a minimum requirement of 512MB RAM and 256MB graphics memory, further depending on map size. As the game is still 2D, there should be no major issues with onboard graphics or graphics cards.

Ashley

On a 1024*1024 map using the Windows SDL 111.0 stable I see ~23fps on normal zoom, dropping to 12 fully zoomed out. This is with the latest stable pak128.Britain.

Perfectly playable like that IMO.

I notice, however, that the game isn't using 100% of the CPU core it is running on. Even giving it a core exclusively (using core affinity) doesn't improve the frame rate: the game has the entire core to itself, but instead of a higher frame rate I see the idle time sitting at 25ms. It manages about 4.5 simloops when set up that way.

This is on a system with a 4-core Phenom II X4 965 CPU at 3.4GHz.

Interestingly, scrolling doesn't seem to affect the framerate at all, or maybe by a tiny amount. And the number of simloops actually increases during scrolling!

I also occasionally see the idle figure jump to massive values, like 570000000 or so (hard to see the exact number since it changes very quickly). Possibly this is some kind of overflow in the counter, or a signed value being interpreted as unsigned?
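For illustration, this is how huge values typically arise when a time difference is stored in an unsigned counter: if the later timestamp is ever smaller than the earlier one, the subtraction wraps around modulo 2^32 instead of going negative. This is a hypothetical example with invented function names; the observed 570000000 does not correspond to a simple 32-bit wrap of a small negative number, so it only shows the kind of bug being suspected, not its actual cause.

```cpp
#include <cstdint>

// Hypothetical: idle time as the difference of two millisecond timestamps
// held in unsigned 32-bit variables. If `end` < `start`, the result wraps
// around modulo 2^32 and becomes a huge positive number.
std::uint32_t idle_unsigned(std::uint32_t start, std::uint32_t end) {
    return end - start;
}

// The same difference reinterpreted as a signed 32-bit value: a small
// negative difference stays small and negative (on two's-complement
// targets, which covers all common platforms).
std::int32_t idle_signed(std::uint32_t start, std::uint32_t end) {
    return static_cast<std::int32_t>(end - start);
}
```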

Just a few quick observations anyway.

Use Firefox? Interested in IPv6? Try SixOrNot the IPv6 status indicator for Firefox.
Why not try playing Simutrans online? See the Game Servers board for details.

ojii

My games usually use 100% of one CPU. I'm not sure whether going the multi-threaded/multi-processing route would be feasible or beneficial, but it may be something to think about, seeing as it's pretty much standard to have at least 2 cores today.

prissi

Simutrans always targets 5 simloops per second, no matter what fps is set. During a simloop, production, loading and routing are done, while during a frame everything is moved along the routes calculated there. Since simloops take differing amounts of time, Simutrans schedules them based on the time needed for the last 10 simloops or so. Therefore it may not use the entire CPU.
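One way to picture this kind of scheduling is a small moving average over recent simloop durations. This is a hypothetical sketch, not the actual Simutrans scheduler; the class name, the millisecond units and the rule "idle for whatever is left of a 1000/5 ms budget" are assumptions made for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <numeric>

// Hypothetical sketch, NOT the actual Simutrans scheduler: remember how long
// the last N simloops took and idle for whatever is left of the per-simloop
// budget, so that about `target_per_second` simloops run each second.
class simloop_pacer_t {
public:
    static constexpr std::size_t N = 10;  // "the last 10 simloops or so"

    explicit simloop_pacer_t(std::uint32_t target_per_second = 5)
        : budget_ms_(1000 / target_per_second) {}

    // Store the duration (ms) of the simloop that just finished.
    void record(std::uint32_t duration_ms) {
        history_[next_] = duration_ms;
        next_ = (next_ + 1) % N;  // ring buffer: overwrite the oldest entry
        if (count_ < N) ++count_;
    }

    // Mean duration over the stored simloops (0 if none recorded yet).
    std::uint32_t average_ms() const {
        if (count_ == 0) return 0;
        const std::uint64_t sum = std::accumulate(
            history_.begin(), history_.begin() + static_cast<std::ptrdiff_t>(count_),
            std::uint64_t{0});
        return static_cast<std::uint32_t>(sum / count_);
    }

    // Time to idle before the next simloop; zero once simloops eat the budget.
    std::uint32_t idle_ms() const {
        const std::uint32_t avg = average_ms();
        return avg >= budget_ms_ ? 0 : budget_ms_ - avg;
    }

private:
    std::uint32_t budget_ms_;
    std::array<std::uint32_t, N> history_{};
    std::size_t next_ = 0;
    std::size_t count_ = 0;
};
```

When recent simloops were fast, the sketch idles for the rest of the budget, which is one way a game loop ends up not using the whole CPU core.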

If you fast forward (during which display updates are fixed at 10 fps), one CPU core is maxed out.

Copying to the display is done in another thread, so if you are automatically following a vehicle on the map, you can reach up to 60% usage on a dual-core CPU. That is why scrolling does not affect CPU usage that much.
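A minimal sketch of overlapping drawing and copying with a second thread, assuming two alternating back buffers. `frame_t`, `render_frame`, `flush_to_screen` and `run_frames` are hypothetical names; the real backends copy to an SDL or Allegro surface, not to a `std::vector`.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

using frame_t = std::vector<std::uint32_t>;

// Stand-in for drawing a frame: fill every pixel with the frame number.
void render_frame(frame_t& back, std::uint32_t frame_no) {
    for (auto& px : back) px = frame_no;
}

// Stand-in for copying a finished frame to the screen surface.
void flush_to_screen(const frame_t& back, frame_t& screen) {
    screen = back;
}

// Draws `frames` frames into two alternating back buffers. While frame k is
// being drawn, a helper thread copies frame k-1 to `screen`, so drawing and
// copying can run on two cores at once.
void run_frames(std::size_t n_pixels, std::uint32_t frames, frame_t& screen) {
    frame_t buffers[2] = {frame_t(n_pixels), frame_t(n_pixels)};
    std::thread flusher;
    for (std::uint32_t k = 0; k < frames; ++k) {
        frame_t& back = buffers[k % 2];
        render_frame(back, k);                   // overlaps with the copy of frame k-1
        if (flusher.joinable()) flusher.join();  // previous copy must finish first
        flusher = std::thread(flush_to_screen, std::cref(back), std::ref(screen));
    }
    if (flusher.joinable()) flusher.join();
}
```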

With its complex interacting objects, Simutrans cannot easily use multithreading; for instance, one car potentially has to wait for all other cars before moving. And in multiplayer, the order in which they are moved must be the same on ALL clients too.
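A small hypothetical example of why the order matters: cars on a single lane, where each car may only advance if the cell ahead is still free after the cars processed before it have moved. The types and the one-lane rule are invented for illustration and are not Simutrans code.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical car on a one-lane road: unique id and cell index.
struct car_t {
    std::uint32_t id;
    int pos;
};

// Moves every car one cell forward if that cell is free *at the moment the
// car is processed*. Because later cars see earlier moves, the outcome
// depends on the order, so the cars are always processed by ascending id.
void step_all(std::vector<car_t>& cars) {
    std::sort(cars.begin(), cars.end(),
              [](const car_t& a, const car_t& b) { return a.id < b.id; });
    for (auto& c : cars) {
        const int target = c.pos + 1;
        const bool blocked = std::any_of(cars.begin(), cars.end(),
                                         [&](const car_t& o) { return o.pos == target; });
        if (!blocked) c.pos = target;
    }
}
```

With car 1 at cell 0 and car 2 at cell 1, ascending order leaves car 1 in place and moves car 2 to cell 2; processing car 2 first would instead move both cars (to cells 1 and 2). Stepping the cars on several threads without a fixed order could therefore give different results on different clients.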

I also once had 100% CPU load while idling, due to a bug in SDL. It never actually waited the milliseconds I asked it to wait, but returned immediately, so the event loop was running like crazy. Eventually, it suddenly started pausing once the requested pause time went over 20ms. However, this was gone the next time I checked.
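A quick way to check for this kind of bug is to time the requested sleep. The sketch below uses `std::this_thread::sleep_for` so it has no SDL dependency; measuring SDL's `SDL_Delay()` the same way would reveal a delay that returns immediately. The function names are hypothetical.

```cpp
#include <chrono>
#include <thread>

// Returns how many milliseconds a requested sleep actually lasted,
// measured with a monotonic clock.
long long measure_sleep_ms(int requested_ms) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    std::this_thread::sleep_for(std::chrono::milliseconds(requested_ms));
    const auto elapsed = clock::now() - start;
    return std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count();
}

// True if the sleep came back early, i.e. an event loop relying on it
// would spin and burn CPU while "idling".
bool sleep_returned_early(int requested_ms) {
    return measure_sleep_ms(requested_ms) < requested_ms;
}
```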

Did you try either compiling a 32-bit version or using the Allegro backend?

prissi

The way tiles are displayed in order to reduce drawing errors is not easily transferred to a GPU, as GPUs are very bad at clipping non-rectangular areas. This new algorithm, which greatly reduces clipping errors, eats up about 20% more CPU than 102.2.2 or so.

There was also an OpenGL display driver for OpenTTD (http://www.tt-forums.net/viewtopic.php?f=33&t=38151). In the end it did not work out, but anyone is free to use that code as a starting point for implementing an OpenGL blitter for Simutrans.

jorrit

Just today I switched from:
  - simutrans 110.0.1 (64-bit ubuntu) with pak128-1.99.0-alpha--110.0.1.zip
to:
  - nightly build of 111.0.1 (for 64-bit) with pak128-2.0.0--111.0.zip

and performance went down considerably. The new nightly build is a LOT slower than the previous version. I know that nightly builds are not supposed to be stable (I'm project manager of an open source project myself, and we also have stable and unstable versions), but I'm still wondering why the performance is so bad. Also, I'm only using the nightly build because I wanted to try the new pak128-2.0.0, which requires 111.0, and I couldn't find a 64-bit build for Linux (and the 32-bit build didn't work).

At first I thought the nightly build might have been built in debug mode, but as far as I can see that doesn't appear to be the case (I may be mistaken on that).

BTW, I'm running a 1024x1024 map in fullscreen (1920x1080).
With the old simutrans from Ubuntu I get about 22-25 fps with that setup, and that's on a relatively complex map with lots of vehicles.
With the nightly build I get 12 fps, and that's on a freshly started map with a single bus.

Any ideas?

Greetings,

prissi

As for timing, both versions use the same (new) clipping algorithm, so there is no reason for the display timing to differ. I strongly suspect there has been a change in SDL.

isidoro

I would also point out that Linux versions used to have a problem playing MIDIs that made the game freeze every time a new song started. Maybe it is worth mentioning that in the introductory post (run the game with the nomidi option).

ojii

Is there any way to benchmark simutrans? I think that would be a useful thing to implement to actually get hard metrics on this.

prissi

Benchmarking is very easy. Compile with PROFILE=1 set in the makefile. It will run much slower, though.

It will then produce a gmon.out file in the simutrans folder, which gprof can turn into human-readable output. (If the release version from SourceForge already produces such a file, then it was accidentally compiled with profiling on; that would of course be an error, but one that is easily cured.)

ojii

Quote from: prissi on November 16, 2011, 11:16:51 AM
Benchmarking is very easy. Compile with PROFILE=1 set in the makefile. It will run much slower, though.

It will then produce a gmon.out file in the simutrans folder, which gprof can turn into human-readable output. (If the release version from SourceForge already produces such a file, then it was accidentally compiled with profiling on; that would of course be an error, but one that is easily cured.)

That will just output profiling information, I assume? I was thinking more of something like "simutrans --benchmark --pak=myawesomepak --savefile=mysavefile.sve --time=6" that would load the savefile with the given pak, run it for 6 months and tell you how long that took. That way you could run the same command with the same pak/savegame on another computer/OS/... and see the difference.
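Such a benchmark essentially boils down to timing a fixed amount of work on a fixed savegame and pakset, much like the "N iterations took X ms" lines that -times prints. A minimal timing helper, assuming `step` stands in for one unit of simulation work (it is not a Simutrans API, and the command-line flags above are only a proposal):

```cpp
#include <chrono>
#include <cstdint>
#include <functional>

// Runs `step` `iterations` times and returns the elapsed wall-clock time in
// milliseconds. `step` is a placeholder for one unit of simulation work,
// e.g. advancing a loaded savegame by one step.
double time_iterations_ms(std::uint32_t iterations, const std::function<void()>& step) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (std::uint32_t i = 0; i < iterations; ++i) step();
    const std::chrono::duration<double, std::milli> elapsed = clock::now() - start;
    return elapsed.count();
}
```

Keeping the iteration count, savegame and pakset fixed is what makes the resulting numbers comparable across machines and operating systems.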

Would it be useful if I provided one of those profiling outputs from my machine?

Ashley

ojii, that sounds like a useful feature to implement, maybe one of those low-hanging fruit you were speaking of? :)

ojii

Quote from: Timothy on November 16, 2011, 11:42:28 AM
ojii, that sounds like a useful feature to implement, maybe one of those low-hanging fruit you were speaking of? :)

On the off chance that days start to have more than 24 hours soon, I'll get on it :D

Gonna be busy all next weekend/week due to desertbus (check it out, awesome charity run for children! http://desertbus.org)

prissi

Please start simutrans with --help and you will see that such options already exist:

If you start simutrans with the option "-until ###", it will run in fast forward until month+(year*12)=###.

If you use "-times", it will show the times needed to display images and some other stuff.
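The -until value is plain arithmetic. A minimal sketch following the formula quoted above; whether months are counted from 0 or from 1 is not stated here, so check --help before relying on it (`until_value` is a hypothetical helper name):

```cpp
#include <cstdint>

// Hypothetical helper: the value to pass to "-until", following the
// formula month + (year * 12). `month` is used exactly as given.
constexpr std::int32_t until_value(std::int32_t year, std::int32_t month) {
    return month + year * 12;
}
```

For example, year 1950 and month 3 give 3 + 1950*12 = 23403.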


ojii

Where does it show that info when using '-times'? It doesn't seem to do anything for me...

EDIT: Using DEBUG=3 and PROFILE=2 and OPTIMIZE=1

prissi

When DEBUG is set, it shows those options on the command line. It does for me.

Maybe try "make clean" before the next compile.

Dwachs

ojii, can you try the attached patch? And see if starting simutrans with '-half_screen_flush' makes any difference in performance.

With this switch, every second screen flush (copying pixels from internal data to screen via SDL) only one fourth of the screen is copied. If this part of the code is the bottleneck, then the switch should give a performance improvement.
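A hypothetical sketch of how the area to copy could be chosen per flush. This is not Dwachs' patch: which quarter of the screen gets copied is an assumption, and here the odd-numbered flushes cycle through four horizontal strips. The resulting rectangle could then be handed to an update call such as SDL 1.2's `SDL_UpdateRect()`.

```cpp
#include <cstdint>

// Rectangle of the screen to copy, in pixels.
struct flush_rect_t { int x, y, w, h; };

// Even-numbered flushes copy the whole screen, odd-numbered ones only a
// quarter (one horizontal strip, cycling top to bottom). The last strip
// absorbs any rows left over when the height is not divisible by 4.
flush_rect_t flush_area(std::uint32_t flush_no, int screen_w, int screen_h) {
    if (flush_no % 2 == 0) return {0, 0, screen_w, screen_h};
    const int strip = static_cast<int>((flush_no / 2) % 4);
    const int strip_h = screen_h / 4;
    const int h = (strip == 3) ? screen_h - 3 * strip_h : strip_h;
    return {0, strip * strip_h, screen_w, h};
}
```

Over any two consecutive flushes this copies 1.25 screens' worth of pixels instead of 2, i.e. 37.5% less copying.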

The next step would be to introduce a similar switch for rendering only part of the screen at times.

prissi

I recently installed Ubuntu 11.10 and had a look at its UI. Since Ubuntu 11.10, the desktop is rendered entirely by the GPU. Could it be that this causes the slowdown?

In a quick check in a virtual machine, simutrans under Unity was slower than the Xfce X Window version.

VS

Someone posted in the previous thread that having a GPU overlay displayed at the same time slows Simutrans down. It's somewhat hard to test, since one always has a browser open and some Flash content will just happen to be somewhere, and the desktop is often accelerated too.

edit: I tried Dwachs' patch and it certainly helped scrolling :)

My projects... Tools for messing with Simutrans graphics. Graphic archive - templates and some other stuff for painters. Development logs for most recent information on what is going on. And of course pak128!

prissi

Here are just the results of -debug 3 -sizes -times on different systems:

64-bit Ubuntu (X Window) in VMware Player:

Message: sizes: koord: 4
Message: sizes: koord3d: 5
Message: sizes: ribi_t::ribi: 1
Message: sizes: halthandle_t: 2

Message: sizes: ding_t: 16
Message: sizes: gebaeude_t: 48
Message: sizes: baum_t: 24
Message: sizes: weg_t: 40
Message: sizes: stadtauto_t: 88

Message: sizes: grund_t: 32
Message: sizes: boden_t: 32
Message: sizes: wasser_t: 32
Message: sizes: planquadrat_t: 24

Message: sizes: ware_t: 12
Message: sizes: vehikel_t: 120
Message: sizes: haltestelle_t: 960

Message: sizes: karte_t: 6920
Message: sizes: spieler_t: 4096

Message: test: display_img(): 300000 iterations took 161 ms
Message: test: display_color_img(): 300000 iterations took 168 ms
Message: test: display_color_img(): next AI: 300000 iterations took 162 ms
Message: test: display_color_img(), other AI: 300000 iterations took 158 ms
Message: test: display_flush_buffer(): 300 iterations took 37 ms
Message: test: display_text_proportional_len_clip(): 300000 iterations took 1610 ms
Message: test: display_fillbox_wh(): 300000 iterations took 18114 ms
Message: test: view->display(true): 200 iterations took 564 ms
Message: test: view->display(true) and flush: 200 iterations took 696 ms
Message: test: welt->sync_step/step(200,1,1): 200 iterations took 8001 ms


32-bit MSVC:

Message: Debug: size of structures
Message: sizes: koord: 4
Message: sizes: koord3d: 6
Message: sizes: ribi_t::ribi: 1
Message: sizes: halthandle_t: 2

Message: sizes: ding_t: 16
Message: sizes: gebaeude_t: 36
Message: sizes: baum_t: 20
Message: sizes: weg_t: 36
Message: sizes: stadtauto_t: 68

Message: sizes: grund_t: 24
Message: sizes: boden_t: 24
Message: sizes: wasser_t: 28
Message: sizes: planquadrat_t: 12

Message: sizes: ware_t: 12
Message: sizes: vehikel_t: 92
Message: sizes: haltestelle_t: 888

Message: sizes: karte_t: 6616
Message: sizes: spieler_t: 4048

Message: test: testing img ...
Message: test: display_img(): 300000 iterations took 125 ms
Message: test: display_color_img(): 300000 iterations took 142 ms
Message: test: display_color_img(): next AI: 300000 iterations took 142 ms
Message: test: display_color_img(), other AI: 300000 iterations took 140 ms
Message: test: display_flush_buffer(): 300 iterations took 9 ms
Message: test: display_text_proportional_len_clip(): 300000 iterations took 1235 ms
Message: test: display_fillbox_wh(): 300000 iterations took 6267 ms
Message: test: view->display(true): 200 iterations took 936 ms
Message: test: view->display(true) and flush: 200 iterations took 1000 ms
Message: test: welt->sync_step/step(200,1,1): 200 iterations took 2022 ms


32-bit GCC MinGW (3.4.5, optimized):

Message: Debug: size of structures
Message: sizes: koord: 4
Message: sizes: koord3d: 5
Message: sizes: ribi_t::ribi: 1
Message: sizes: halthandle_t: 2

Message: sizes: ding_t: 12
Message: sizes: gebaeude_t: 32
Message: sizes: baum_t: 16
Message: sizes: weg_t: 32
Message: sizes: stadtauto_t: 64

Message: sizes: grund_t: 20
Message: sizes: boden_t: 20
Message: sizes: wasser_t: 24
Message: sizes: planquadrat_t: 12

Message: sizes: ware_t: 12
Message: sizes: vehikel_t: 88
Message: sizes: haltestelle_t: 888

Message: sizes: karte_t: 6560
Message: sizes: spieler_t: 4040

Message: test: display_img(): 300000 iterations took 86 ms
Message: test: display_color_img(): 300000 iterations took 110 ms
Message: test: display_color_img(): next AI: 300000 iterations took 111 ms
Message: test: display_color_img(), other AI: 300000 iterations took 108 ms
Message: test: display_flush_buffer(): 300 iterations took 8 ms
Message: test: display_text_proportional_len_clip(): 300000 iterations took 1250 ms
Message: test: display_fillbox_wh(): 300000 iterations took 540 ms
Message: test: view->display(true): 200 iterations took 922 ms
Message: test: view->display(true) and flush: 200 iterations took 972 ms
Message: test: welt->sync_step/step(200,1,1): 200 iterations took 1624 ms


While in most routines the 64-bit code is only 10-20% slower (apart from the fillboxes), sync_step is tremendously slower. This may be of great importance, as this is where everything is moved. One reason might be that all pointers are now 64 bit, so twice the amount of data needs to run through the CPU and cache sizes are exceeded quickly. Deeper profiling is needed, but posting your "-times" values might be very helpful.
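To put the posted figures side by side, the snippet below divides each 64-bit time by the corresponding 32-bit MSVC time (values copied from the logs above; the struct and helper names are just for this illustration). Note that the 64-bit run was inside a virtual machine, so this is not a clean like-for-like comparison.

```cpp
#include <string>
#include <vector>

// One "-times" test with its duration (ms) in the two builds.
struct timing_t {
    std::string test;
    double ms_64bit_vm;    // 64-bit Ubuntu in a VM
    double ms_32bit_msvc;  // 32-bit MSVC
};

// Values copied from the logs posted above.
const std::vector<timing_t> timings = {
    {"display_img", 161, 125},
    {"display_color_img", 168, 142},
    {"display_flush_buffer", 37, 9},
    {"display_text_proportional_len_clip", 1610, 1235},
    {"display_fillbox_wh", 18114, 6267},
    {"view->display(true)", 564, 936},
    {"view->display(true) and flush", 696, 1000},
    {"welt->sync_step/step", 8001, 2022},
};

// > 1 means the 64-bit run was slower, < 1 means it was faster.
double slowdown(const timing_t& t) { return t.ms_64bit_vm / t.ms_32bit_msvc; }
```

By these numbers, the image and text tests are roughly 1.2 to 1.3 times slower in the 64-bit run, display_fillbox_wh about 2.9 times, display_flush_buffer about 4.1 times and sync_step about 4 times, while view->display(true) was actually faster (ratios of about 0.6 without and 0.7 with flush).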

Ters

A bit late, but I just tested 64-bit Simutrans on my 64-bit Linux machine with the following specifications:

Intel(R) Pentium(R) Dual  CPU  E2200  @ 2.20GHz
2 GB RAM
GeForce 7300 SE/7200 GS (nVidia driver)
Linux kernel 2.6.39
Screen resolution 1152x864
Compositing is on.

It has no problems running a game with a 1024x1024 map and probably about 1000 convoys at 25 fps with CPU time to spare, unless I zoom out several steps. This is with an executable compiled yesterday.

A year or two ago, this machine had problems with a 256x256 map, but it suddenly got better. I suspect a driver update caused the increased performance. At least that is what I suspected back then, but I no longer remember exactly what I did.