
Low FPS & CPU/GPU usage

Started by Ayasano, September 07, 2017, 11:34:02 AM


Ayasano

Specs:

Windows 10
i7 6700k @ 4GHz
GTX 1060 6GB
16GB DDR4 RAM @ 1067MHz

FPS on the standard 256x256 map is ~35, regardless of screen size or pakset. Dropping the map size to 8x8 doesn't change the FPS at all. Pausing drops the FPS to about 10 though, oddly enough.
No matter what the CPU usage never goes above 10% on any cores, and GPU usage is about 3-5%, even with the FPS drops during pausing.

Any idea what could be causing the game not to fully utilize the CPU/GPU and how to fix it?

Ters

I'm surprised you get that high FPS. I thought it was capped below that, at somewhere between 24 and 30. Drawing any more frames per second is pointless, except for the fastest planes.

Simutrans does not use GPU (except maybe DPI scaling). It's too old to be designed for that.

Ayasano

I changed the default FPS cap because it was strangely low. Moving the camera around with such low framerates gives me a headache because of the jittery motion. I'm more sensitive to it than most people I think, trying to play most games at 30 fps is pretty unpleasant, I always try to target at least 60 fps. I could cap it to 30 and try to just live with it, but I'd rather play at 60+. 10 fps during pausing is literally unplayable for me though.

DrSuperGood

Quote:
I changed the default FPS cap because it was strangely low. Moving the camera around with such low framerates gives me a headache because of the jittery motion. I'm more sensitive to it than most people I think, trying to play most games at 30 fps is pretty unpleasant, I always try to target at least 60 fps. I could cap it to 30 and try to just live with it, but I'd rather play at 60+. 10 fps during pausing is literally unplayable for me though.
Moving the camera around causes the entire display to be redrawn every frame, instead of only the parts that change. In paksets with complex graphics, maps with a lot of activity or when zoomed out far enough this usually causes frame rate to become very low, single digit low.

CPUs just are not designed for composing graphics efficiently. As it is there are a lot of shortcuts being made to get the current performance including multi-threaded drawing and use of only 16 bits per pixel for improved fill rate. There is a reason software rasterizers went out of fashion and stopped being an option in most games well over a decade ago.

There is no quick fix for this either. Many years ago some developers looked into using OpenGL to do the rasterization, but initial tests showed this to use more CPU time than rasterizing directly, because of excessive driver overhead from how drawing calls are made. APIs like Vulkan can probably be used to solve this due to considerably lower driver overhead. However, seeing how Simutrans is largely stuck in the early 2000s as far as required C++ standards go, I doubt one can even include the Vulkan headers without causing various compilers to break due to unsupported language features.

Ters

The problem with Simutrans is that logic and display are tightly coupled. Increasing FPS would either cause the same image to be drawn twice, which doesn't make things smoother, or it would involve running the game logic more often than it is designed to. The latter most likely prohibits multiplayer, but I think it also might cause errors because slow vehicles would start moving less than a fundamental unit per frame.
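To illustrate the last point, here is a minimal sketch. It assumes positions are whole units and the per-frame movement is computed with integer division; the function name and the numbers are made up for illustration and are not taken from the Simutrans source.

```python
def distance_after_one_second(speed_units_per_sec, fps):
    """Advance an integer position once per frame for one second."""
    step = speed_units_per_sec // fps  # whole units moved per frame
    position = 0
    for _ in range(fps):
        position += step
    return position


# Hypothetical slow vehicle moving 30 units per second.
slow_at_25 = distance_after_one_second(30, 25)    # step is 1 unit per frame
slow_at_100 = distance_after_one_second(30, 100)  # step truncates to 0
```

At 25 frames per second the vehicle still advances (with some rounding loss), while at 100 it never moves unless the remainder is carried over between frames.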

What I notice most when panning around is that some of the graphics contain alternating dark and light pixels. My monitor, which admittedly is a few years old now, seems to have some problems switching between them fast enough, causing a disturbing effect. I think this is what they call response time for displays. Modern games, as well as videos, usually have more photorealistic graphics where such abrupt color changes don't happen.

TurfIt

Quote from: Ayasano on September 07, 2017, 11:34:02 AM
Any idea what could be causing the game not to fully utilize the CPU/GPU and how to fix it?
Simutrans only uses enough CPU to achieve the target FPS as set in the simuconf.tab file. However, there's a 40 fps limit applied even though values up to 100 can be input, and in practice only ~37 is achieved due to the faulty algorithm...

When paused, and running fast forward, the game is hardcoded to run 10 fps. Fast forward actually gets to the 10, but pause is stuck at ~8 (ughh!) due to the aforementioned algorithm...
I see no reason for pause to not target the normal fps.

Attached patch to allow the 100 fps (of course only getting to ~77-80), change the pause mode fps, and oscillate a little less.

Ters

I guess this makes Simutrans more responsive to events, which I remember being noticeably bad when paused, but does it really have any other benefits? Event induced panning probably gets smoother, which I failed to think of earlier. The scrolling message ticker also, perhaps. The rest is done in terms of simloops, not frames, or?

Ayasano

Quote from: TurfIt on September 09, 2017, 04:06:39 AM
Simutrans only uses enough CPU to achieve the target FPS as set in the simuconf.tab file. However, there's a 40 fps limit applied even though values up to 100 can be input, and in practice only ~37 is achieved due to the faulty algorithm...

When paused, and running fast forward, the game is hardcoded to run 10 fps. Fast forward actually gets to the 10, but pause is stuck at ~8 (ughh!) due to the aforementioned algorithm...
I see no reason for pause to not target the normal fps.

Attached patch to allow the 100 fps (of course only getting to ~77-80), change the pause mode fps, and oscillate a little less.

Thanks, that worked. Bounces between 50 and 70fps unpaused, paused dips to about 40 if I zoom out, but I'll take that over 8 fps any day. :P (Also I've had enough of trying to get ancient libraries to compile for one day)

prissi

Pause is to reduce CPU load. Only that the pause feature was more and more extended. It was never intended even to scroll during pause, when it was first introduced.

However, any movement in x-direction is necessarily limited to 2 pixel accuracy. That will be jittery no matter what the frame rate. (The display will show a fixed 60 fps [or whatever its rate is], since Simutrans just uses the OS settings for this.)

In terms of the game, when using fps>50 one should use more frames per step (10 or so). At the moment this means something like 20 game steps per second, i.e. every fifth frame will be more than twice as long. However, at the moment this cannot be configured, which is why fps was limited to 40.

DrSuperGood

Quote:
However, any movement in x-direction is necessarily limited to 2 pixel accuracy.
Why is this limit a necessity? Is it related to 4 byte alignment?

TurfIt

Quote from: prissi on September 09, 2017, 03:06:22 PM
Pause is to reduce CPU load. Only that the pause feature was more and more extended. It was never intended even to scroll during pause, when it was first introduced.
GUI feel/response is quite bad at 8 fps now that one can scroll and build during pause. Any objection to this change to have pause run closer to the requested simuconf.tab framerate?


Quote from: prissi on September 09, 2017, 03:06:22 PM
In terms of the game, when using fps>50 one should use more frames per step (10 or so). At the moment this means something like 20 game steps per second, i.e. every fifth frame will be more than twice as long. However, at the moment this cannot be configured, which is why fps was limited to 40.
I don't follow this at all.
In normal step mode (i.e. not multiplayer fixed step mode), the stepping frequency self adjusts to 5 per second independent of the frame rate.
At the default 25 fps, I see sync_step delta_t's of 40ms, and steps every 200 ms, with ~5 ss's per step.
At 30 fps, 33ms ss, 200ms step, and ~6 ss per step.
At 60 fps, sync_step is quite wonky, oscillating between sequences of 12 and 20ms, but steps are still every 200ms, and you get 9 to 16 ss per step.
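The ideal figures behind these measurements follow from simple arithmetic. The sketch below assumes only the 200 ms step interval mentioned above; the function and constant names are made up for illustration.

```python
STEP_INTERVAL_MS = 200  # one step every 200 ms, i.e. 5 steps per second


def frame_timing(fps):
    """Return (ideal frame period in ms, ideal sync_steps per step)."""
    frame_ms = 1000 / fps
    sync_steps_per_step = STEP_INTERVAL_MS * fps / 1000
    return frame_ms, sync_steps_per_step


timings = {fps: frame_timing(fps) for fps in (25, 30, 60)}
```

This gives 40 ms and 5 sync_steps per step at 25 fps, about 33 ms and 6 at 30 fps, and about 16.7 ms and 12 at 60 fps. The measured 9 to 16 at 60 fps shows how far the real timing strays from the ideal there.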

Around 40 is where the frame timing control breaks down - it appears to be a positive feedback loop with some constant damping elements that try to counteract the positive feedback, but are easily overwhelmed so the thing constantly rings and I can even induce it to bounce from rail to rail every iteration.

Having a limit of 40 in one place, and a limit of 100 in another results in a horrible interaction. I'd suggest just allowing the 100 limit (even if it can't be achieved in practice due to other issues with the timing control), and strongly suggesting that 30 is the practical limit. It's still subjectively smoother at 60 vs 30, but not much IMO. And simutrans does have many shoot self in foot settings for players to enjoy!

Longer term, the frame rate should be decoupled from the simulation rate. GUI is just horrid at low rates, and the simulation doesn't need high rates (it even breaks down with loss of precision around the 100 mark).

DrSuperGood

Quote:
Longer term, the frame rate should be decoupled from the simulation rate. GUI is just horrid at low rates, and the simulation doesn't need high rates (it even breaks down with loss of precision around the 100 mark).
Once one does this it is possible to smooth out convoy animation using interpolation.
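A minimal sketch of what such interpolation could look like, assuming the renderer keeps the previous and current simulation position of a convoy and draws several frames per simulation step. All names and values here are hypothetical.

```python
def interpolate(prev_pos, curr_pos, alpha):
    """Linear blend between two simulation positions; alpha is in [0, 1]."""
    return prev_pos + (curr_pos - prev_pos) * alpha


def render_positions(prev_pos, curr_pos, frames_per_step):
    """Positions drawn on each rendered frame within one simulation step."""
    return [interpolate(prev_pos, curr_pos, i / frames_per_step)
            for i in range(frames_per_step)]


# A convoy that moved from 100 to 108 during one step, drawn over 4 frames.
frames = render_positions(100.0, 108.0, 4)
```

The convoy is drawn at 100, 102, 104 and 106 instead of jumping 8 units at once. The cost is that the display lags the simulation by up to one step.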

One should also consider decoupling the event processor from the main game loop to prevent the application becoming unresponsive under circumstances like loading.

Ters

Quote from: TurfIt on September 09, 2017, 09:30:55 PM
GUI feel/response is quite bad at 8 fps now that one can scroll and build during pause. Any objection to this change to have pause run closer to the requested simuconf.tab framerate?

If avoiding CPU usage is an issue, it might be possible to only render after receiving any events. Polling for events is the only thing that needs to be done frequently. There would be CPU usage if the mouse moves over the window, but that is something the user can avoid.

Waiting for events would be more effective at lowering CPU usage, but perhaps not as easy to integrate into the code as it is.

prissi

The problem is that the only reliable pause "Sleep()" under Windows has between 50 ms and 2 ms resolution, depending on various things. Typically it is 5ms. With 5ms resolution the feedback becomes unstable for framerates higher than ~60 Hz, since for almost identical times the pause could be either 5 ms or 0 ms but nothing in between. Not a problem in modern games, they do polling and waste CPU all the time.
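The "either 5 ms or 0 ms" effect can be shown with a deliberately simplified model in which a sleep only lasts a whole number of timer ticks. This is an assumption made for illustration, not a description of what the Windows scheduler actually does.

```python
def quantized_sleep_ms(requested_ms, resolution_ms=5):
    """Simplified model: the sleep lasts only a whole number of timer ticks."""
    return (requested_ms // resolution_ms) * resolution_ms


# Two nearly identical requests land on different sides of a tick boundary.
short_case = quantized_sleep_ms(4.9)
long_case = quantized_sleep_ms(5.1)
```

A 0.2 ms difference in the request turns into a 5 ms difference in the actual pause, which is a large share of a 16.7 ms frame at 60 Hz.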

One could in principle use another, more precise API for this, and set the requested timer resolution:
https://randomascii.wordpress.com/2013/07/08/windows-timer-resolution-megawatts-wasted/
That said, Microsoft advises against too high resolutions in the documentation of timeBeginPeriod:
https://msdn.microsoft.com/en-us/library/windows/desktop/dd757624(v=vs.85).aspx

That is an ancient problem of windows, because under DOS it was absolutely no problem even in 1994 to get a reliable 50 microsecond waiting on a 286 using the directly accessible timer chip.

And yes, in principle in pause one only needs a redraw if there is an event and there was not a redraw within the last, say, 5 ms. Low priority for me, since I never use pause but rather fast forward a lot.
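That redraw rule could be sketched roughly as follows. The function names and the event timestamps are hypothetical, and timestamps are passed in so the example does not depend on a real clock.

```python
def should_redraw(event_pending, now_ms, last_redraw_ms, min_interval_ms=5):
    """Redraw only when an event arrived and the last redraw is old enough."""
    return event_pending and (now_ms - last_redraw_ms) >= min_interval_ms


def count_redraws(event_times_ms, min_interval_ms=5):
    """Count redraws triggered by a sequence of event timestamps (in ms)."""
    last_redraw_ms = float("-inf")  # no redraw has happened yet
    redraws = 0
    for now_ms in event_times_ms:
        if should_redraw(True, now_ms, last_redraw_ms, min_interval_ms):
            redraws += 1
            last_redraw_ms = now_ms
    return redraws


# Six events, several of them less than 5 ms apart.
burst = count_redraws([0, 1, 2, 6, 7, 30])
```

Here six events cause only three redraws. A real loop would probably also need a trailing redraw after a burst, so that the final state is shown.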

Ters

Quote from: prissi on September 10, 2017, 01:56:57 PM
The problem is that the only reliable pause "Sleep()" under Windows has between 50 ms and 2 ms resolution, depending on various things. Typically it is 5ms. With 5ms resolution the feedback becomes unstable for framerates higher than ~60 Hz, since for almost identical times the pause could be either 5 ms or 0 ms but nothing in between. Not a problem in modern games, they do polling and waste CPU all the time.

I get the impression that those times are the time slices given by the OS scheduler. On a preemptive OS, such time can be lost at any time under any circumstance.

TurfIt

Quote from: prissi on September 10, 2017, 01:56:57 PM
One could in principle use another, more precise API for this, and set the requested timer resolution:
The GDI backend already does this! (and it's inherent with SDL2 - didn't try SDL1 but I'm sure it does too...) The most I'm seeing is a 2ms delay from the requested on the simutrans task getting going again.
But this did point my memory back to another suspicious piece of Simutrans code...  sleep(9) ??? No wonder things don't work! It's sleeping based on the 200ms step timing rather than the sync_step/framing needs.

Revised proposed patch - sets sleeping to 3. Is good for ~75 fps before it starts missing the target, and only slightly now. Could set to sleep 2 (or 1), to get full 100 fps support, but it's not bad at 3 IMHO.
Waking up 3 times more often with 3 vs 9 is not a huge cpu hit - even every 1 ms would be perfectly fine on any modern system IMO.

DrSuperGood

Another approach would be to time updates to vertical sync steps. That is how most modern games do it I think. Every vertical sync step you poll the time and see if a game state update is needed and if so schedule it.

Otherwise a different update algorithm would be needed, one that preserves time error and can potentially iterate multiple times without waiting if larger corrections are needed. At the start of each frame update record the time, subtract previous frame update time from it and add this delta value to an accumulator. If the accumulator is smaller than a game frame period then sleep the difference between the accumulator and the frame update period (so next frame update the accumulator will be at least a game update period).  If the accumulator exceeds a maximum value at the start of each frame update (real time requirements are being violated, the required workload is impossible to schedule) then slow down the game update rate until it drops below the threshold, possibly displaying a warning to the user about insufficient system resources. While the accumulator is greater than game frame period, perform a game update loop (1 frame) and then decrement the accumulator by the game update period.

This should be completely immune to sleep timer resolution error as it is self correcting within reasonable limits. There might be problem with frame rate stability due to the timer resolution error, however that should only be apparent at very high frame rates and even then it will still average the correct number of frames per second. The main source of timing inaccuracy would be the time measurement itself. Additionally time measurement might add some overhead seeing how they are OS calls, however I would imagine that at most once per frame is trivial.
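The accumulator scheme described above can be sketched as below, slightly reordered so that pending updates run before the sleep. The clock and sleep functions are injected so the loop can be run against a simulated timer; `FakeTime`, its 5 ms granularity and all parameter values are assumptions for illustration. The overload case is simplified to clamping the backlog rather than slowing the update rate.

```python
def run_game_loop(clock, sleep, update, period_ms, max_backlog_ms, duration_ms):
    """Fixed-period updates driven by an accumulator of elapsed time."""
    accumulator = 0
    start = previous = clock()
    while previous - start < duration_ms:
        now = clock()
        accumulator += now - previous  # carry timing error into later frames
        previous = now
        if accumulator > max_backlog_ms:
            # Real-time requirements violated: drop the excess backlog.
            accumulator = max_backlog_ms
        while accumulator >= period_ms:
            update()
            accumulator -= period_ms
        sleep(period_ms - accumulator)  # wait until the next update is due


class FakeTime:
    """Simulated clock whose sleep overshoots to the next timer tick."""

    def __init__(self, granularity_ms):
        self.now = 0
        self.granularity_ms = granularity_ms

    def clock(self):
        return self.now

    def sleep(self, ms):
        ticks = -(-ms // self.granularity_ms)  # ceiling division
        self.now += ticks * self.granularity_ms


fake = FakeTime(granularity_ms=5)
updates = []
run_game_loop(fake.clock, fake.sleep, lambda: updates.append(fake.now),
              period_ms=16, max_backlog_ms=200, duration_ms=1000)
```

With a 16 ms period and a sleep that always overshoots to a multiple of 5 ms, single updates land 15 or 20 ms apart. The count over the simulated run still matches the 16 ms target on average, because the error stays in the accumulator instead of being lost.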

Ters

I don't think modern games sync their logic to vsync. That was the way it was done when Simutrans was born. Vsync is for timing the copying of the backbuffer to the frontbuffer in order to avoid tearing. Furthermore, by happening on the GPU, rendering is more or less asynchronous to the logic.

As far as I can tell, there is no sleep timer. All sleep does is tell the scheduler not to reschedule the thread for x milliseconds. Since the scheduler runs at fixed (but configurable) intervals, there might not be exactly x milliseconds until the scheduler is next run. That is the case no matter how you wait, except by busy-waiting, but that doesn't conserve CPU at all. Sleep is fine for yielding time when you have nothing to do, but should not be used for exact timing.

Quote from: DrSuperGood on September 10, 2017, 08:23:57 PM
At the start of each frame update record the time, subtract previous frame update time from it and add this delta value to an accumulator. If the accumulator is smaller than a game frame period then sleep the difference between the accumulator and the frame update period (so next frame update the accumulator will be at least a game update period).  If the accumulator exceeds a maximum value at the start of each frame update (real time requirements are being violated, the required workload is impossible to schedule) then slow down the game update rate until it drops below the threshold, possibly displaying a warning to the user about insufficient system resources. While the accumulator is greater than game frame period, perform a game update loop (1 frame) and then decrement the accumulator by the game update period.

That is, or at least used to be, the rule for how games time logic. I even thought that was how Simutrans does it, except that it dropped rendering frames rather than reduce update frequency.

prissi

I tried sleep extensively in 2005, and any sleep of 5 ms or shorter ended up not sleeping at all in every second case. It looked like sleeps were always rounded to the next 5 ms. It may have been fixed in the meantime.

Is CPU usage different with Sleep(3) and Sleep(9)?

Ters

Quote from: prissi on September 12, 2017, 02:22:26 PM
Is CPU usage different with Sleep(3) and Sleep(9)?

That depends on a whole lot of stuff. Sleep is not an alarm clock. It is not a way to tell the operating system to wake you up in x amount of time, it is a way of telling the OS that you have nothing to do for at least x amount of time. The OS can then find some other thread to run, or put the CPU (core) to sleep until some external event wakes it up again (this would be an interrupt back in the single core days; I'm not sure how multicores work). On a preemptively multitasked OS, a thread doesn't really have any guarantee when it will be allowed to run again, irrespective of whether it yielded voluntarily, such as with Sleep, or got preempted by force.

TurfIt

Quote from: prissi on September 12, 2017, 02:22:26 PM
I tried sleep extensively in 2005, and any sleep of 5 ms or shorter ended up not sleeping at all in every second case. It looked like sleeps were always rounded to the next 5 ms. It may have been fixed in the meantime.
Follow through the blogs you posted above, and they state significant changes were made in Win7+.


Quote from: prissi on September 12, 2017, 02:22:26 PM
Is CPU usage different with Sleep(3) and Sleep(9)?
Slightly. With sleep(9), Windows reports 3-4% usage, 1.00GHz on the CPU. With sleep(3), I get 4% and 1.12. With sleep(1), 4-5% and 1.20.  At idle (Simutrans not running), 2% and 1.28. Gotta love all the background crap...
That's at default zoom. What's interesting is the other end - full zoom out. Sleep(9), 20%, 4.56GHz. Sleep(3) - 17%, 3.80. Sleep(9) is long enough the frame timing fights itself.

Simutrans itself does rather little in this loop when continually sleeping, but it pumps the windows event queue everytime which is not ideal...
Rather the entire frame timing and interactive loops are completely bassackwards. The high frequency framing should be running things determining the sleeps, not the low freq stuff with the high then trying to jam itself in. But fixing that is beyond today.

Latest iteration of patch attached. More precision in the fps calc steadies things a bit more for certain target fps's, so I can get rid of the extra deadband I'd added.

MirceaKitsune

Should have looked more carefully and seen this... made a double thread about it earlier. I too would like to see the FPS cap lifted, and be able to enjoy Simutrans at a decent 60 FPS! Unfortunately I understand there are engine limitations involved, and the code rewrites might go beyond what the developers are able and willing to do.

How did you raise the FPS cap though? I read that in one of the replies, but I'm not seeing any option for that (the FPS menu setting goes up to 25 max). I'm curious how bad the breakage can really be if I do it. Other than that, I hope the patches discussed here will make it into the engine and at least improve the problem somewhat, my thanks to those who have worked on them and can't wait to try them in a future update :)

TurfIt

Set the desired framerate in simuconf.tab.
The settings menu having a 25 limit is yet another oversight...
The patch here could still use some refinement.

MirceaKitsune

Quote from: TurfIt on September 25, 2017, 02:11:48 AM
Set the desired framerate in simuconf.tab.
The settings menu having a 25 limit is yet another oversight...
The patch here could still use some refinement.

I couldn't find a simuconf.tab in my user settings directory. Just a settings.xml, in which I tried finding and editing the value but to no avail. I'm using the version installed by my distribution via the system packages, though if it's easy to compile I should probably checkout from Git at this point.

Ters

simuconf.tab is in subdirectories of the program directory. There is one basic, and one specific to each pak set. This means that it might require administrator privileges to modify it, depending on how Simutrans was installed.

jamespetts

I am only discovering this thread now - did I miss this on the Standard Github repositories, or has this not been implemented yet? Is this working well enough to incorporate the latest patch on this thread into Extended without much difficulty?

TurfIt

Not yet complete, latest here is r3, I have r7, but haven't looked at it in 2 months....  I think I was still not happy with the pause behaviour.

prissi

Well, since the patch behaves better than vanilla simutrans, I incorporated version r3 in r8570. I would be happy to use r7 too ...

TurfIt

Barely better since v3 didn't address the root cause which v7 did. It would be appreciated to give me a nudge rather than committing such work in progress patches yourself - somewhat of a PITA to undo v3 to apply v7 instead. I still didn't get around to solving the pause mode behaviour, but at least v7 was working properly for normal and fast forward.

prissi

So please commit number 7 then, because I wanted to release in a few weeks, to solve the dualstack server problem properly (since now many users only get real IPv6 addresses) and I just looked at what patches looked ready to include. Anyway, the 9ms wait was clearly wrong.