The International Simutrans Forum

 

Author Topic: Performance analysis  (Read 1431 times)

0 Members and 1 Guest are viewing this topic.

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 20274
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Performance analysis
« on: July 22, 2020, 12:34:08 PM »
I notice that some people on the server were discussing performance issues. A brief profiling run gives a breakdown of where the most computational effort is being spent (bear in mind that multi-threading in some but not all parts of the code may distort how this affects real world performance).
Some highlights from the findings (using the current Bridgewater-Brunel saved game in single player mode; performance may be subtly different in multi-player mode) are:
  • graphics (main_view_t::display_region), which are multi-threaded, take a total of 31.79% of CPU time;
  • passenger and mail generation (step_passengers_and_mail_threaded_ takes a total of 16.82% of CPU time, of which 10.01% (overall CPU time, not 10.01% of 16.82%) consists of finding a path between halts by querying the hashtable (haltestelle_t::find_route);
  • vehicle physics calculations (convoi_t::calc_acceleration) takes 10.89% of CPU time;
  • the main path explorer method, which runs in its own thread, (path_explorer_t::compartment_t::get_path_between) takes 5.04% of CPU time;
  • pedestrians (pedestrian_t::sync_step) take 1.70% of CPU time;
  • convoy routing (including ships), which is multi-threaded, (convoi_t::drive_to) takes 1.67% of CPU time;
  • convoy loading at stops (haltestelle_t::request_loading) takes 1.14% of CPU time
These results are with my computer on this particular saved game; different results with different saved games are likely.

Offline Freahk

  • Devotee
  • *
  • Posts: 1336
  • Languages: DE, EN
Re: Performance analysis
« Reply #1 on: July 22, 2020, 12:52:39 PM »
Good to know, but I am still quite sure CPU load by itself is not the main issue here.
Most of the time, none of my cores are even close to 50% computational load.
I do rather expect the the bottleneck to be located elsewhere. E.g. memory bandwith, pipeline stall or control hazzards, though i did not find out how to profile this properly yet.

Offline kierongreen

  • Dev Team, Coder/patcher
  • Devotee
  • *
  • Posts: 2346
Re: Performance analysis
« Reply #2 on: July 22, 2020, 12:55:07 PM »
Good to know, but I am still quite sure CPU load by itself is not the main issue here.
Most of the time, none of my cores are even close to 50% computational load.
I do rather expect the the bottleneck to be located elsewhere. E.g. memory bandwith, pipeline stall or control hazzards, though i did not find out how to profile this properly yet.
Memory bandwidth issues (including issues relating to processor cache size) can be identified if performance increases from more threads tails off. It's very dependent on the system being used though.

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 20274
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Performance analysis
« Reply #3 on: July 22, 2020, 12:56:00 PM »
Good to know, but I am still quite sure CPU load by itself is not the main issue here.
Most of the time, none of my cores are even close to 50% computational load.
I do rather expect the the bottleneck to be located elsewhere. E.g. memory bandwith, pipeline stall or control hazzards, though i did not find out how to profile this properly yet.

I suspect that memory bandwidth is a major issue, especially for graphics and the routing algorithms for passengers/mail/goods.

As to profiling, are you using Visual Studio? If so, it is quite straightforward: use the optimised debug build and go to Debug > Performance Profiler. In GCC, it is also fairly straightforward to profile, although I cannot remember the exact steps.

Edit: Disabling water animation can significantly reduce graphics load on the system.

Offline Freahk

  • Devotee
  • *
  • Posts: 1336
  • Languages: DE, EN
Re: Performance analysis
« Reply #4 on: July 22, 2020, 03:12:10 PM »
Quote
Edit: Disabling water animation can significantly reduce graphics load on the system.
I'll try this offline, if so it would be very kind of you to disable this on bridgewater.

I'm using clion. Visual studio under Linux is a mess, just like any microsoft product. I could install windows, but I don't agree with that company in very many points, so I prefer not using their products whenever possible.

Anyways, clion has a lot of debugging and profiling features either, I'll find it out...

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 20274
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Performance analysis
« Reply #5 on: July 22, 2020, 05:16:43 PM »
For Linux, I recommend GCC for profiling; it does have some fairly standard ways of doing this. I am not familiar with Clion - is that a compiler or an IDE?

So far as water animation is concerned, you can disable this locally for online games without disabling it on the server: simply set water_animation_ms=0 in the base simuconf.tab. This is one of the settings that is not synchronised between client and server.

Offline Freahk

  • Devotee
  • *
  • Posts: 1336
  • Languages: DE, EN
Re: Performance analysis
« Reply #6 on: July 22, 2020, 07:51:54 PM »
Clion is an IDE using cmake, gcc (or clang if you choose to), gdb and all these standard gnu tools under the hood.

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 20274
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Performance analysis
« Reply #7 on: July 22, 2020, 08:29:29 PM »
Clion is an IDE using cmake, gcc (or clang if you choose to), gdb and all these standard gnu tools under the hood.


Splendid - you should be able to use the GDB profiler with that, in that case.

Offline Matthew

  • *
  • Posts: 420
    • Japan Railway Journal
  • Languages: EN, some ZH, DE & SQ
Re: Performance analysis
« Reply #8 on: August 08, 2020, 04:36:22 PM »
From the markers,  etc. thread:

Quote from: jamespetts
Thank you for your thoughts on this. We should really move discussion of the performance to the thread specific to performance; would you mind posting your results in that thread so that I can gauge better whether to reduce the framerate and also get feedback on this from other players?

I have not sure exactly what data you want here, so here is a first attempt.

I have continued trying to play around with settings to understand the UI lag. When playing Bridgewater-Brunel off-line in 1845, the in-game Display GUI now indicates a Frame Time of ~30-40ms and 10-18 fps normally. This goes down to ~60ms and 8-10fps when the Path Explorer is processing passenger classes 3 and 4 (the goods classes are no trouble). The Idle value is always 0ms (except when it jumps to a very high figure that I guess might be due to a an integer wraparound), but I don't know how significant that is in the days of multicore CPUs.

The fact that the performance is worst when the Path Explorer runs certain passenger classes makes me wonder whether it might be worth reducing path_explorer_time_midpoint in preference to hurting other players' graphical performance. I tried setting it to 48 and loaded the B-B savefile offline: Frame Time did not seem to be affected, but FPS went up to about 22 normally; during heavy passenger classes the values were ~42ms and 15-20fps; I also got sporadic idle time.

As a side note, that fact seems to support Freahk's idea that the issue is memory bandwidth, since there seems to be plenty of spare capacity in the CPUs and RAM. I accept that the Path Explorer is inevitably going to be computationally demanding in a game played on a large map and that nobody can alter the Path Explorer in the near future; just noting the point for future reference.


Offline jamespetts

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 20274
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Performance analysis
« Reply #9 on: August 08, 2020, 06:21:28 PM »
Thank you for the feedback - that is helpful. It is better to change one parameter at a time for testing. I have modified the framerate to reduce to 15fps on the server, and I should be grateful for feedback (from to-morrow when this change will take effect) as to how this affects lag.

Offline prissi

  • Developer
  • Administrator
  • *
  • Posts: 10243
  • Languages: De,EN,JP
Re: Performance analysis
« Reply #10 on: August 10, 2020, 01:07:48 AM »
The CPU load is not really a good indications, since Simutrans have many places, which must be processed in certain order to prevent desync. Thus involves neccessarily waiting, even with multithreading and thus the CPU load will be usually even leass than 50%, depending of the number of thread available.

Unless the Microsoft profiler did made tremendous progress, in my experience the numbers are rather off, since it count any library or OS call as as part of the instrumented code. One of the biggest number for a threaded server should be the percentage of waiting for barriers. It is not time spent in the code, but if most real time is wasted at barriers, then total time increases with multithreading despite profiling tells you otherwise.

In this regard, best results on total time are probably from profiling single and multithreaded and compare.

Offline Matthew

  • *
  • Posts: 420
    • Japan Railway Journal
  • Languages: EN, some ZH, DE & SQ
Re: Performance analysis
« Reply #11 on: August 27, 2020, 12:48:04 AM »
Thank you for the feedback - that is helpful. It is better to change one parameter at a time for testing. I have modified the framerate to reduce to 15fps on the server, and I should be grateful for feedback (from to-morrow when this change will take effect) as to how this affects lag.

James, here is some feedback, which is a mixture of good and bad news.

The good news is that something dramatically decreased UI lag for a while. From around 8th August it became trivial, <10s, often much less, and I was able to play the game full-screen (at 1080P) and to use the map as well.

The first piece of bad news though is I can't be certain that this was entirely due to the framerate change because that same weekend my VM provider very slightly upgraded my CPU. It was a miniscule upgrade (just one rung on the Tom's Hardware ladder), but the new CPU (Ryzen 5 1400) has a 50% larger L3 cache, so that might be significant if memory bandwidth is the problem.

The second piece of bad news is that UI lag returned with a vengeance yesterday (before the nightly update of 26th August GMT). In the GMT afternoon of 25th August, I had no or trivial lag, but that evening it was terrible again, at least a minute. Tonight, even immediately after a resynch, tree planting/deletion took 38 seconds. Just processing password entry incurs a noticeable lag. After 1.5 in-game hours this increases to 2m30s to 3m or so.

Yet just now, at 0000 GMT, I was back to no UI lag again even with full-screen 1080P. I noticed that there didn't seem to be anyone else connected, so I closed the client, opened a fresh instance of the client and again had trivial lag (even though the Path Explorer was processing passenger class 3). Only you (James) were connected at the time.

Why has UI lag suddenly accelerated? Some possibilities:
  • The game has gradually been increasing in complexity within my VM's performance envelope and yesterday it exceeded one of those limits. If so, my highly speculative guess is that the Path Explorer has again exceeded some threshold, causing cache thrashing. But this does not fit with the fact that lag seems to vary depending on who else is playing.
  • It's caused by the actions of other (non-Simutrans) users of the same cloud service. I guess this is possible, but at no point am I short of Internet bandwidth (savegames load at 100Mbps) and the provider's whole USP is that we rent dedicated systems, not shared servers. So I should not be sharing CPU time or RAM with anyone else. The SSD/HDD storage must be shared (presumably on some RAID-like system), but that seems irrelevant.
  • It's perhaps possible that the framerate or some other setting was accidentally changed during the changes you and ceeac have been making to server settings. But then why is everything fine right now?
  • I have been playing at the same time as other players yesterday and today, whereas last week I was often playing alone. But tonight I had no lag when you (James) were online. So it's something to do with the interaction of multiple players on the server. But it's also possible that this is somehow bound up with the Path Explorer, since I guess that active players are more likely to trigger changes to the routing table.
  • Maybe your changes to industry code have something to do with it, but I doubt it.

I know that yesterday the terrible lag only began after freddyhayward connected. (Sorry Freddy! This is obviously not something you intended or can control and I know that your struggles with slow Anternet are at least as frustrating as my struggles with UI lag!)

Possible relevant factors:
  • Freddy uses the Linux client and I use the Windows client. Supporting this is the fact that I have no lag when playing alongside James, who also uses the Windows client. But I cannot explain why this would be. I would like to test with combinations of players+clients to see if that supports/refutes this theory. It may also be relevant that I had little lag last week when Freahk, who also uses the Linux client, was away.
  • Freddy's slow Internet means that my client is paused while his finishes downloading. If anything, that ought to give me a head start with processing though.
  • Freddy's poor Internet means that he often has to resynch. Possibly the cumulative resynchs are causing memory corruption or something else that slows my client's performance.
  • James connecting somehow slows down the server so that I can keep up. James, were you running rebuilds on the server whilst playing last night, or something like that? But that would not explain the low lag I had last week when playing alone.

Sorry, that's a lot of hypotheses but I have had plenty of typing time while timing tree deletions tonight.  ::'(

P.S. BTW the lag persists even when I disconnect and continue running the network game offline. Only closing the client and restarting a fresh instance stops it. But older forum posts suggest that this effect is already well-known to you.
« Last Edit: August 27, 2020, 01:06:35 AM by Matthew »

Offline Ranran

  • Devotee
  • *
  • Posts: 1226
  • Languages: ja
Re: Performance analysis
« Reply #12 on: August 27, 2020, 01:21:43 AM »
I think at least the time it takes to connect to Bridgewater server (download data) is not due to internet speed.

Offline freddyhayward

  • Devotee
  • *
  • Posts: 437
  • Languages: EN
Re: Performance analysis
« Reply #13 on: August 27, 2020, 01:23:00 AM »
In general, my anecdotal experience since the latest round of builds is that there are more desyncs when linux and windows users are on at the same time, but very few when only linux users are online. Hopefully ceeac's fixes were relevant to this.

Offline freddyhayward

  • Devotee
  • *
  • Posts: 437
  • Languages: EN
Re: Performance analysis
« Reply #14 on: August 27, 2020, 04:42:39 AM »
Freddy's poor Internet means that he often has to resynch. Possibly the cumulative resynchs are causing memory corruption or something else that slows my client's performance.
Actually, my internet speed shouldn't have any impact on this at all, since the amount of data transferred after downloading the save is trivial.

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 20274
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Performance analysis
« Reply #15 on: August 27, 2020, 10:17:26 AM »
Matthew, thank you for your feedback. First of all I should note that my attempt to reduce the framerate did not work because this did not end up overriding the saved game settings. This has worked in the past: I suspect that the simuconf.tab override code has changed since then. This would mean that I would need to have the simuconf.tab settings overriding all the saved game settings in order to change the framerate, which would be undesirable, as there are many settings in the base simuconf.tab that should not override pakset settings. I had the same problem with the industry settings originally, but implemented a specific always on override for this, so only in the last day or two have we had the industry density proportion override as intended.

The consequence of all this is that we have been running at 30fps throughout. When I connect to the server with my somewhat fast computer, the behaviour that I get is that it is not very smooth: every second or so it will run quickly for a fraction of a second then pause for a fraction of a second. This suggests that the server is slower than the my client, which is constantly having to slow down to allow the server to catch up. I am not getting any noticeable input lag.

It would be interesting to know what other people's experiences of input lag are. The wild fluctuations reported by Matthew do not seem consistent with anything happening in the game; unless anyone else reports similar fluctuations, I suspect that these will be related to Matthew's VM, but in what way I cannot tell. One thing that Matthew could do is look at the statistics in the "display" dialogue and check frame time when things are running well as well as when things are running poorly and report these. If others could also report their frame times at the same time, this would be helpful. We could then see whether there is a similar relative fluctuation in frame times (even if the absolute values on others' frame times mean that they do not get lag even at the worst values), or whether the relative fluctuations are only on Matthew's VM, which would mean that the problem is likely to be with the VM.

As to the losses of synchronisation, I should be grateful if people could discuss that on the dedicated thread, since I will lose track of any discussion on that issue that takes place here.

I should note that it is inherently very unlikely that a different platform client being connected to the server at the same time could affect the performance of another client. The data transmitted between client and server while running is minimal (limited to the broadcast of actions by other players and any commands from the server to the client to pause while the server catches up), and the performance will depend on how the code is executed locally on the server. Matthew might want to check the framerate of the game in single player mode on his VM to see whether the framerate (which varies in single player mode) varies with time in the same way that the lag varies. If the single player framerate drops <30, then this indicates a condition in which lag is likely to occur if connected to the server.

In the meantime, I will look at implementing universal server override for framerates, as it does not make any sense for the saved game value to override the simuconf.tab value for framerates on a server.

Offline Freahk

  • Devotee
  • *
  • Posts: 1336
  • Languages: DE, EN
Re: Performance analysis
« Reply #16 on: August 27, 2020, 10:32:44 AM »
This would mean that I would need to have the simuconf.tab settings overriding all the saved game settings in order to change the framerate, which would be undesirable
I do usually use the simuconf in users directory for that purpose.
The alternative of downloading the map, using the edit tool, aving it and uploading it again will also do the job, but somehow passwords will be lost in that process.

I do not have any performance troubles of that kind at the moment, apart from end-of-month lags, which is pretty usual.

I just took some time to make a few observations:
Sometimes, after a desync the game is extremely laggy. It's reporting something around 5 fps.
Pausing and unpausing the game will make it run much faster again.

Loading and saving the game seems to reduce fps in some cases.
After the pause/unpause thing, I ended up at something around 22 fps in singeplayer.
Saving and loading the game ended up in 17 fps. Pause/unpause did not help here, apart from a rather short 30 fps spike.
The CPU load in these cases was roughly the same. Nothing special could be observed in htop ("task manager")
Notice that the path explorer was running in those cases.
On the server, I get a completely different fps behavior. It is extremely jumpy. Everything between 10 fps and 300 fps was observed.
The UI is a little umpy as well, but I did not fall behind. That means the dragging preview of ways does soethimes not work at all, but when I release the mouse button, the way is built immediately.
« Last Edit: August 27, 2020, 11:01:25 AM by Freahk »

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 20274
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Performance analysis
« Reply #17 on: August 27, 2020, 11:23:54 AM »
Thank you - I have been testing the FPS issue. It seems that I was mistaken in my initial post and that the existing code already provides for FPS overriding for servers, and the framerate of 15 has been correctly set on the server; however, because the framerate is inconsistent for some reason, it is not observed as being 15 in the display dialogue. I am not sure why the framerate is inconsistent, as I do not have knowledge of the code controlling this.

Offline Matthew

  • *
  • Posts: 420
    • Japan Railway Journal
  • Languages: EN, some ZH, DE & SQ
Re: Performance analysis
« Reply #18 on: August 27, 2020, 04:48:45 PM »
Sorry, I wrote "poor Internet" and "slow Internet" in the wrong lines, which was confusing.

I think at least the time it takes to connect to Bridgewater server (download data) is not due to internet speed.

My impression is that when Freddy and Huitsi connect, the "pause" period lasts longer than when other players connect. I will try to measure the time to confirm this, but it's a little difficult because I don't know when other people will connect.

I interpreted the longer pause period as Freddy and Huitsi needing a longer period than others to download the server save game because they have less Internet bandwidth.

Actually, my internet speed shouldn't have any impact on this at all, since the amount of data transferred after downloading the save is trivial.

You are quite right, of course.

It would be interesting to know what other people's experiences of input lag are. The wild fluctuations reported by Matthew do not seem consistent with anything happening in the game; unless anyone else reports similar fluctuations, I suspect that these will be related to Matthew's VM, but in what way I cannot tell. One thing that Matthew could do is look at the statistics in the "display" dialogue and check frame time when things are running well as well as when things are running poorly and report these. If others could also report their frame times at the same time, this would be helpful. We could then see whether there is a similar relative fluctuation in frame times (even if the absolute values on others' frame times mean that they do not get lag even at the worst values), or whether the relative fluctuations are only on Matthew's VM, which would mean that the problem is likely to be with the VM.

I only see a fixed frame time of 66 ms, so I am guessing that by "frame time" you mean "fps", right? Just now, with 3-5 other players connected and active, I get 14-16fps, 1.5-1.6 Simloops, 100 idle time. But it periodically spikes to 300-400fps, which I think correlates with the Path Explorer starting a new class. During the "pause" period when people join, I get 9 or 10 fps.

I will report further when I'm able to connect alone.

Quote
Matthew might want to check the framerate of the game in single player mode on his VM to see whether the framerate (which varies in single player mode) varies with time in the same way that the lag varies. If the single player framerate drops <30, then this indicates a condition in which lag is likely to occur if connected to the server.

Thank you, I will test this soon.

Offline Matthew

  • *
  • Posts: 420
    • Japan Railway Journal
  • Languages: EN, some ZH, DE & SQ
Re: Performance analysis
« Reply #19 on: August 27, 2020, 11:09:17 PM »
Matthew might want to check the framerate of the game in single player mode on his VM to see whether the framerate (which varies in single player mode) varies with time in the same way that the lag varies. If the single player framerate drops <30, then this indicates a condition in which lag is likely to occur if connected to the server.

I have now collected this data:



If that is too small to read, then the spreadsheet is here.

The "range" figures are minima and maxima; the "typical" figures are my subjective impression of the mode. But my eyes could not be everywhere at once, so they are inevitably inaccurate.

"Accelerating" is used when the trajectory of the frame time is used when the figures seemed to increase exponentially until maxing out at 246 (when the Path Explorer finished) or 250 (where the frame time permanently stuck).

The thread %CPU figures are from Process Hacker. This program's outputs are not well documented, but it appears to be the percentage of CPU time used by each of the game's threads. I always noted the figures for the main thread; the "highest thread" rows are used only when another thread was using more CPU than the main thread. When this happened, there were often four such threads at very similar levels.

The "fill path matrix" and "explore paths" rows are probably inaccurate. I only noticed near the end of these observations that the figures seemed to carry over when I loaded a new savegame, which makes me wonder whether they are only updated each time the Path Explorer finishes a goods type.

I am still running the May 1863 game offline. After half a month it has entered its second PE cycle. I will update if it the PE ever returns to standby and I get appreciably different results.

BTW other players reported on Discord that the current network maxes out their CPU. My CPU was certainly very busy, but not maxed out. That would seem to fit Freahk's theory that memory bandwidth is the obstacle.

In conclusion, it appears that this VM has rarely met the standard James has set. So it seems that I may soon be departing from Bridgewater-Brunel. But it is puzzling that I had long periods last week without desynchs or more than trivial UI lag. It might also be significant that none of the Path Explorer on Standby values seem to change in a simple linear way, as I had expected. But even if the problem is narrowed down to the Path Explorer, I realize that a large game is inevitably going to have a heavy load of route-finding computations and that changing the Path Explorer would be a very major project that is not currently on the development roadmap.

Offline TurfIt

  • Dev Team, Coder/patcher
  • Devotee
  • *
  • Posts: 1395
Re: Performance analysis
« Reply #20 on: September 03, 2020, 05:56:41 PM »
Sorry, I wrote "poor Internet" and "slow Internet" in the wrong lines, which was confusing.
My impression is that when Freddy and Huitsi connect, the "pause" period lasts longer than when other players connect. I will try to measure the time to confirm this, but it's a little difficult because I don't know when other people will connect.
I interpreted the longer pause period as Freddy and Huitsi needing a longer period than others to download the server save game because they have less Internet bandwidth.
The pause period is the longer of the time for the server save, file transfer, server reload, or for all existing clients to save and reload. If a player with a slow computer is connected, they could cause everybody else to wait around. (Should really put a limit on that...)
Note, the game does not wait for the joining player to load. If they have a slow computer (load the game slower than the server), they'll be left behind with input lag right off the start.


Matthew, thank you for your feedback. First of all I should note that my attempt to reduce the framerate did not work because this did not end up overriding the saved game settings.
framerate is not a savegame setting - env_t vs settings_t.  Just change in simuconf.tab and done.  Note however framerate, server_frames_ahead, server_frames_per_step, and server_frames_between_checks are all linked and need adjusting together. (and additional_client_frames_behind, but that's on each client to adjust [And not as critical since Dr.SuperGoods barrier was added to stop the run ahead desyncs].) Ideally these would be set in ms rather than frames, but...

"Standard" default network settings were 10fps, 4 frames_per step. That's a step every 400ms (or 2.5 simloops). Offline the target is 5 simloops (200ms). You don't want less than 2 simloops else responsiveness will suffer. And you don't want more than 5 either - needless CPU load. I hope you weren't running 30 fps and 4 frames_per_step - that's a step every 130ms, nobody would be able to keep up (nor the server...).


, the behaviour that I get is that it is not very smooth: every second or so it will run quickly for a fraction of a second then pause for a fraction of a second. This suggests that the server is slower than the my client, which is constantly having to slow down to allow the server to catch up. I am not getting any noticeable input lag.
Check the server logs for "server lagging by" warnings. If many, you need to tone down the settings/game, or get a faster server. Clients attempt to adjust their rate to the servers, but can only do so much. (And if server_frames_between_checks==1 for desync debugging, will be timing jerked all over the place. [should be set for a check every 10 secs or so in normal use])

Also, Extended is relatively heavy in processing within step(). Standard is much more balanced sync_step() vs step(). This means everytime a step a runs, the game falls behind it's frame target (a step can take multiple frametimes to complete if too high a fps/too slow a server), and sprints out back to back frames to catch up. The client sees a pause, followed by a high framerate, before normal frame pacing, repeat the cycle. Very nauseating.

For best experience, you need to ensure step() completes before it's time for the next sync_step(). I typically see sync_step()/display() taking 10-40ms (depends on screen resolution, zoom factor, cpu speed, etc.), so not much time for step() if trying to run 15fps (66ms time interval). Before I stopped contributing, I had investigated breaking step() into chunks so a bit could process every frame(). It's entirely doable, and quite simple really, but I never finished (and work too old to not need starting anew). Something to consider....


It would be interesting to know what other people's experiences of input lag are.
Input lag in the order of tens of seconds/minutes is entirely due to ones computer not keeping up. Run '-debug 3 -log' and check for "time difference to server" messages. If that number is continually increasing, you're not keeping up. (even better run with a .exe complied with the console enabled, and you can watch in realtime...)



I should note that it is inherently very unlikely that a different platform client being connected to the server at the same time could affect the performance of another client.
True for Standard, not Extended which has the Path Explorer (PE).  The PE adjusts the amount of work it does each step() to minimize the time to completion, but not unduly take too much time in each step(). These iteration limits are sent to the server, which then commands all clients to use the limits from the slowest computer. So if performance improves after a particular client connects, that client has the slowest computer (and likely is getting a rather poor experience.)  Or, something has gone wrong in the limit calculations/transmission.   I'm not seeing where nwc_routesearch_t::check_for_transmission() is being called in the current code....
IIRC the server logs should show the limits received from each client, and the set to minimum command going out. I'd suggest checking this limit function is still working.

EDIT:
Code: [Select]
// Knightly : check if changed limits, if any, have to be transmitted to all clients
if(umgebung_t::server)
{
nwc_routesearch_t::check_for_transmission( this );
}
has gone missing from the code.... was in karte_t::interactive() which was refactored. Likely a massive merging from Standard mistake.
I believe it belongs in karte_t::process_network_commands() now,  around simworld.cc::10716.

EDITEDIT:
The other piece of limit code also went missing - see 759ef8ccc39b3fe895e838301866e63f869eb44d.
« Last Edit: September 03, 2020, 10:28:01 PM by TurfIt »

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 20274
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Performance analysis
« Reply #21 on: September 05, 2020, 01:21:49 PM »
TurfIt - thank you for that. I have reinstated the missing code quoted above from Knightly. I am slightly less clear on what in the commit with hash 759ef8ccc39b3fe895e838301866e63f869eb44d is now missing. I see that the process network command section is still present, albeit updated. Can you elaborate on what in that commit needs to be re-added? Thank you again.

Offline TurfIt

  • Dev Team, Coder/patcher
  • Devotee
  • *
  • Posts: 1395
Re: Performance analysis
« Reply #22 on: September 05, 2020, 10:55:46 PM »
Clearly GIT still eludes me.... That commit number looks like the commit into standard, not the merge into Extended despite that's what I was looking at.
Maybe: 5f47f512e988fde243f52a78b2eb9656350a2f93 ??

This is the old block: (Just the Knightly section missing in refactored code)
Code: [Select]
if(  umgebung_t::networkmode  ) {
// did we receive a new command?
unsigned long ms = dr_time();
network_command_t *nwc = network_check_activity( this, next_step_time>ms ? min( next_step_time-ms, 5) : 0 );
if(  nwc==NULL  &&  !network_check_server_connection()  ) {
dbg->warning("karte_t::interactive", "lost connection to server");
network_disconnect();
}

// Knightly : send changed limits to server where necessary
if(path_explorer_t::are_local_limits_changed()  ) {
path_explorer_t::limit_set_t local_limits = path_explorer_t::get_local_limits();
network_send_server( new nwc_routesearch_t(sync_steps, map_counter, local_limits, false) );
path_explorer_t::reset_local_limits_state();
dbg->warning("karte_t::interactive", "nwc_routesearch_t object created and sent to server: sync_step=%u map_counter=%u limits=(%u, %u, %u, %llu, %u)",
sync_steps, map_counter, local_limits.rebuild_connexions, local_limits.filter_eligible, local_limits.fill_matrix, local_limits.explore_paths, local_limits.reroute_goods);
}

// process all the received commands at once
while (nwc) {
I don't know if there's any more Extended specific code missed in these merges, I was only looking at the path explorer limits...

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 20274
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Performance analysis
« Reply #23 on: September 05, 2020, 11:24:14 PM »
Splendid, thank you for that: I have now re-incorporated the missing section.

Offline TurfIt

  • Dev Team, Coder/patcher
  • Devotee
  • *
  • Posts: 1395
Re: Performance analysis
« Reply #24 on: September 05, 2020, 11:32:48 PM »

Some performance analysis from the server log you posted in the desync thread:
A new_month() results in the server falling 11 seconds behind where it should be.
Each sync_step() it can catch up ~26 ms. But each step() sets it back a further 360ms.
Resulting in a net 153ms catchup per second realtime. Which I calculate as the game running at ~85% server CPU load average.

For clients, you have to subtract the display rendering time from that ~26ms above. As I mentioned before, frame render times are easily 10-40ms range. Hence client better be running faster CPUs than the server, in a small window zoomed in.

Also, with that length step() time, clients would see a 1/3 second pause followed by 5-6 frames as fast as their computer can render, then back to normal 15 fps. Repeat every second.
Finally, this also represents simloops < 1, where you really want 2-5, but there's no CPU room available for that.


Offline freddyhayward

  • Devotee
  • *
  • Posts: 437
  • Languages: EN
Re: Performance analysis
« Reply #25 on: September 05, 2020, 11:48:22 PM »
Clearly GIT still eludes me.... That commit number looks like the commit into standard, not the merge into Extended despite that's what I was looking at.
Maybe: 5f47f512e988fde243f52a78b2eb9656350a2f93 ??

This is the old block: (Just the Knightly section missing in refactored code)
Code: [Select]
if(  umgebung_t::networkmode  ) {
// did we receive a new command?
unsigned long ms = dr_time();
network_command_t *nwc = network_check_activity( this, next_step_time>ms ? min( next_step_time-ms, 5) : 0 );
if(  nwc==NULL  &&  !network_check_server_connection()  ) {
dbg->warning("karte_t::interactive", "lost connection to server");
network_disconnect();
}

// Knightly : send changed limits to server where necessary
if(path_explorer_t::are_local_limits_changed()  ) {
path_explorer_t::limit_set_t local_limits = path_explorer_t::get_local_limits();
network_send_server( new nwc_routesearch_t(sync_steps, map_counter, local_limits, false) );
path_explorer_t::reset_local_limits_state();
dbg->warning("karte_t::interactive", "nwc_routesearch_t object created and sent to server: sync_step=%u map_counter=%u limits=(%u, %u, %u, %llu, %u)",
sync_steps, map_counter, local_limits.rebuild_connexions, local_limits.filter_eligible, local_limits.fill_matrix, local_limits.explore_paths, local_limits.reroute_goods);
}

// process all the received commands at once
while (nwc) {
I don't know if there's any more Extended specific code missed in these merges, I was only looking at the path explorer limits...
Just out of curiosity, how significant is this commit to addressing desync issues?

Offline TurfIt

  • Dev Team, Coder/patcher
  • Devotee
  • *
  • Posts: 1395
Re: Performance analysis
« Reply #26 on: September 06, 2020, 12:03:48 AM »
Shouldn't affect desync at all. Without, everybody would be running at the server's calculated limits. With, limits are adjusted to the slowest client, so without, they would be having a bad time.

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 20274
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Performance analysis
« Reply #27 on: September 06, 2020, 12:08:10 AM »
Interesting. Testing locally, with a 4k monitor and Simutrans running the Bridgewater-Brunel saved game from 1863 maximised, I get 4.9 simloops with the path explorer on standby at default zoom.

Offline TurfIt

  • Dev Team, Coder/patcher
  • Devotee
  • *
  • Posts: 1395
Re: Performance analysis
« Reply #28 on: September 06, 2020, 12:30:00 AM »
Run locally with your simuconf.tab from the server, and run locally as a server "sim -server"  (of course remove the server announce sections from simuconf...)
Timing is done completely different in "normal" step mode vs "fixed_step" mode, aka network mode.

EDIT: normal mode with this savegame is still jerky. Likely still the fault of step(), if you have insufficient int_checks() spread throughout...

I also note the sever is sending:
Code: [Select]
Warning: nwc_routesearch_t::transmit_active_limit_set: transmit succeeded sync_step=2 map_counter=1683307 limits=(1024, 98304, 65536, 16777216, 256)
Those are the default limits, which I believe would be due to the server never having completed a path explorer cycle. (That would be rather bad for players with routes never being found...) [EDIT: nope, due to the code that was missing. Should work once the server updates to the latest version.]
The defaults seem a little off to me, perhaps:
Code: [Select]
// default iteration limits
static const uint32 default_rebuild_connexions  = 0x0400;
static const uint32 default_filter_eligible = 0x0000AAAA;
static const uint32 default_fill_matrix     = 0x00009000;
static const uint64 default_explore_paths   = 0x02000000;
static const uint32 default_reroute_goods   = 0x0002AAAA;
« Last Edit: September 06, 2020, 04:48:35 AM by TurfIt »

Offline Matthew

  • *
  • Posts: 420
    • Japan Railway Journal
  • Languages: EN, some ZH, DE & SQ
Re: Performance analysis
« Reply #29 on: September 07, 2020, 01:11:54 PM »
Thank you to James and Turfit for your kind efforts to understand and improve the game's performance on large maps.

My initial impression is that the changes over the weekend have greatly improved the game's performance for me. To try to objectively measure that, I have collected some data from Bridgewater-Brunel play sessions of similar length (~1 real life hour) on 2 September and today, using the "time difference from server" values in the log files. Those are reported in exact multiples of 66 milliseconds, which according to comments above is the frame setting on the server. So a value of "66 milliseconds" is unlikely to mean that the client is exactly 66 milliseconds behind the server. I suspect that it means the client is one tick behind the server, but I am not certain, since a comment in simunits.h suggests that Simutrans' 'milliseconds' are not always the same as real-life milliseconds. I have restated the values in 'frames' where 1 frame = 66 milliseconds, for ease of reading. I have called it a 'frame' but I have no idea whether that is actually what 66 ms means.

Here are the lag statistics from 2 September:



We can see that the vast majority of the time (66%!  ;D ) my client lags one 'frame' behind the server. But the mean is much higher; sometimes it lags much more. This graph (x-axis = log entries, y-axis = 'frames') confirms the pattern that I observed subjectively:



Note that the y-axis is logarithmic. When I played alone, I was able to keep up. But when other players joined the game, the lag rose rapidly to a much, much higher level and stayed there until either I desynched (the server kicked the client) or manually resynched (I closed the client and joined the game using a new instance of the client).

Here are the lag statistics from today:



It's still the case that the client lags behind the server for 1 'frame' most (70%) of the time. But the maximum lag has fallen from 183 'frames' to 55 'frames'. And today I was able to continue playing after not one, but two players joined the game. The graph helps to understand what is happening:



Note that the y-axis is now not logarithmic.

The number of 'frames' spiked twice. I do not know what prompted these, but the logs show that they are not exactly the same as points where other players joined. So not only has the lag caused by other players joining possibly disappeared, but it now seems that the client was somehow able to recover and make up the lag after those spikes.

These observations may just be coincidence; I have only played for a couple of hours since the code was changed. But today's pattern is very different from my previous experience over the last few weeks, so I am hopeful that your efforts last weekend have greatly improved the game's playability for those of us on mid-range PCs.

Offline TurfIt

  • Dev Team, Coder/patcher
  • Devotee
  • *
  • Posts: 1395
Re: Performance analysis
« Reply #30 on: September 07, 2020, 06:57:39 PM »
66 milliseconds is the frame interval for 15 fps, which is what the server is attempting to run. The "time difference" values are calculated in # of frames, so will be multiples of 66 when displayed.

The "difference to server" is not absolute, but relative. i.e. The server sends a timing packet, there's a variable transmission time to when the client receives, so the client is actually a little behind, but we don't care. Actually parameters are set for clients to purposely run a few frames behind. So a 0 "difference to server" means 0 "difference from where we're supposed to be". If you have a "bad" connection to the server, jitter, variable pings, etc.., the timing gets quite messed up. Only fix is a better ISP.

As to purposely running behind, I don't know what the server has set for server_frames_ahead. Default is 4.
Combined with additional_client_frames_behind, which also default 4 (But the simuconf.tab distributed with Sim-Ex sets to 0!!!), gives 8 frame lag, 1/2 second at 15 fps. This was important to prevent "execute in past" desyncs previously (fixed by Dr.SuperGood - but not fully fixed in Extended - see desync thread). Now, it's important for smoothness if you have a faster computer than the server. With 4/0, I get horrendous stuttery movement connecting to the server. Setting 4/8, and movement is much improved, closer to offline behaviour (which is still bad, but...)

I lurked on the server a bit yesterday, see the path explorer limits are now being sent to/from the server correctly. And see some people are connecting with real potatoes! Result is limits dropping to 1/10 what even the slowish server can calculate. (I wonder if some limit to the limits should be implemented. Someone causing a 90% slowdown is tantamount to a DOS...)


Offline jamespetts

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 20274
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Performance analysis
« Reply #31 on: September 08, 2020, 04:51:30 PM »
Thank you both very much for your analysis on this. The question of a floor for server performance is worth considering given the possibility of slow clients to interfere with other players' experiences; this will need to be considered carefully in light of what actually transpires on the server.

I have now merged the missing code identified by TurfIt; thank you for that.

For reference, we currently have the following values set in simuconf.tab on the Bridgewater-Brunel server:

server_frames_behind = 4
additional_client_frames_behind = 4
server_frames_per_step = 16
server_frames_between_checks = 1

TurfIt - do you recommend modifying any of those settings to test whether they improve performance/responsiveness? I wonder whether the additional_client_frames_behind setting is necessary given that we now have the timing synchronisation code?

Offline TurfIt

  • Dev Team, Coder/patcher
  • Devotee
  • *
  • Posts: 1395
Re: Performance analysis
« Reply #32 on: September 08, 2020, 05:03:24 PM »
additional_client_frames_behind is a client side setting. http://bridgewater-brunel.me.uk/downloads/nightly/packages/Simutrans-Extended-Complete.zip has it set to 0. I would suggest 4 is a good default, but it's really up to each client to set for their circumstances. Getting r8348 applied should help too.

Ideally server_frames_per_step would be no more than 8 with a 15 fps setting, but need more cpu power for that.
And frames_between_checks =1 is only for desync tracking. Once that's settled, = 150 would be fine. i.e. a check every 10 secs. That also allows clients to more smoothly adjust their timing instead of being 'jerked' rudely about by the servers timing every frame.
( and remove / temp disable the rands, debug_sums in the checklists)

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 20274
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Performance analysis
« Reply #33 on: September 08, 2020, 05:07:07 PM »
Interesting, thank you for that. I will monitor the loss of synchronisation situation over the coming few days to see whether any that remain are so rare as not to need any action: one with a different apparent cause has been reported recently.

The idea of setting server_frames_per_step to 16 was to allow for a higher frame-rate for the client without making the game undertake more hard computations within any given time than necessary, giving more apparent smoothness by allowing an increase in the framerate. I am not sure how well that this has worked, however, as I have had to reduce the framerate to 15 in any event. Would you recommend changing server_frames_between_steps to 8?

As for the server_frames_behind, is there a need for this now that we have the time synchronisation code?

Offline TurfIt

  • Dev Team, Coder/patcher
  • Devotee
  • *
  • Posts: 1395
Re: Performance analysis
« Reply #34 on: September 08, 2020, 05:26:47 PM »
Unless you can overclock your server by ~20% more, there's no room to set frames_between lower. You could probably go to 15 to match the 15 fps and result in simloops==1, but the server is maxed.

The client frame timing code needs some breathing room to work, hence running a total 8 behind is good (4 from server setting, 4 from client). If the client runs ahead faster than the server, and hits the barrier, the client player will see a quite noticeable pause/hiccup happening on a regular basis. I find that nauseating.

Also note my suggested revision to the default iteration limits in a post above...