The International Simutrans Forum

 

Author Topic: Performance analysis  (Read 2201 times)

0 Members and 1 Guest are viewing this topic.

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Administrator
  • *
  • Posts: 20342
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Performance analysis
« on: July 22, 2020, 12:34:08 PM »
I notice that some people on the server were discussing performance issues. A brief profiling run gives a breakdown of where the most computational effort is being spent (bear in mind that multi-threading in some but not all parts of the code may distort how this affects real world performance).
Some highlights from the findings (using the current Bridgewater-Brunel saved game in single player mode; performance may be subtly different in multi-player mode) are:
  • graphics (main_view_t::display_region), which are multi-threaded, take a total of 31.79% of CPU time;
  • passenger and mail generation (step_passengers_and_mail_threaded_ takes a total of 16.82% of CPU time, of which 10.01% (overall CPU time, not 10.01% of 16.82%) consists of finding a path between halts by querying the hashtable (haltestelle_t::find_route);
  • vehicle physics calculations (convoi_t::calc_acceleration) takes 10.89% of CPU time;
  • the main path explorer method, which runs in its own thread, (path_explorer_t::compartment_t::get_path_between) takes 5.04% of CPU time;
  • pedestrians (pedestrian_t::sync_step) take 1.70% of CPU time;
  • convoy routing (including ships), which is multi-threaded, (convoi_t::drive_to) takes 1.67% of CPU time;
  • convoy loading at stops (haltestelle_t::request_loading) takes 1.14% of CPU time
These results are with my computer on this particular saved game; different results with different saved games are likely.

Offline Freahk

  • Devotee
  • *
  • Posts: 1355
  • Languages: DE, EN
Re: Performance analysis
« Reply #1 on: July 22, 2020, 12:52:39 PM »
Good to know, but I am still quite sure CPU load by itself is not the main issue here.
Most of the time, none of my cores are even close to 50% computational load.
I do rather expect the the bottleneck to be located elsewhere. E.g. memory bandwith, pipeline stall or control hazzards, though i did not find out how to profile this properly yet.

Offline kierongreen

  • Dev Team, Coder/patcher
  • Devotee
  • *
  • Posts: 2346
Re: Performance analysis
« Reply #2 on: July 22, 2020, 12:55:07 PM »
Good to know, but I am still quite sure CPU load by itself is not the main issue here.
Most of the time, none of my cores are even close to 50% computational load.
I do rather expect the the bottleneck to be located elsewhere. E.g. memory bandwith, pipeline stall or control hazzards, though i did not find out how to profile this properly yet.
Memory bandwidth issues (including issues relating to processor cache size) can be identified if performance increases from more threads tails off. It's very dependent on the system being used though.

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Administrator
  • *
  • Posts: 20342
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Performance analysis
« Reply #3 on: July 22, 2020, 12:56:00 PM »
Good to know, but I am still quite sure CPU load by itself is not the main issue here.
Most of the time, none of my cores are even close to 50% computational load.
I do rather expect the the bottleneck to be located elsewhere. E.g. memory bandwith, pipeline stall or control hazzards, though i did not find out how to profile this properly yet.

I suspect that memory bandwidth is a major issue, especially for graphics and the routing algorithms for passengers/mail/goods.

As to profiling, are you using Visual Studio? If so, it is quite straightforward: use the optimised debug build and go to Debug > Performance Profiler. In GCC, it is also fairly straightforward to profile, although I cannot remember the exact steps.

Edit: Disabling water animation can significantly reduce graphics load on the system.

Offline Freahk

  • Devotee
  • *
  • Posts: 1355
  • Languages: DE, EN
Re: Performance analysis
« Reply #4 on: July 22, 2020, 03:12:10 PM »
Quote
Edit: Disabling water animation can significantly reduce graphics load on the system.
I'll try this offline, if so it would be very kind of you to disable this on bridgewater.

I'm using clion. Visual studio under Linux is a mess, just like any microsoft product. I could install windows, but I don't agree with that company in very many points, so I prefer not using their products whenever possible.

Anyways, clion has a lot of debugging and profiling features either, I'll find it out...

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Administrator
  • *
  • Posts: 20342
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Performance analysis
« Reply #5 on: July 22, 2020, 05:16:43 PM »
For Linux, I recommend GCC for profiling; it does have some fairly standard ways of doing this. I am not familiar with Clion - is that a compiler or an IDE?

So far as water animation is concerned, you can disable this locally for online games without disabling it on the server: simply set water_animation_ms=0 in the base simuconf.tab. This is one of the settings that is not synchronised between client and server.

Offline Freahk

  • Devotee
  • *
  • Posts: 1355
  • Languages: DE, EN
Re: Performance analysis
« Reply #6 on: July 22, 2020, 07:51:54 PM »
Clion is an IDE using cmake, gcc (or clang if you choose to), gdb and all these standard gnu tools under the hood.

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Administrator
  • *
  • Posts: 20342
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Performance analysis
« Reply #7 on: July 22, 2020, 08:29:29 PM »
Clion is an IDE using cmake, gcc (or clang if you choose to), gdb and all these standard gnu tools under the hood.


Splendid - you should be able to use the GDB profiler with that, in that case.

Offline Matthew

  • *
  • Posts: 439
    • Japan Railway Journal
  • Languages: EN, some ZH, DE & SQ
Re: Performance analysis
« Reply #8 on: August 08, 2020, 04:36:22 PM »
From the markers,  etc. thread:

Quote from: jamespetts
Thank you for your thoughts on this. We should really move discussion of the performance to the thread specific to performance; would you mind posting your results in that thread so that I can gauge better whether to reduce the framerate and also get feedback on this from other players?

I have not sure exactly what data you want here, so here is a first attempt.

I have continued trying to play around with settings to understand the UI lag. When playing Bridgewater-Brunel off-line in 1845, the in-game Display GUI now indicates a Frame Time of ~30-40ms and 10-18 fps normally. This goes down to ~60ms and 8-10fps when the Path Explorer is processing passenger classes 3 and 4 (the goods classes are no trouble). The Idle value is always 0ms (except when it jumps to a very high figure that I guess might be due to a an integer wraparound), but I don't know how significant that is in the days of multicore CPUs.

The fact that the performance is worst when the Path Explorer runs certain passenger classes makes me wonder whether it might be worth reducing path_explorer_time_midpoint in preference to hurting other players' graphical performance. I tried setting it to 48 and loaded the B-B savefile offline: Frame Time did not seem to be affected, but FPS went up to about 22 normally; during heavy passenger classes the values were ~42ms and 15-20fps; I also got sporadic idle time.

As a side note, that fact seems to support Freahk's idea that the issue is memory bandwidth, since there seems to be plenty of spare capacity in the CPUs and RAM. I accept that the Path Explorer is inevitably going to be computationally demanding in a game played on a large map and that nobody can alter the Path Explorer in the near future; just noting the point for future reference.


Offline jamespetts

  • Simutrans-Extended project coordinator
  • Administrator
  • *
  • Posts: 20342
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Performance analysis
« Reply #9 on: August 08, 2020, 06:21:28 PM »
Thank you for the feedback - that is helpful. It is better to change one parameter at a time for testing. I have modified the framerate to reduce to 15fps on the server, and I should be grateful for feedback (from to-morrow when this change will take effect) as to how this affects lag.

Online prissi

  • Developer
  • Administrator
  • *
  • Posts: 10303
  • Languages: De,EN,JP
Re: Performance analysis
« Reply #10 on: August 10, 2020, 01:07:48 AM »
The CPU load is not really a good indications, since Simutrans have many places, which must be processed in certain order to prevent desync. Thus involves neccessarily waiting, even with multithreading and thus the CPU load will be usually even leass than 50%, depending of the number of thread available.

Unless the Microsoft profiler did made tremendous progress, in my experience the numbers are rather off, since it count any library or OS call as as part of the instrumented code. One of the biggest number for a threaded server should be the percentage of waiting for barriers. It is not time spent in the code, but if most real time is wasted at barriers, then total time increases with multithreading despite profiling tells you otherwise.

In this regard, best results on total time are probably from profiling single and multithreaded and compare.

Offline Matthew

  • *
  • Posts: 439
    • Japan Railway Journal
  • Languages: EN, some ZH, DE & SQ
Re: Performance analysis
« Reply #11 on: August 27, 2020, 12:48:04 AM »
Thank you for the feedback - that is helpful. It is better to change one parameter at a time for testing. I have modified the framerate to reduce to 15fps on the server, and I should be grateful for feedback (from to-morrow when this change will take effect) as to how this affects lag.

James, here is some feedback, which is a mixture of good and bad news.

The good news is that something dramatically decreased UI lag for a while. From around 8th August it became trivial, <10s, often much less, and I was able to play the game full-screen (at 1080P) and to use the map as well.

The first piece of bad news though is I can't be certain that this was entirely due to the framerate change because that same weekend my VM provider very slightly upgraded my CPU. It was a miniscule upgrade (just one rung on the Tom's Hardware ladder), but the new CPU (Ryzen 5 1400) has a 50% larger L3 cache, so that might be significant if memory bandwidth is the problem.

The second piece of bad news is that UI lag returned with a vengeance yesterday (before the nightly update of 26th August GMT). In the GMT afternoon of 25th August, I had no or trivial lag, but that evening it was terrible again, at least a minute. Tonight, even immediately after a resynch, tree planting/deletion took 38 seconds. Just processing password entry incurs a noticeable lag. After 1.5 in-game hours this increases to 2m30s to 3m or so.

Yet just now, at 0000 GMT, I was back to no UI lag again even with full-screen 1080P. I noticed that there didn't seem to be anyone else connected, so I closed the client, opened a fresh instance of the client and again had trivial lag (even though the Path Explorer was processing passenger class 3). Only you (James) were connected at the time.

Why has UI lag suddenly accelerated? Some possibilities:
  • The game has gradually been increasing in complexity within my VM's performance envelope and yesterday it exceeded one of those limits. If so, my highly speculative guess is that the Path Explorer has again exceeded some threshold, causing cache thrashing. But this does not fit with the fact that lag seems to vary depending on who else is playing.
  • It's caused by the actions of other (non-Simutrans) users of the same cloud service. I guess this is possible, but at no point am I short of Internet bandwidth (savegames load at 100Mbps) and the provider's whole USP is that we rent dedicated systems, not shared servers. So I should not be sharing CPU time or RAM with anyone else. The SSD/HDD storage must be shared (presumably on some RAID-like system), but that seems irrelevant.
  • It's perhaps possible that the framerate or some other setting was accidentally changed during the changes you and ceeac have been making to server settings. But then why is everything fine right now?
  • I have been playing at the same time as other players yesterday and today, whereas last week I was often playing alone. But tonight I had no lag when you (James) were online. So it's something to do with the interaction of multiple players on the server. But it's also possible that this is somehow bound up with the Path Explorer, since I guess that active players are more likely to trigger changes to the routing table.
  • Maybe your changes to industry code have something to do with it, but I doubt it.

I know that yesterday the terrible lag only began after freddyhayward connected. (Sorry Freddy! This is obviously not something you intended or can control and I know that your struggles with slow Anternet are at least as frustrating as my struggles with UI lag!)

Possible relevant factors:
  • Freddy uses the Linux client and I use the Windows client. Supporting this is the fact that I have no lag when playing alongside James, who also uses the Windows client. But I cannot explain why this would be. I would like to test with combinations of players+clients to see if that supports/refutes this theory. It may also be relevant that I had little lag last week when Freahk, who also uses the Linux client, was away.
  • Freddy's slow Internet means that my client is paused while his finishes downloading. If anything, that ought to give me a head start with processing though.
  • Freddy's poor Internet means that he often has to resynch. Possibly the cumulative resynchs are causing memory corruption or something else that slows my client's performance.
  • James connecting somehow slows down the server so that I can keep up. James, were you running rebuilds on the server whilst playing last night, or something like that? But that would not explain the low lag I had last week when playing alone.

Sorry, that's a lot of hypotheses but I have had plenty of typing time while timing tree deletions tonight.  ::'(

P.S. BTW the lag persists even when I disconnect and continue running the network game offline. Only closing the client and restarting a fresh instance stops it. But older forum posts suggest that this effect is already well-known to you.
« Last Edit: August 27, 2020, 01:06:35 AM by Matthew »

Offline Ranran

  • Devotee
  • *
  • Posts: 1283
  • Languages: ja
Re: Performance analysis
« Reply #12 on: August 27, 2020, 01:21:43 AM »
I think at least the time it takes to connect to Bridgewater server (download data) is not due to internet speed.

Online freddyhayward

  • Devotee
  • *
  • Posts: 456
  • Languages: EN
Re: Performance analysis
« Reply #13 on: August 27, 2020, 01:23:00 AM »
In general, my anecdotal experience since the latest round of builds is that there are more desyncs when linux and windows users are on at the same time, but very few when only linux users are online. Hopefully ceeac's fixes were relevant to this.

Online freddyhayward

  • Devotee
  • *
  • Posts: 456
  • Languages: EN
Re: Performance analysis
« Reply #14 on: August 27, 2020, 04:42:39 AM »
Freddy's poor Internet means that he often has to resynch. Possibly the cumulative resynchs are causing memory corruption or something else that slows my client's performance.
Actually, my internet speed shouldn't have any impact on this at all, since the amount of data transferred after downloading the save is trivial.

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Administrator
  • *
  • Posts: 20342
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Performance analysis
« Reply #15 on: August 27, 2020, 10:17:26 AM »
Matthew, thank you for your feedback. First of all I should note that my attempt to reduce the framerate did not work because this did not end up overriding the saved game settings. This has worked in the past: I suspect that the simuconf.tab override code has changed since then. This would mean that I would need to have the simuconf.tab settings overriding all the saved game settings in order to change the framerate, which would be undesirable, as there are many settings in the base simuconf.tab that should not override pakset settings. I had the same problem with the industry settings originally, but implemented a specific always on override for this, so only in the last day or two have we had the industry density proportion override as intended.

The consequence of all this is that we have been running at 30fps throughout. When I connect to the server with my somewhat fast computer, the behaviour that I get is that it is not very smooth: every second or so it will run quickly for a fraction of a second then pause for a fraction of a second. This suggests that the server is slower than the my client, which is constantly having to slow down to allow the server to catch up. I am not getting any noticeable input lag.

It would be interesting to know what other people's experiences of input lag are. The wild fluctuations reported by Matthew do not seem consistent with anything happening in the game; unless anyone else reports similar fluctuations, I suspect that these will be related to Matthew's VM, but in what way I cannot tell. One thing that Matthew could do is look at the statistics in the "display" dialogue and check frame time when things are running well as well as when things are running poorly and report these. If others could also report their frame times at the same time, this would be helpful. We could then see whether there is a similar relative fluctuation in frame times (even if the absolute values on others' frame times mean that they do not get lag even at the worst values), or whether the relative fluctuations are only on Matthew's VM, which would mean that the problem is likely to be with the VM.

As to the losses of synchronisation, I should be grateful if people could discuss that on the dedicated thread, since I will lose track of any discussion on that issue that takes place here.

I should note that it is inherently very unlikely that a different platform client being connected to the server at the same time could affect the performance of another client. The data transmitted between client and server while running is minimal (limited to the broadcast of actions by other players and any commands from the server to the client to pause while the server catches up), and the performance will depend on how the code is executed locally on the server. Matthew might want to check the framerate of the game in single player mode on his VM to see whether the framerate (which varies in single player mode) varies with time in the same way that the lag varies. If the single player framerate drops <30, then this indicates a condition in which lag is likely to occur if connected to the server.

In the meantime, I will look at implementing universal server override for framerates, as it does not make any sense for the saved game value to override the simuconf.tab value for framerates on a server.

Offline Freahk

  • Devotee
  • *
  • Posts: 1355
  • Languages: DE, EN
Re: Performance analysis
« Reply #16 on: August 27, 2020, 10:32:44 AM »
This would mean that I would need to have the simuconf.tab settings overriding all the saved game settings in order to change the framerate, which would be undesirable
I do usually use the simuconf in users directory for that purpose.
The alternative of downloading the map, using the edit tool, aving it and uploading it again will also do the job, but somehow passwords will be lost in that process.

I do not have any performance troubles of that kind at the moment, apart from end-of-month lags, which is pretty usual.

I just took some time to make a few observations:
Sometimes, after a desync the game is extremely laggy. It's reporting something around 5 fps.
Pausing and unpausing the game will make it run much faster again.

Loading and saving the game seems to reduce fps in some cases.
After the pause/unpause thing, I ended up at something around 22 fps in singeplayer.
Saving and loading the game ended up in 17 fps. Pause/unpause did not help here, apart from a rather short 30 fps spike.
The CPU load in these cases was roughly the same. Nothing special could be observed in htop ("task manager")
Notice that the path explorer was running in those cases.
On the server, I get a completely different fps behavior. It is extremely jumpy. Everything between 10 fps and 300 fps was observed.
The UI is a little umpy as well, but I did not fall behind. That means the dragging preview of ways does soethimes not work at all, but when I release the mouse button, the way is built immediately.
« Last Edit: August 27, 2020, 11:01:25 AM by Freahk »

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Administrator
  • *
  • Posts: 20342
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Performance analysis
« Reply #17 on: August 27, 2020, 11:23:54 AM »
Thank you - I have been testing the FPS issue. It seems that I was mistaken in my initial post and that the existing code already provides for FPS overriding for servers, and the framerate of 15 has been correctly set on the server; however, because the framerate is inconsistent for some reason, it is not observed as being 15 in the display dialogue. I am not sure why the framerate is inconsistent, as I do not have knowledge of the code controlling this.

Offline Matthew

  • *
  • Posts: 439
    • Japan Railway Journal
  • Languages: EN, some ZH, DE & SQ
Re: Performance analysis
« Reply #18 on: August 27, 2020, 04:48:45 PM »
Sorry, I wrote "poor Internet" and "slow Internet" in the wrong lines, which was confusing.

I think at least the time it takes to connect to Bridgewater server (download data) is not due to internet speed.

My impression is that when Freddy and Huitsi connect, the "pause" period lasts longer than when other players connect. I will try to measure the time to confirm this, but it's a little difficult because I don't know when other people will connect.

I interpreted the longer pause period as Freddy and Huitsi needing a longer period than others to download the server save game because they have less Internet bandwidth.

Actually, my internet speed shouldn't have any impact on this at all, since the amount of data transferred after downloading the save is trivial.

You are quite right, of course.

It would be interesting to know what other people's experiences of input lag are. The wild fluctuations reported by Matthew do not seem consistent with anything happening in the game; unless anyone else reports similar fluctuations, I suspect that these will be related to Matthew's VM, but in what way I cannot tell. One thing that Matthew could do is look at the statistics in the "display" dialogue and check frame time when things are running well as well as when things are running poorly and report these. If others could also report their frame times at the same time, this would be helpful. We could then see whether there is a similar relative fluctuation in frame times (even if the absolute values on others' frame times mean that they do not get lag even at the worst values), or whether the relative fluctuations are only on Matthew's VM, which would mean that the problem is likely to be with the VM.

I only see a fixed frame time of 66 ms, so I am guessing that by "frame time" you mean "fps", right? Just now, with 3-5 other players connected and active, I get 14-16fps, 1.5-1.6 Simloops, 100 idle time. But it periodically spikes to 300-400fps, which I think correlates with the Path Explorer starting a new class. During the "pause" period when people join, I get 9 or 10 fps.

I will report further when I'm able to connect alone.

Quote
Matthew might want to check the framerate of the game in single player mode on his VM to see whether the framerate (which varies in single player mode) varies with time in the same way that the lag varies. If the single player framerate drops <30, then this indicates a condition in which lag is likely to occur if connected to the server.

Thank you, I will test this soon.

Offline Matthew

  • *
  • Posts: 439
    • Japan Railway Journal
  • Languages: EN, some ZH, DE & SQ
Re: Performance analysis
« Reply #19 on: August 27, 2020, 11:09:17 PM »
Matthew might want to check the framerate of the game in single player mode on his VM to see whether the framerate (which varies in single player mode) varies with time in the same way that the lag varies. If the single player framerate drops <30, then this indicates a condition in which lag is likely to occur if connected to the server.

I have now collected this data:



If that is too small to read, then the spreadsheet is here.

The "range" figures are minima and maxima; the "typical" figures are my subjective impression of the mode. But my eyes could not be everywhere at once, so they are inevitably inaccurate.

"Accelerating" is used when the trajectory of the frame time is used when the figures seemed to increase exponentially until maxing out at 246 (when the Path Explorer finished) or 250 (where the frame time permanently stuck).

The thread %CPU figures are from Process Hacker. This program's outputs are not well documented, but it appears to be the percentage of CPU time used by each of the game's threads. I always noted the figures for the main thread; the "highest thread" rows are used only when another thread was using more CPU than the main thread. When this happened, there were often four such threads at very similar levels.

The "fill path matrix" and "explore paths" rows are probably inaccurate. I only noticed near the end of these observations that the figures seemed to carry over when I loaded a new savegame, which makes me wonder whether they are only updated each time the Path Explorer finishes a goods type.

I am still running the May 1863 game offline. After half a month it has entered its second PE cycle. I will update if it the PE ever returns to standby and I get appreciably different results.

BTW other players reported on Discord that the current network maxes out their CPU. My CPU was certainly very busy, but not maxed out. That would seem to fit Freahk's theory that memory bandwidth is the obstacle.

In conclusion, it appears that this VM has rarely met the standard James has set. So it seems that I may soon be departing from Bridgewater-Brunel. But it is puzzling that I had long periods last week without desynchs or more than trivial UI lag. It might also be significant that none of the Path Explorer on Standby values seem to change in a simple linear way, as I had expected. But even if the problem is narrowed down to the Path Explorer, I realize that a large game is inevitably going to have a heavy load of route-finding computations and that changing the Path Explorer would be a very major project that is not currently on the development roadmap.

Offline TurfIt

  • Dev Team, Coder/patcher
  • Devotee
  • *
  • Posts: 1395
Re: Performance analysis
« Reply #20 on: September 03, 2020, 05:56:41 PM »
Sorry, I wrote "poor Internet" and "slow Internet" in the wrong lines, which was confusing.
My impression is that when Freddy and Huitsi connect, the "pause" period lasts longer than when other players connect. I will try to measure the time to confirm this, but it's a little difficult because I don't know when other people will connect.
I interpreted the longer pause period as Freddy and Huitsi needing a longer period than others to download the server save game because they have less Internet bandwidth.
The pause period is the longer of the time for the server save, file transfer, server reload, or for all existing clients to save and reload. If a player with a slow computer is connected, they could cause everybody else to wait around. (Should really put a limit on that...)
Note, the game does not wait for the joining player to load. If they have a slow computer (load the game slower than the server), they'll be left behind with input lag right off the start.


Matthew, thank you for your feedback. First of all I should note that my attempt to reduce the framerate did not work because this did not end up overriding the saved game settings.
framerate is not a savegame setting - env_t vs settings_t.  Just change in simuconf.tab and done.  Note however framerate, server_frames_ahead, server_frames_per_step, and server_frames_between_checks are all linked and need adjusting together. (and additional_client_frames_behind, but that's on each client to adjust [And not as critical since Dr.SuperGoods barrier was added to stop the run ahead desyncs].) Ideally these would be set in ms rather than frames, but...

"Standard" default network settings were 10fps, 4 frames_per step. That's a step every 400ms (or 2.5 simloops). Offline the target is 5 simloops (200ms). You don't want less than 2 simloops else responsiveness will suffer. And you don't want more than 5 either - needless CPU load. I hope you weren't running 30 fps and 4 frames_per_step - that's a step every 130ms, nobody would be able to keep up (nor the server...).


, the behaviour that I get is that it is not very smooth: every second or so it will run quickly for a fraction of a second then pause for a fraction of a second. This suggests that the server is slower than the my client, which is constantly having to slow down to allow the server to catch up. I am not getting any noticeable input lag.
Check the server logs for "server lagging by" warnings. If many, you need to tone down the settings/game, or get a faster server. Clients attempt to adjust their rate to the servers, but can only do so much. (And if server_frames_between_checks==1 for desync debugging, will be timing jerked all over the place. [should be set for a check every 10 secs or so in normal use])

Also, Extended is relatively heavy in processing within step(). Standard is much more balanced sync_step() vs step(). This means everytime a step a runs, the game falls behind it's frame target (a step can take multiple frametimes to complete if too high a fps/too slow a server), and sprints out back to back frames to catch up. The client sees a pause, followed by a high framerate, before normal frame pacing, repeat the cycle. Very nauseating.

For best experience, you need to ensure step() completes before it's time for the next sync_step(). I typically see sync_step()/display() taking 10-40ms (depends on screen resolution, zoom factor, cpu speed, etc.), so not much time for step() if trying to run 15fps (66ms time interval). Before I stopped contributing, I had investigated breaking step() into chunks so a bit could process every frame(). It's entirely doable, and quite simple really, but I never finished (and work too old to not need starting anew). Something to consider....


It would be interesting to know what other people's experiences of input lag are.
Input lag in the order of tens of seconds/minutes is entirely due to ones computer not keeping up. Run '-debug 3 -log' and check for "time difference to server" messages. If that number is continually increasing, you're not keeping up. (even better run with a .exe complied with the console enabled, and you can watch in realtime...)



I should note that it is inherently very unlikely that a different platform client being connected to the server at the same time could affect the performance of another client.
True for Standard, not Extended which has the Path Explorer (PE).  The PE adjusts the amount of work it does each step() to minimize the time to completion, but not unduly take too much time in each step(). These iteration limits are sent to the server, which then commands all clients to use the limits from the slowest computer. So if performance improves after a particular client connects, that client has the slowest computer (and likely is getting a rather poor experience.)  Or, something has gone wrong in the limit calculations/transmission.   I'm not seeing where nwc_routesearch_t::check_for_transmission() is being called in the current code....
IIRC the server logs should show the limits received from each client, and the set to minimum command going out. I'd suggest checking this limit function is still working.

EDIT:
Code: [Select]
// Knightly : check if changed limits, if any, have to be transmitted to all clients
if(umgebung_t::server)
{
nwc_routesearch_t::check_for_transmission( this );
}
has gone missing from the code.... was in karte_t::interactive() which was refactored. Likely a massive merging from Standard mistake.
I believe it belongs in karte_t::process_network_commands() now,  around simworld.cc::10716.

EDITEDIT:
The other piece of limit code also went missing - see 759ef8ccc39b3fe895e838301866e63f869eb44d.
« Last Edit: September 03, 2020, 10:28:01 PM by TurfIt »

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Administrator
  • *
  • Posts: 20342
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Performance analysis
« Reply #21 on: September 05, 2020, 01:21:49 PM »
TurfIt - thank you for that. I have reinstated the missing code quoted above from Knightly. I am slightly less clear on what in the commit with hash 759ef8ccc39b3fe895e838301866e63f869eb44d is now missing. I see that the process network command section is still present, albeit updated. Can you elaborate on what in that commit needs to be re-added? Thank you again.

Offline TurfIt

  • Dev Team, Coder/patcher
  • Devotee
  • *
  • Posts: 1395
Re: Performance analysis
« Reply #22 on: September 05, 2020, 10:55:46 PM »
Clearly GIT still eludes me.... That commit number looks like the commit into standard, not the merge into Extended despite that's what I was looking at.
Maybe: 5f47f512e988fde243f52a78b2eb9656350a2f93 ??

This is the old block: (Just the Knightly section missing in refactored code)
Code: [Select]
if(  umgebung_t::networkmode  ) {
// did we receive a new command?
unsigned long ms = dr_time();
network_command_t *nwc = network_check_activity( this, next_step_time>ms ? min( next_step_time-ms, 5) : 0 );
if(  nwc==NULL  &&  !network_check_server_connection()  ) {
dbg->warning("karte_t::interactive", "lost connection to server");
network_disconnect();
}

// Knightly : send changed limits to server where necessary
if(path_explorer_t::are_local_limits_changed()  ) {
path_explorer_t::limit_set_t local_limits = path_explorer_t::get_local_limits();
network_send_server( new nwc_routesearch_t(sync_steps, map_counter, local_limits, false) );
path_explorer_t::reset_local_limits_state();
dbg->warning("karte_t::interactive", "nwc_routesearch_t object created and sent to server: sync_step=%u map_counter=%u limits=(%u, %u, %u, %llu, %u)",
sync_steps, map_counter, local_limits.rebuild_connexions, local_limits.filter_eligible, local_limits.fill_matrix, local_limits.explore_paths, local_limits.reroute_goods);
}

// process all the received commands at once
while (nwc) {
I don't know if there's any more Extended specific code missed in these merges, I was only looking at the path explorer limits...

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Administrator
  • *
  • Posts: 20342
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Performance analysis
« Reply #23 on: September 05, 2020, 11:24:14 PM »
Splendid, thank you for that: I have now re-incorporated the missing section.

Offline TurfIt

  • Dev Team, Coder/patcher
  • Devotee
  • *
  • Posts: 1395
Re: Performance analysis
« Reply #24 on: September 05, 2020, 11:32:48 PM »

Some performance analysis from the server log you posted in the desync thread:
A new_month() results in the server falling 11 seconds behind where it should be.
Each sync_step() it can catch up ~26 ms. But each step() sets it back a further 360ms.
Resulting in a net 153ms catchup per second realtime. Which I calculate as the game running at ~85% server CPU load average.

For clients, you have to subtract the display rendering time from that ~26ms above. As I mentioned before, frame render times are easily 10-40ms range. Hence client better be running faster CPUs than the server, in a small window zoomed in.

Also, with that length step() time, clients would see a 1/3 second pause followed by 5-6 frames as fast as their computer can render, then back to normal 15 fps. Repeat every second.
Finally, this also represents simloops < 1, where you really want 2-5, but there's no CPU room available for that.


Online freddyhayward

  • Devotee
  • *
  • Posts: 456
  • Languages: EN
Re: Performance analysis
« Reply #25 on: September 05, 2020, 11:48:22 PM »
Clearly GIT still eludes me.... That commit number looks like the commit into standard, not the merge into Extended despite that's what I was looking at.
Maybe: 5f47f512e988fde243f52a78b2eb9656350a2f93 ??

This is the old block: (Just the Knightly section missing in refactored code)
Code: [Select]
if(  umgebung_t::networkmode  ) {
// did we receive a new command?
unsigned long ms = dr_time();
network_command_t *nwc = network_check_activity( this, next_step_time>ms ? min( next_step_time-ms, 5) : 0 );
if(  nwc==NULL  &&  !network_check_server_connection()  ) {
dbg->warning("karte_t::interactive", "lost connection to server");
network_disconnect();
}

// Knightly : send changed limits to server where necessary
if(path_explorer_t::are_local_limits_changed()  ) {
path_explorer_t::limit_set_t local_limits = path_explorer_t::get_local_limits();
network_send_server( new nwc_routesearch_t(sync_steps, map_counter, local_limits, false) );
path_explorer_t::reset_local_limits_state();
dbg->warning("karte_t::interactive", "nwc_routesearch_t object created and sent to server: sync_step=%u map_counter=%u limits=(%u, %u, %u, %llu, %u)",
sync_steps, map_counter, local_limits.rebuild_connexions, local_limits.filter_eligible, local_limits.fill_matrix, local_limits.explore_paths, local_limits.reroute_goods);
}

// process all the received commands at once
while (nwc) {
I don't know if there's any more Extended specific code missed in these merges, I was only looking at the path explorer limits...
Just out of curiosity, how significant is this commit to addressing desync issues?

Offline TurfIt

  • Dev Team, Coder/patcher
  • Devotee
  • *
  • Posts: 1395
Re: Performance analysis
« Reply #26 on: September 06, 2020, 12:03:48 AM »
Shouldn't affect desync at all. Without, everybody would be running at the server's calculated limits. With, limits are adjusted to the slowest client, so without, they would be having a bad time.

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Administrator
  • *
  • Posts: 20342
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Performance analysis
« Reply #27 on: September 06, 2020, 12:08:10 AM »
Interesting. Testing locally, with a 4k monitor and Simutrans running the Bridgewater-Brunel saved game from 1863 maximised, I get 4.9 simloops with the path explorer on standby at default zoom.

Offline TurfIt

  • Dev Team, Coder/patcher
  • Devotee
  • *
  • Posts: 1395
Re: Performance analysis
« Reply #28 on: September 06, 2020, 12:30:00 AM »
Run locally with your simuconf.tab from the server, and run locally as a server "sim -server"  (of course remove the server announce sections from simuconf...)
Timing is done completely different in "normal" step mode vs "fixed_step" mode, aka network mode.

EDIT: normal mode with this savegame is still jerky. Likely still the fault of step(), if you have insufficient int_checks() spread throughout...

I also note the sever is sending:
Code: [Select]
Warning: nwc_routesearch_t::transmit_active_limit_set: transmit succeeded sync_step=2 map_counter=1683307 limits=(1024, 98304, 65536, 16777216, 256)
Those are the default limits, which I believe would be due to the server never having completed a path explorer cycle. (That would be rather bad for players with routes never being found...) [EDIT: nope, due to the code that was missing. Should work once the server updates to the latest version.]
The defaults seem a little off to me, perhaps:
Code: [Select]
// default iteration limits
static const uint32 default_rebuild_connexions  = 0x0400;
static const uint32 default_filter_eligible = 0x0000AAAA;
static const uint32 default_fill_matrix     = 0x00009000;
static const uint64 default_explore_paths   = 0x02000000;
static const uint32 default_reroute_goods   = 0x0002AAAA;
« Last Edit: September 06, 2020, 04:48:35 AM by TurfIt »

Offline Matthew

  • *
  • Posts: 439
    • Japan Railway Journal
  • Languages: EN, some ZH, DE & SQ
Re: Performance analysis
« Reply #29 on: September 07, 2020, 01:11:54 PM »
Thank you to James and Turfit for your kind efforts to understand and improve the game's performance on large maps.

My initial impression is that the changes over the weekend have greatly improved the game's performance for me. To try to objectively measure that, I have collected some data from Bridgewater-Brunel play sessions of similar length (~1 real life hour) on 2 September and today, using the "time difference from server" values in the log files. Those are reported in exact multiples of 66 milliseconds, which according to comments above is the frame setting on the server. So a value of "66 milliseconds" is unlikely to mean that the client is exactly 66 milliseconds behind the server. I suspect that it means the client is one tick behind the server, but I am not certain, since a comment in simunits.h suggests that Simutrans' 'milliseconds' are not always the same as real-life milliseconds. I have restated the values in 'frames' where 1 frame = 66 milliseconds, for ease of reading. I have called it a 'frame' but I have no idea whether that is actually what 66 ms means.

Here are the lag statistics from 2 September:



We can see that the vast majority of the time (66%!  ;D ) my client lags one 'frame' behind the server. But the mean is much higher; sometimes it lags much more. This graph (x-axis = log entries, y-axis = 'frames') confirms the pattern that I observed subjectively:



Note that the y-axis is logarithmic. When I played alone, I was able to keep up. But when other players joined the game, the lag rose rapidly to a much, much higher level and stayed there until either I desynched (the server kicked the client) or manually resynched (I closed the client and joined the game using a new instance of the client).

Here are the lag statistics from today:



It's still the case that the client lags behind the server for 1 'frame' most (70%) of the time. But the maximum lag has fallen from 183 'frames' to 55 'frames'. And today I was able to continue playing after not one, but two players joined the game. The graph helps to understand what is happening:



Note that the y-axis is now not logarithmic.

The number of 'frames' spiked twice. I do not know what prompted these, but the logs show that they are not exactly the same as points where other players joined. So not only has the lag caused by other players joining possibly disappeared, but it now seems that the client was somehow able to recover and make up the lag after those spikes.

These observations may just be coincidence; I have only played for a couple of hours since the code was changed. But today's pattern is very different from my previous experience over the last few weeks, so I am hopeful that your efforts last weekend have greatly improved the game's playability for those of us on mid-range PCs.

Offline TurfIt

  • Dev Team, Coder/patcher
  • Devotee
  • *
  • Posts: 1395
Re: Performance analysis
« Reply #30 on: September 07, 2020, 06:57:39 PM »
66 milliseconds is the frame interval for 15 fps, which is what the server is attempting to run. The "time difference" values are calculated in # of frames, so will be multiples of 66 when displayed.

The "difference to server" is not absolute, but relative. i.e. The server sends a timing packet, there's a variable transmission time to when the client receives, so the client is actually a little behind, but we don't care. Actually parameters are set for clients to purposely run a few frames behind. So a 0 "difference to server" means 0 "difference from where we're supposed to be". If you have a "bad" connection to the server, jitter, variable pings, etc.., the timing gets quite messed up. Only fix is a better ISP.

As to purposely running behind, I don't know what the server has set for server_frames_ahead. Default is 4.
Combined with additional_client_frames_behind, which also default 4 (But the simuconf.tab distributed with Sim-Ex sets to 0!!!), gives 8 frame lag, 1/2 second at 15 fps. This was important to prevent "execute in past" desyncs previously (fixed by Dr.SuperGood - but not fully fixed in Extended - see desync thread). Now, it's important for smoothness if you have a faster computer than the server. With 4/0, I get horrendous stuttery movement connecting to the server. Setting 4/8, and movement is much improved, closer to offline behaviour (which is still bad, but...)

I lurked on the server a bit yesterday, see the path explorer limits are now being sent to/from the server correctly. And see some people are connecting with real potatoes! Result is limits dropping to 1/10 what even the slowish server can calculate. (I wonder if some limit to the limits should be implemented. Someone causing a 90% slowdown is tantamount to a DOS...)


Offline jamespetts

  • Simutrans-Extended project coordinator
  • Administrator
  • *
  • Posts: 20342
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Performance analysis
« Reply #31 on: September 08, 2020, 04:51:30 PM »
Thank you both very much for your analysis on this. The question of a floor for server performance is worth considering given the possibility of slow clients to interfere with other players' experiences; this will need to be considered carefully in light of what actually transpires on the server.

I have now merged the missing code identified by TurfIt; thank you for that.

For reference, we currently have the following values set in simuconf.tab on the Bridgewater-Brunel server:

server_frames_behind = 4
additional_client_frames_behind = 4
server_frames_per_step = 16
server_frames_between_checks = 1

TurfIt - do you recommend modifying any of those settings to test whether they improve performance/responsiveness? I wonder whether the additional_client_frames_behind setting is necessary given that we now have the timing synchronisation code?

Offline TurfIt

  • Dev Team, Coder/patcher
  • Devotee
  • *
  • Posts: 1395
Re: Performance analysis
« Reply #32 on: September 08, 2020, 05:03:24 PM »
additional_client_frames_behind is a client side setting. http://bridgewater-brunel.me.uk/downloads/nightly/packages/Simutrans-Extended-Complete.zip has it set to 0. I would suggest 4 is a good default, but it's really up to each client to set for their circumstances. Getting r8348 applied should help too.

Ideally server_frames_per_step would be no more than 8 with a 15 fps setting, but need more cpu power for that.
And frames_between_checks =1 is only for desync tracking. Once that's settled, = 150 would be fine. i.e. a check every 10 secs. That also allows clients to more smoothly adjust their timing instead of being 'jerked' rudely about by the servers timing every frame.
( and remove / temp disable the rands, debug_sums in the checklists)

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Administrator
  • *
  • Posts: 20342
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Performance analysis
« Reply #33 on: September 08, 2020, 05:07:07 PM »
Interesting, thank you for that. I will monitor the loss of synchronisation situation over the coming few days to see whether any that remain are so rare as not to need any action: one with a different apparent cause has been reported recently.

The idea of setting server_frames_per_step to 16 was to allow for a higher frame-rate for the client without making the game undertake more hard computations within any given time than necessary, giving more apparent smoothness by allowing an increase in the framerate. I am not sure how well that this has worked, however, as I have had to reduce the framerate to 15 in any event. Would you recommend changing server_frames_between_steps to 8?

As for the server_frames_behind, is there a need for this now that we have the time synchronisation code?

Offline TurfIt

  • Dev Team, Coder/patcher
  • Devotee
  • *
  • Posts: 1395
Re: Performance analysis
« Reply #34 on: September 08, 2020, 05:26:47 PM »
Unless you can overclock your server by ~20% more, there's no room to set frames_between lower. You could probably go to 15 to match the 15 fps and result in simloops==1, but the server is maxed.

The client frame timing code needs some breathing room to work, hence running a total 8 behind is good (4 from server setting, 4 from client). If the client runs ahead faster than the server, and hits the barrier, the client player will see a quite noticeable pause/hiccup happening on a regular basis. I find that nauseating.

Also note my suggested revision to the default iteration limits in a post above...

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Administrator
  • *
  • Posts: 20342
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Performance analysis
« Reply #35 on: November 18, 2020, 05:32:07 PM »
I note that some people have reported performance problems on the Bridgewater-Brunel server. I have considered whether excessive population growth is the issue, and collected the following data.

In 1777, the population of the Bridgewater-Brunel region was recoded as 1,629,866.
In 1856, the population was 2,932,966.
In 1943, the population is 4,062,888.

Between 1777 and 1856, the population therefore grew by ~80%. Between 1856 and 1943, the population grew by a further ~40%. In 1943, the population was ~2.5 times what it was in 1777.

However, according to this website, this level of population growth is actually less than the historical population growth of England during this time period; so curtailing growth is not the way forward.

Testing myself, logging in seems to take a while, but, once logged in, the performance seems reasonable, albeit subject to the unevenness described by TurfIt above. I have carried out tests in the form of building public roads and replacing the old Kirkington Mount Hackney carriage service with a modern minibus, and everything seemed to be reasonably responsive.

Can I ask people to elaborate on the nature of the performance issues that people are experiencing?

Offline Freahk

  • Devotee
  • *
  • Posts: 1355
  • Languages: DE, EN
Re: Performance analysis
« Reply #36 on: November 18, 2020, 07:12:16 PM »
When I first had a look at the performance, it turned out that the vast majority of performance is spent on destination lookup of precalculated routes.
Looking at the code, this indicates that the global population as well as the number of stops in the world have a huge impact on this section of code.
More population will increase the number of lookups in two ways (more journey attempts, more max retries per journey attempt.)
More stops will mainly blow the memory by square, but will also significiantly increase lookup times once those tables grow large.

Keep in mind, that our simutrans hashtable implementation makes use of a fixed number of buckets, where items put into the same bucket are organised as a linked list. That means, it has a quite fast, roughly constant performance up to some point, but won't scale quite well beyond that point. When many items are inserted in that hashtable, its performance characteristics will rather be like a linked list than a hash structure.

Anyways, that observation was made when my 11 year old PC struggled in keeping up with the server, which was somewhere around the 1800s. Things might have changed in the meantime, but I won't spend any time on such observations before February.

It might be worth to have a detailled look at this, however.

In any case, these structures make a huge part of the memory consumption, so optimising these structures is mandatory to maintain acceptable performace in huge worlds (in terms of number of stops)
« Last Edit: November 18, 2020, 07:44:40 PM by Freahk »

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Administrator
  • *
  • Posts: 20342
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Performance analysis
« Reply #37 on: November 18, 2020, 07:57:19 PM »
That is an interesting observation - thank you. As discussed earlier, I believe, I did try to replace this with std::unordered_map, but found performance to be significantly worse. I really am not skilled at low-level programming, and I believe that I should struggle to optimise the in-built Simutrans structures by any significant change to the design.

However, I notice reference to a fixed number of buckets. I wonder whether anything might be gained by increasing this fixed number in respect of hashtables that we can predict at the time of their creation in memory are likely to be very large? If there is a real chance of this strategy being successful, it might be simple enough for me to implement this.

Offline Freahk

  • Devotee
  • *
  • Posts: 1355
  • Languages: DE, EN
Re: Performance analysis
« Reply #38 on: November 19, 2020, 12:54:13 AM »
In principle yes, but I suspect I confused it with private car route lookup

The precalculated journey times are not stored in a hastable but in a matrix (2d array), which is very fast and efficient.

Anyays, the other parts remain intact.
The number of inhabitants will increase journey attempts by square and the number of stops will increase the memory consumption of the precalculated routing data by square.
On top of this, a dense stop placement will drastically increase the number of origin-destination stop pair candidates. Each of these pairs needs to be observed to find the fastest route from a specific origin to a specific destination.

There are still some options to decreasing the number of lookups, thus increase the performance, but these are not straight-forard.
Just the draft of an idea I had posted when discussing about poor passenger success rates:
Define a heuristics that systematically underestimates journey times and use it to filter out destination candidates.

Concretely, in the path explorer run, extract the best-case (manhattan) distances that can be reached from a specific town within a small set of times, using the origin-destination time matrix. For example store the best-case distances that can be reached in 8, 16, 32, 64, 128, 256, 512 and 1024 minutes.

Store that information somewhere (a few bytes per town really don't hurt anyone)

When a passenger makes a journey attempt, randomly select destination candidates.
Select the best-case distance of the smallest journey time greater than the time budget.
Compare that distance against the (manhattan) distance from origin to destination. If the best-case distance is smaller than our actual distance, we can immediately reject that destination, otherwise we will have to look up all origin-destination pairs as-is.



Sorry about the confusion in the very last section. By "these structures" I was not refering to the hashtable implementation, but to the way how routes from one stop to all stops is stored, which is the reason why the memory consumption of the routing data grows by square with the number of stops.
In principle, it is not neccessary to store all origin-destination pairs. If we cut the world into some kind of areas, only all pairs within a single area as well as pairs between all hubs need to be stored explicitly, where a hub is any stop that has at least one service (includes walking) which leaves the area.
That way, the memory consumption could be drastically reduced at cost of a two-level lookup. This won't increase the performance though.
How exactly those areas are defined is a different thing. Choosing these well will lead to better results, but even a rather simple approach of fixed rectangles, let's say 1/10 of maps size should already do a good job.

Offline Matthew

  • *
  • Posts: 439
    • Japan Railway Journal
  • Languages: EN, some ZH, DE & SQ
Re: Performance analysis
« Reply #39 on: November 19, 2020, 06:13:07 AM »
Quote
Can I ask people to elaborate on the nature of the performance issues that people are experiencing?

Client lag

This was the issue that troubled me earlier this year. For example, I would click on a carriage in the depot and there would be a delay before it was added to the train. Once lag started, usually when someone else joined, it accelerated until either the game became unplayable (which I define as lag >= 2 minutes) or I desynced. This issue has dramatically improved for me since then. I don't often have such lags and the client does sometimes recover from them. They are still correlated with other players joining, so I don't think it's just because my VM is slow (though that is obviously a factor), but is also somehow affected by the save-unload-load process.

While the situation 'works for me' at the moment, I have three further thoughts.

Firstly, I think one of the changes in the summer was to increase the frame time, making the graphical experience worse for everybody. I am grateful to all the other players for putting up with this so that I can play. Thank you!

Secondly, Freahk's experience suggests that this may become an issue again as the savegame continues to grow, but this has not happened yet. I thought that I might reach a tipping point when the game could not longer fit into RAM, but recently I've had up to 3GB of the game stored in the page file and the client still runs. 

Thirdly, I can often work around this issue by quitting the client and rejoining the game. But I think that other players are sometimes desynced when I do this. And this option is only possible for me because I have fast Internet. I am conscious that this is impractical for anyone with a slow Internet connection (g'day Australia!). This leads us to the next issue.

Server preparation time

Transferring games and loading them is not a big issue for me (especially now I know to disable Windows' antivirus package, which was doubling load times). But the "Server preparing game..." element seems to be growing longer. It took eight and then ten minutes during my last two joins. It seems to be very low when the server restarts, but then basically inexplicable thereafter (the preparation time does not scale consistently with server uptime).

This issue can make the server difficult to play at times when others are joining. If you sit down to play for an hour and two others join during that time, then you may lose half of that time to server prep. And probably one of those joins will fail and have to be restarted. So you spend more time waiting than playing. #firstworldproblems

Server lag

This is the phenomenon listed in the server log as "server lagging by", which appears to correlate with desyncs and crashes. I don't fully understand what it means for the server to lag. It can't mean that the server is behind the client(s), because then it correlate with the client lag, which is does not. Perhaps it means that the server is behind the target introduced by the changes in the summer?

I wonder whether the variability of the last two issues is related to the state of the Path Explorer at the time when the game is saved. But that is just a wild guess. There are many other possibilities.

I note that some people have reported performance problems on the Bridgewater-Brunel server. I have considered whether excessive population growth is the issue, and collected the following data.

In 1777, the population of the Bridgewater-Brunel region was recoded as 1,629,866.
In 1856, the population was 2,932,966.
In 1943, the population is 4,062,888.

Between 1777 and 1856, the population therefore grew by ~80%. Between 1856 and 1943, the population grew by a further ~40%. In 1943, the population was ~2.5 times what it was in 1777.

However, according to this website, this level of population growth is actually less than the historical population growth of England during this time period; so curtailing growth is not the way forward.

I will respond to this in a separate thread.

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Administrator
  • *
  • Posts: 20342
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Performance analysis
« Reply #40 on: November 21, 2020, 12:33:58 PM »
Thank you both for your responses. The server preparation time is the time that the server spends saving the game; I do note that this time seems to be excessive for reasons which are not immediately clear at present, but investigating this will be a truly gargantuan task, especially as the conditions in which the excessive time occurs are not readily reproducible other than on a server, which makes getting a reliable reproduction case that can be reproduced many times over in quick succession for performance profiling extremely difficult.

One might try running a local server (start Simutrans-Extended with the "-server" command line parameter) to test this, but having enough memory to run two instances of the current Bridgewater-Brunel saved game is likely to be a challenge for many. Alternatively, one might simply test whether saving this game several times over increases save time dramatically; if it does, this would give a much easier reproduction case for performance profiling. As this latter test would be simple but time-consuming, it would be very helpful if someone could try this and report back the results.

In principle yes, but I suspect I confused it with private car route lookup

The precalculated journey times are not stored in a hastable but in a matrix (2d array), which is very fast and efficient

Thank you for clarifying: that is helpful. The difference, I think, comes from the fact that the path explorer was written by an especially skilled developer (Knightly), whereas I wrote the private car routing system, and I have less skill in programming, especially the sort of lower-level programming involved in creating a reliable matrix.

Quote

There are still some options to decreasing the number of lookups, thus increase the performance, but these are not straight-forard.
Just the draft of an idea I had posted when discussing about poor passenger success rates:
Define a heuristics that systematically underestimates journey times and use it to filter out destination candidates.

Concretely, in the path explorer run, extract the best-case (manhattan) distances that can be reached from a specific town within a small set of times, using the origin-destination time matrix. For example store the best-case distances that can be reached in 8, 16, 32, 64, 128, 256, 512 and 1024 minutes.

Store that information somewhere (a few bytes per town really don't hurt anyone)

When a passenger makes a journey attempt, randomly select destination candidates.
Select the best-case distance of the smallest journey time greater than the time budget.
Compare that distance against the (manhattan) distance from origin to destination. If the best-case distance is smaller than our actual distance, we can immediately reject that destination, otherwise we will have to look up all origin-destination pairs as-is.

This an interesting idea, but the heuristic implicitly assumes a maximum speed, and it is not clear how one would calculate this. One could in theory simply use the maximum speed of the fastest vehicle in the game, but this would then mean that, after 1977 when Concorde is introduced, performance would greatly worsen as a larger number of candidates. Also, I am not entirely sure that I understand why these data are stored per town (as the distance reachable in any given time is universal), unless I have misunderstood how this algorithm is intended to work?

Quote
Sorry about the confusion in the very last section. By "these structures" I was not refering to the hashtable implementation, but to the way how routes from one stop to all stops is stored, which is the reason why the memory consumption of the routing data grows by square with the number of stops.
In principle, it is not neccessary to store all origin-destination pairs. If we cut the world into some kind of areas, only all pairs within a single area as well as pairs between all hubs need to be stored explicitly, where a hub is any stop that has at least one service (includes walking) which leaves the area.
That way, the memory consumption could be drastically reduced at cost of a two-level lookup. This won't increase the performance though.
How exactly those areas are defined is a different thing. Choosing these well will lead to better results, but even a rather simple approach of fixed rectangles, let's say 1/10 of maps size should already do a good job.

Reducing memory consumption would, all things being equal, be a good thing, but the real problem is not memory consumption but memory bandwidth; if this would not improve memory bandwidth, it is unlikely ultimately to be worthwhile (unless this would reduce loading/saving times greatly, perhaps? But this would take a very, very large amount of work to implement for something the magnitude of benefit of which would be unknown until after all the work had been done).

Offline Freahk

  • Devotee
  • *
  • Posts: 1355
  • Languages: DE, EN
Re: Performance analysis
« Reply #41 on: November 21, 2020, 02:28:16 PM »
This an interesting idea, but the heuristic implicitly assumes a maximum speed, and it is not clear how one would calculate this.
We already got all the information we need, the only thing we need to do is to extract it from the matrix in the path explorer run.
The path explorer calculates all origin-destination stop pair routes, including the journey time.

All we need to do now is:
- identify which area the origin of such a pair belongs to.
- calculate the manhattan distance of that pair.
- we already calculated the journey time of that origin-destination pair, so we get it fror free.
- finally write this to the best_case table of the area.

One thing we might need to keep in mind is minimal anomalies caused by stops that can be reached from two different areas.
We might argue this is acceptable, or alternatively ensure that we consider all stops that can be reached (by walking) from each area.

if this would not improve memory bandwidth, it is unlikely ultimately to be worthwhile (unless this would reduce loading/saving times greatly, perhaps?
Memory consumption of large simutrans-ex games is an issue in many ways:
Savegame size, thus loading times and server connection times. It can take a huge amount of time to connect to a huge simutrans-extended server, depending on the internet connection.

Purely memory consumption. There is a saying "ram can only be replaced by more ram".
We have quite often observed servers and clients running out of ram in the past.
From recent observations on the pak192.comic for extended test server, this causes a long pause after the client successfully loaded the game. Assinging more ram to the server solved this.
The same was repeatedly reported on Bridgewater, we just cannot increase the memory to test if this solves the issue.
Swapping memory out will drastically decrease performance, whenever memory not located in ram is accessed, so reducing memory consumption is at least worth considering.

Anyways, I agree this is not a trivial change, so it needs some considerations if it's worth the effort.

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Administrator
  • *
  • Posts: 20342
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Performance analysis
« Reply #42 on: November 21, 2020, 03:02:19 PM »
Thank you for your thoughts on this. I am not clear at present on the conceptualisation of areas for these purposes. I can understand that there may well be considerable advantage in a system that runs a distance and speed based sanity check to see whether it is even possible for any given journey time to be within the passengers' tolerance before checking this in the database. In a simple implementation, one would first check the walking time to each possible origin stop plus the transfer time of that origin stop: if that figure exceeds the passengers' maximum journey time tolerance, there is then no need to check any routes from that stop.

The next layer would be for each stop to store its lowest journey+waiting time pair for a direct connexion. That can then be added to the above timings and, again, if the minimum possible journey time from a stop plus the walking time to the stop and the transfer time within a stop already equals or exceeds the passengers' maximum journey time, there is no need to check the database for routes from that stop.

The next layer would be to calculate a maximum implicit speed for any given point to point timing in the whole map when undertaking one of the phases of the path explorer (and, probably, do this per class). If the journey distance from the stop at this speed plus the walking time to the stop and the stop transfer time is in excess of passengers' maximum journey time tolerance, then there is again no need to check the database of routes from that stop.

It is in respect of the last layer that some more localised system might in principle allow for more precise computation (i.e. different stops having different implicit maximum speeds), but I am currently unclear on how this would actually work in your proposed system. How would areas be defined, and how would the maximum implicit speeds for those areas be calculated? I am not clear at present how using areas would be an improvement on a global calculation. I can understand that it would be good if there were a way of doing this per stop, but I cannot see how this can be done without consuming more performance than it will save.

As to memory consumption, I can see that reducing the size of the database of path origin/destination points would be very helpful. However, the technical complexities of changing the existing path explorer system fundamentally are almost beyond imagination: it was written by somebody whose ability with lower-level programming far exceeds mine and I am extremely doubtful that I should be able to produce a workable, reliable alternative to this. There would have to be a vast amount of testing and analysis to determine how best to segment the dataset (I am sceptical that doing it by geographical position would be optimum; it is likely to be better to do it, if this is workable at all, in a more abstract fashion based on some properties of the nodal map, one basic possibility being by player, but quite how best to do this would be extremely complex to determine), and also whether a multi-segmented database would actually be more efficient, since what would occur is that it would be necessary to make multiple calls to smaller databases instead of a single call to a larger database, and it is not at all clear whether this would be significantly slower than the current system or not.

In short, the database segmentation idea is possibly workable in theory, but whether it would be beneficial overall is uncertain and is probably beyond my ability and beyond what is possible for me to do in terms of time. The sanity check to determine whether the database needs to be accessed at all for any given stop in passenger generation is probably achievable and probably beneficial (although to what extent remains uncertain), but I am unclear as to the advantages, implementation or workability of an area system.

Offline Freahk

  • Devotee
  • *
  • Posts: 1355
  • Languages: DE, EN
Re: Performance analysis
« Reply #43 on: November 21, 2020, 05:38:08 PM »
The point is that very many passengers are generated with a rather low journey time budget, whilst destinations are picked globally, so it's rather unlikely for these to find a match, especially when the map is huge or the service is bad.

If we store best-case data to areas, for example towns, we can immediately reject many of these attempts without looking at any stop at all, which does especially mean not checking for all origin-destination stop pair candidates.

All that needs to be done is to calculate
best_case_travel_time_budget = journey_time_budget - walking_time_to_closest_origin station - walking_time_to_closest_destination_station.
Then select the best_case_distance in the origin area according to the best_case_travel_time_budget.
Finally calculate the manhattan distance in between origin and destination house and compare it against the selected best_case_distance.

If and how much gain in performance this will result in greatly depends on the map size and the service and nothing explicit can be said about that optimisation without testing it on a specific map.


If I understand currectly, the implementation you describe is different.
Using a per-stop-based heuristics will indeed result in better prediction, but at a much higher cost.
First, we need to check all origin stop candidates and we will need to perform a fair amount of calculations for each of these.
Those calculations might already be more expensive than looking up the origin-destination pair in the matrix.
The code itself is not pretty complex already. The reason why it causes such a huge load is that it is calles very frequently and has to check all origin-destination stop candidates, which in towns can very quickly be something like 100 pairs.

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Administrator
  • *
  • Posts: 20342
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Performance analysis
« Reply #44 on: November 21, 2020, 05:47:16 PM »
I understand the basic principle of eliminating unnecessary database calls: in principle this is a good idea and not too difficult to implement, at least in a basic sense.

What I am struggling to understand (forgive me if this is my error) is what it means to compute a best case distance specifically for an area (as opposed to calculating a global maximum point to point speed and then using it to calculate the lowest possible journey time to the passengers' chosen destination), and how an area specific calculation would apply to an inter-area journey.

Also, if towns are used as areas, I am not clear on how journeys starting or ending outside a town would work with this system. Can you assist with this?

Thank you again for your thoughts on this.

Offline Freahk

  • Devotee
  • *
  • Posts: 1355
  • Languages: DE, EN
Re: Performance analysis
« Reply #45 on: November 21, 2020, 10:26:19 PM »
I don't know if it's your error that you're struggling to understand it or my error that I'm describing badly.
The question seems to be why I think a global best-case prediction is not a good idea.

The short answer is: I'd expect a global system to be near-useless, especially after the introduction of (hypersonic) aviation and maglevs. Somewhere in the world, there will be a connection that allows passengers to travel any distance within, let's say, an hour.

The long answer:
I have chosen towns as areas, because it's simple, that information can be retrieved quite quickly and it should be sufficient.

Usually, the biggest towns are quite well served and they are quite often the center of a hub-based network with commuter and long-distance trains as well as high-speed rail, (supersonic) jet aviation and maglev in later years.
That means, those few towns will be able to reach some destination at any distance within less than an hour, but the majority of towns won't.

Using the same heuristics everywhere on the map, that would effectively make the heuristics useless.
Using different heuristics on different areas

Specifically, computing a best-case distance for a specific area means:
1. Determine all stops that could serve as the origin stop for a housing in the origin town. That means any stop within city borders+stop coverage. I'll call these "areas stops".
2. Check the journey times from those stops to all destinations and update current areas best-case data accordingly.

Interurban journeys would work just the same as any other journey: Pick the best_case data of the origin area, pick the best_case distance according to the time budget from that data, compare it against the origin-destination distance.


Also, if towns are used as areas, I am not clear on how journeys starting or ending outside a town would work with this system. Can you assist with this?
I am not sure about this. From my understanding, the vast majority of journeys are generated from housings.
Usually housings belong to a town and any other buildings do not initiate journeys, except from rather rare onward trips. Return trips know their original origin anyways and it to me it doesn't make sense to reject return trips or to send them somewhere else, but I don't know if the code actually works like this.
Assuming that's how it actually works, it's not worth to cover that rather rare case. (onward) Journey attempt performance from outside of towns will simply remain the same as-is now.

Journeys ending inside or outside of areas doesn't matter in the above approach.
All that matters is where they originate.

Building an area-to-area best-case journey time matrix is not what the above is about, but it's also a promising approach.
The resulting heuristics will be better than the one described above at cost of more computations and memory.
Memorywise on BB, the size of those matrices will be negligible compared to the path matrices, even storing eight of those matrices and including the better heuristics, thus more immediately rejected attempts, could outperform the above anyways.
I'm not sure which one is to prefer. Both should come with a decent performance boost.
« Last Edit: November 21, 2020, 11:57:20 PM by Freahk »

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Administrator
  • *
  • Posts: 20342
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Performance analysis
« Reply #46 on: November 22, 2020, 02:20:58 AM »
Specifically, computing a best-case distance for a specific area means:
1. Determine all stops that could serve as the origin stop for a housing in the origin town. That means any stop within city borders+stop coverage. I'll call these "areas stops".
2. Check the journey times from those stops to all destinations and update current areas best-case data accordingly.

Thank you for this clarification: this is helpful. I think that this summarises where the approach that I had imagined differs from the approach that you describe.

If I understand correctly, your approach involves, for each town, taking each stop in that town and computing the minimum journey time for a set of journeys each with a minimum distance (although what the origin point of that distance would be it not entirely clear) to any ultimate end point.

The approach that I had imagined involves calculating using per stop data for journey times to the next transfer, and using an implicit maximum speed (i.e. journey distance divided by journey time) and extrapolating from that to reach a minimum possible journey time from a given stop to anywhere. When overseas journeys are introduced (i.e. to off-map destinations), the maximum speeds for these journeys would be stored separately.

Using speed in this way would prevent the computations from being useless in the way that they would be for global journey times, would be considerably simpler to implement and would reduce the computational overhead of the background calculations necessary for this.

It is not currently clear to me that such advantages as there might be from the system that you envisage would have sufficient advantage to outweigh those disadvantages.

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Administrator
  • *
  • Posts: 20342
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Performance analysis
« Reply #47 on: November 22, 2020, 01:47:29 PM »
I have now implemented what I described in an earlier post on this thread as the first layer of this suggestion, viz. not hitting the journey time database with a request if the walking time to the stop in question is greater than the journey time tolerance. I have also added a check to make sure that it is also not greater than a total journey time already found.

Performance testing with this shows a very significant improvement with this alone. Before the changes, running the Bridgewater-Brunel saved game from March 1944 shows path_explorer_t::compartment_t::get_path_between taking 20.54% of CPU time. After this change, it took 4.52% of CPU time in my test. step_passengers_and_mail_threaded took 55.14% before and 26.41% after, making it no longer the most CPU intensive part of the code (albeit only graphics parts of the code are higher).

Further testing with the simplest possible implementation of layer 2 (implicit speed checking), based on checking whether the implicit minimum average journey speed is less than 2,180km/h (the cruising speed of Concorde) shows path_explorer_t::compartment_t::get_path_between taking 4.08% of CPU time and step_passengers_and_mail_threaded taking 21.47% of time. The condition of a journey time exceeding this implicit speed was hit in many cases when testing on the Bridgewater-Brunel saved game with a debugger (being hit so frequently when running at normal speed that, when using a debugger, it would be immediately hit again each time after continuing), suggesting that this is making a real saving. It will be interesting to see how much that this saves when using more aggressive data generated from the game itself.

This suggests that the greatest savings can be made using simpler implementations and that diminishing returns are likely with more complex implementations.

However, the principle of this checking is a very good one and has made a great deal of difference to the performance. Thank you for suggesting this.

I have now pushed the very crude version of the second layer described above, and I should be interested to see whether this makes a difference to players' experiences of the game on Monday.

Offline Freahk

  • Devotee
  • *
  • Posts: 1355
  • Languages: DE, EN
Re: Performance analysis
« Reply #48 on: November 22, 2020, 08:36:00 PM »
If I understand correctly, your approach involves, for each town, taking each stop in that town and computing the minimum journey time for a set of journeys each with a minimum distance to any ultimate end point.
Yes.
There's a detail that still differs: I'd store the maximum possible distance to a set of predefined times, instead of the other way round.
This allows to make use of some iformation we have about the journey time distribution and saves a few lookups as the origin and the tolerable time remain constant.

Quote
(although what the origin point of that distance would be it not entirely clear)
When calculating the best-case data, the distance in between stops is taken into account.
When validating a journey, the distance in between the housings is taken into account.
This makes sense, because the very best starting position we can get is directly on the stop, where the route of recorded best-case distance started.
This can only violate the heuristics property of not rejecting anything that can be reached, if the recorded best-case distance decreases when the journey time increases, which doesn't make sense.

Using speed in this way would prevent the computations from being useless in the way that they would be for global journey times, would be considerably simpler to implement and would reduce the computational overhead of the background calculations necessary for this.
I do not agree with the last two points.
Currently this seems to be true, as you simply use the concorde speed as a placeholder, but if you want to use a per-stop-best-case-average-speed (and I highly expect this to be worthwile), the calculations to gather that information won't differ pretty much from the calculations to gather per-area-best-case-distances in both, computational load, as well as code complexity.


As mentioned, per-origin-stop data involves iterating all possible origin stops, which means it scales linearly with the number of stops per area.
I aimed at a heuristics that performs in constant time and makes use of simple calculations to rejects journeys very quickly.
The area-based heuristics itself will be much faster than the stop-based heuristics.
Anyways, it's hard to tell which of these two will result in better overall performance, as a slower heuristics rejecting more lookups can be much better than an extremely fast heuristics that rejects fewer lookups.

In any case,either of these heuristics should be much better than no heuristics at all, which is impressivley indicated by the numbers you provided.


Thank you for the first very basic implementation. The results seem quite promising on the BB server game.
I want to note that the high impact by only the walking time consideration is expected to be due to rather large stop coverage areas, thus long walking times to a fair amount of stops, as well as the tolerable journey time figure, which generates very many passengers with very low journey times.
It will be very interessting to see how performance improves even further when
a) using per-stop best-case speed extracted from the path matrix
b) using multiple per-stop best-case speeds, accoring to the distance, as a one hour journey, where half an hour is spent in a concorde and half an hour is spent in a bus, will still reult in a best-case-average-speed of roughly 1000 km/h stored in many bus stops, around that airport, whilst the best-case-average speed at local distances is much lower.
« Last Edit: November 22, 2020, 08:56:16 PM by Freahk »

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Administrator
  • *
  • Posts: 20342
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Performance analysis
« Reply #49 on: November 23, 2020, 02:21:37 PM »
I have now implemented the next layer of this, just pushed to the master branch. In this layer, we take the maximum vehicle speed for all vehicles in the game (recalculated on loading a game and monthly), albeit with a minimum of one third of the fastest available vehicle up to a speed of 250km/h for ground vehicles and up to 895km/h for air vehicles, and using the maximum speed of available ground vehicles where there are no vehicles in the game, and use that to check the implicit speed required. Further, the maximum speed of aircraft are stored separately, and are excluded wherever the passengers' journey time tolerance is less than the minimum transfer time for airports, the ground vehicle timings being used instead.

The requisite implicit speed is calculated now twice: once overall to see whether even to consider looking for origin stops, and again individually for each origin stop, adjusted to take into account the walking time to the particular origin stop and the distance from that origin stop to the destination.

Testing shows a further considerable improvement. Using the Bridgewater-Brunel saved game from March 1944, karte_t::generate_passengers_or_mail() now records a total of 7.45% CPU time, and path_explorer_t::compartment_t::get_catg_path_between now takes only 0.93% of CPU time.

These savings are significant enough that the much greater effort of implementing any more sophisticated system than this is unlikely to be worthwhile owing to diminishing returns.

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Administrator
  • *
  • Posts: 20342
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Performance analysis
« Reply #50 on: November 23, 2020, 02:37:49 PM »
Briefly investigating memory consumption on the Bridgewater-Brunel saved game from March 1944, we find that it takes circa 9.6GB in RAM.

Of that, the greatest amount by far is allocated by planquadrat_rdwr() (i.e. the method for loading the individual game tiles), which allocates ~5.6GB. Next comes the path explorer, which allocates 1.065GB. Then comes karte_t::init_tiles, which allocates a further 880MB, and then haltestelle_t::create (stop creation code after loading), which allocates a further 553MB. Initialising the path explorer takes another ~162MB, and the convoys take a further ~101MB. Everything else is <100MB.

Thus, if we wish to improve memory consumption, we need to focus on ways of reducing per tile memory consumption.

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Administrator
  • *
  • Posts: 20342
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Performance analysis
« Reply #51 on: November 23, 2020, 03:51:51 PM »
Incidentally, in relation to memory consumption and load times, last year, with the previous Bridgewater-Brunel saved game, it was found that, at a certain point of network complexity, the time that it took the server to save the game became extreme and rendered the game unplayable. Investigation discovered that this was caused by the compression algorithm for some reason not working well with the path explorer data and processing it extremely slowly.

TurfIt investigated replacing this algorithm with a newer (open source) algorithm, and found this to be much faster. I looked into implementing this, but this was not followed through because (1) the server map density was going to be downsized significantly for the next (this current) game in any event, so it was not clear whether this would be necessary; and (2) this has now, I believe, been implemented into Standard, so it was considered better simply to wait for the feature to be ported from Standard when whoever was working on that project got that far.

Given that we are now having this problem again, it seems, I wonder whether there might be some merit in looking into expediting porting this specific feature from Standard (if I recalled correctly that it was integrated into Standard in the first place)?

Online freddyhayward

  • Devotee
  • *
  • Posts: 456
  • Languages: EN
Re: Performance analysis
« Reply #52 on: November 23, 2020, 10:29:26 PM »
Although it isn't strictly a problem, public transport passenger numbers on bridgewater-brunel have globally declined ~10% since this change.

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Administrator
  • *
  • Posts: 20342
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Performance analysis
« Reply #53 on: November 23, 2020, 10:40:25 PM »
Although it isn't strictly a problem, public transport passenger numbers on bridgewater-brunel have globally declined ~10% since this change.

I discovered possible anomalies in the implementation in the current nightly build which I believe that I have rectified in the current version, which will be deployed in to-morrow's nightly build. These are a possible (but not certain) cause of what you report.