Simulation performance profile

Started by TurfIt, April 03, 2014, 02:26:43 AM

TurfIt

From - Long-term thoughts on multi-threading in Standard and Experimental:
Quote from: jamespetts on March 31, 2014, 03:51:09 PM
but I think that there are some areas, in Experimental at least, where this is so (passenger generation, private car route finding, and possibly certain parts of the goods route-finding, being the initial construction of the table of routes and times from stops to directly connected stops).
How did you decide those areas are deserving of multithreading? You wouldn't gain much in those areas IMHO. See the big table below for my profiling results. The save files are from bridgewater-brunel, plus Carl's UK map. If you've any other nice big ones, send them over.

sync_step(): syn is 95% due to convoi_t::calc_acceleration(), i.e. the physics. And remember this is every frame.
step(): a breakdown of the different sections in karte_t::step(). This is per simloop, nominally 5 times per second.
cities:step(): a further breakdown of stadt_t::step().
convoi:step(): a further breakdown of convoi_t::step(). Those loading times were ~10x larger before my patch.
Times are average ms per sync_step() or per step(). I also have min/max times for step(), but the forum table would be unreadable.
One problem in step() is that season, path_explorer, and private_cars all spend most of their time doing nothing, but then tend to all start calculating at once. It would be best to serialize these, as the cumulative effect is an extra 30-80 ms per step while they're running. This is a huge increase over the averages, and gives a very noticeable slowdown.
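
The serialization suggested above could look roughly like the following. This is only a sketch under stated assumptions: `heavy_task_t`, `serialised_tasks_t`, `has_work` and `run_slice` are hypothetical names, not Simutrans code, and the real season, path_explorer and private_cars routines would need to expose "do I have pending work" and "do one step's worth" hooks for this to apply.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// One periodic heavy job (hypothetical interface, not Simutrans code).
struct heavy_task_t {
    std::string name;
    std::function<bool()> has_work;   // true while the job wants CPU time
    std::function<void()> run_slice;  // does one step's worth of work
};

// Lets at most one heavy job run per step, so their costs never stack.
class serialised_tasks_t {
public:
    void add(heavy_task_t task) {
        assert(task.has_work && task.run_slice);
        tasks.push_back(std::move(task));
    }

    // Runs one slice of the first job with pending work, starting from the
    // job that ran last so it can finish before the next one begins.
    // Returns the name of the job that ran, or "" if all were idle.
    std::string step() {
        for (std::size_t tried = 0; tried < tasks.size(); ++tried) {
            const std::size_t i = (current + tried) % tasks.size();
            if (tasks[i].has_work()) {
                current = i;
                tasks[i].run_slice();
                return tasks[i].name;
            }
        }
        return std::string();
    }

private:
    std::vector<heavy_task_t> tasks;
    std::size_t current = 0;
};
```

Sticking with the running job until it reports no more work keeps each burst short, at the cost that a job which never finishes would starve the others; a real version would probably want a cap on consecutive slices.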

                    game1-final            -1956       game2-1832       game-final     current-game        UK2014-05
                    ms        %      ms        %      ms        %      ms        %      ms        %      ms        %
sync_step():
  eye_c           0.00    0.00%    0.00    0.00%    0.00    0.00%    0.00    0.00%    0.00    0.00%    0.02    0.73%
  way_eye         0.03    0.74%    0.07    0.68%    0.02    0.20%    0.03    0.29%    0.00    0.00%    0.01    0.36%
  syn             4.05   99.26%   10.22   99.32%    9.91   99.80%   10.18   99.61%    7.20  100.00%    2.72   98.91%
  total           4.08    0.00%   10.29    0.00%    9.93    0.00%   10.22    0.10%    7.20    0.00%    2.75    0.00%
step():
  month           0.01    0.05%    0.01    0.07%    0.01    0.29%    0.01    0.18%    0.00    0.00%    0.02    0.55%
  time            0.00    0.00%    0.00    0.00%    0.00    0.00%    0.00    0.00%    0.00    0.00%    0.00    0.00%
  season          0.05    0.26%    0.00    0.00%    0.01    0.29%    0.01    0.18%    0.00    0.00%    0.00    0.00%
  path_explorer  14.65   75.28%    7.60   50.17%    0.03    0.86%    1.35   24.50%    0.10    0.67%    1.92   52.60%
  convoi          3.77   19.37%    6.91   45.61%    3.10   88.57%    3.45   62.61%   12.90   86.58%    1.34   36.71%
  private_cars    0.50    2.57%    0.00    0.00%    0.00    0.00%    0.00    0.00%    0.20    1.34%    0.12    3.29%
  cities          0.35    1.80%    0.32    2.11%    0.26    7.43%    0.47    8.53%    1.20    8.05%    0.23    6.30%
  factories       0.01    0.05%    0.08    0.53%    0.04    1.14%    0.09    1.63%    0.50    3.36%    0.00    0.00%
  power           0.08    0.41%    0.07    0.46%    0.00    0.00%    0.09    1.63%    0.00    0.00%    0.00    0.00%
  player          0.00    0.00%    0.00    0.00%    0.00    0.00%    0.00    0.00%    0.00    0.00%    0.00    0.00%
  stations        0.04    0.21%    0.17    1.12%    0.05    1.43%    0.03    0.54%    0.10    0.67%    0.02    0.55%
  total          19.46    0.00%   15.15   -0.07%    3.50    0.00%    5.51    0.18%   14.90   -0.67%    3.65    0.00%
cities:step():
  factories       0.01    2.20%    0.00    1.65%    0.00    2.01%    0.00    1.09%    0.00    0.37%    0.01    9.32%
  calc_growth     0.04   12.16%    0.03   10.08%    0.03   13.02%    0.03    6.29%    0.02    2.17%    0.08   51.50%
  growth          0.02    5.13%    0.04   12.18%    0.10   44.62%    0.02    3.48%    0.16   14.23%    0.02   10.40%
  pax             0.25   77.43%    0.22   73.75%    0.09   37.47%    0.39   87.57%    0.94   82.68%    0.03   16.41%
  total           0.32    3.07%    0.29    2.35%    0.23    2.88%    0.44    1.57%    1.14    0.55%    0.16   12.37%
convoi:step():
  loading         1.08   45.96%    4.75   72.08%    1.48   50.34%    2.04   60.53%    1.80   14.06%    0.53   40.15%
  routing         0.67   28.51%    0.67   10.17%    0.52   17.69%    0.55   16.32%   10.60   82.81%    0.46   34.85%
  waiting         0.24   10.21%    0.38    5.77%    0.29    9.86%    0.37   10.98%    0.00    0.00%    0.20   15.15%
  total           2.35   15.32%    6.59   11.99%    2.94   22.11%    3.37   12.17%   12.80    3.13%    1.32    9.85%

Dwachs

How do you obtain the time measurements? Just by dr_time() ?
Parsley, sage, rosemary, and maggikraut.

Markohs

The season numbers appear to be very reasonable in your results. I know it takes time, and a lot of it, when the season changes, but in those results it doesn't look like a function that uses much CPU.

Thanks for taking your time to collect that data, turfit!

TurfIt

Quote from: Dwachs on April 03, 2014, 06:21:00 AM
How do you obtain the time measurements? Just by dr_time() ?
dr_time() isn't fast enough. I used QueryPerformanceCounter() on Windows, which has a 300 kHz clock for me, and mach_absolute_time() on OS X (although these results are all from Windows).
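
For anyone wanting to repeat this kind of measurement, here is a minimal sketch of a per-section timer that reports average ms per call. It is not the instrumentation used for the tables: it swaps the platform calls named above for std::chrono::steady_clock as a portable stand-in, and `section_profiler_t`, `scope_t` and the section names are made up for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>

// Accumulates wall-clock time per named section and reports the average
// milliseconds per call. Every name here is illustrative, not Simutrans code.
class section_profiler_t {
public:
    using clock_type = std::chrono::steady_clock;

    // RAII helper: times the enclosing scope and records it on destruction.
    class scope_t {
    public:
        scope_t(section_profiler_t &p, const std::string &name)
            : profiler(p), section(name), start(clock_type::now()) {}
        ~scope_t() { profiler.add(section, clock_type::now() - start); }
        scope_t(const scope_t &) = delete;
        scope_t &operator=(const scope_t &) = delete;

    private:
        section_profiler_t &profiler;
        std::string section;
        clock_type::time_point start;
    };

    // Records one timed call of the named section.
    void add(const std::string &name, clock_type::duration elapsed) {
        assert(elapsed >= clock_type::duration::zero());
        entry_t &e = entries[name];
        e.total += elapsed;
        e.calls += 1;
    }

    // Number of recorded calls; 0 for an unknown section.
    long calls(const std::string &name) const {
        const auto it = entries.find(name);
        return it == entries.end() ? 0 : it->second.calls;
    }

    // Average milliseconds per recorded call; 0.0 if the section never ran.
    double average_ms(const std::string &name) const {
        const auto it = entries.find(name);
        if (it == entries.end() || it->second.calls == 0) {
            return 0.0;
        }
        const std::chrono::duration<double, std::milli> total_ms = it->second.total;
        return total_ms.count() / static_cast<double>(it->second.calls);
    }

private:
    struct entry_t {
        clock_type::duration total = clock_type::duration::zero();
        long calls = 0;
    };
    std::map<std::string, entry_t> entries;
};
```

Usage would be one `section_profiler_t::scope_t timer(profiler, "convoi");` at the top of each block to be measured, with `average_ms()` read out at the end of the run.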


Quote from: Markohs on April 03, 2014, 07:35:24 AM
The season numbers appear to be very reasonable in your results. I know it takes time, and a lot of it, when the season changes, but in those results it doesn't look like a function that uses much CPU.
Overall on games like these with high bits_per_month, the average usage drops to largely nothing. But when running, it is an extra 40ms or so per step.

jamespetts

Thank you very much for this analysis - it is most helpful. May I ask which branch this was? If it is 11.x, it might be worthwhile rerunning this on the (latest) way-improvements branch, as I know that Bernd has done some optimisation work on that branch, but I do not know how effective it is. Note that, on that branch, passenger generation is in karte_t rather than stadt_t.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

prissi

For acceleration: You could use lookup tables and carry over the non-matched part. (Or one Newton iteration.) Each convoi builds a lookup table for its current weight and power on departure. Of course, friction then has to be accounted for separately, not as a weight change.

Or maybe some lookup tables can be used within the calculation. Never bother to optimise something that does not take time.

Season change is multithreaded for single player. Only in multiplayer does it need to be in sync, for tree spawning.

jamespetts

Prissi - thank you for your suggestions. I have passed them on to Bernd, who maintains the physics code.

I wonder whether there is a way to multi-thread seasons in multi-player: perhaps disable all tree spawning until the season change thread has completed?

prissi

The trees are only called to spawn every three months, during season change. Outside this they do not consume time at all. A second iteration would certainly consume much more time. Most of the time the season change takes is because it accesses parts of the map which are rarely accessed and hence not cached.

By the way, you can save a lot of calculation time for the acceleration without tables. If a vehicle has reached its maximum allowed velocity (or its possible top speed for the current load), it can drive at that speed until a curve or slope comes. There is no need to calculate the acceleration at all in this frequent case. Just drive on at that speed.

I will try a similar change for standard.
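
A rough sketch of that cruising shortcut follows. It is an illustration only: `convoy_state_t`, `full_acceleration_step` and `update_speed` are hypothetical, the "physics" is a placeholder constant acceleration, and it does not handle braking in advance of a speed limit, which would have to be folded into how `target_speed` is chosen.

```cpp
#include <algorithm>
#include <cassert>

// Minimal stand-in for a convoy's motion state (hypothetical fields).
struct convoy_state_t {
    double speed;         // current speed
    double target_speed;  // lower of the speed limit and top speed for the load
    bool track_changed;   // set when slope, curve or limit changed
    int physics_calls;    // counts how often the expensive path ran
};

// Placeholder for the expensive calculation: moves speed toward the target
// with a fixed acceleration. The real physics would go here.
double full_acceleration_step(convoy_state_t &c, double dt) {
    ++c.physics_calls;
    const double accel = 0.5;  // arbitrary value for the sketch
    if (c.speed < c.target_speed) {
        return std::min(c.target_speed, c.speed + accel * dt);
    }
    return std::max(c.target_speed, c.speed - accel * dt);
}

// Per-frame update: skips the physics entirely while cruising.
void update_speed(convoy_state_t &c, double dt) {
    assert(dt > 0.0);
    if (!c.track_changed && c.speed == c.target_speed) {
        return;  // already at the allowed speed and nothing changed
    }
    c.speed = full_acceleration_step(c, dt);
    c.track_changed = false;
}
```

The saving depends on how many convoys are at their target speed in a typical frame, which would need measuring on real save games.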

jamespetts

That is an interesting suggestion - thank you. I should add that a speed limit needs also to be considered, and, in Experimental, braking characteristics are such that it is necessary to know how far in advance of a speed limit to start braking, so this cannot be skipped entirely, but it might well be sufficient to reduce the amount of calculations significantly. I have already drawn Bernd's attention to this thread, so hopefully he will see your suggestion, too.

TurfIt

Quote from: jamespetts on April 03, 2014, 11:23:42 AM
May I ask - which branch was this? If it is 11.x, it might be worthwhile rerunning this on the (latest) way-improvements branch,
It was 11.x. Results for 11.x and way-improvements below. game2-1832 couldn't be tested, it crashes upon loading in way-improvements. The change in the city growth routine is catastrophic.



                game2-final                                || UK2014-05                                  || Client6-network
                           11.x |        way-impr | change ||            11.x |        way-impr | change ||            11.x |        way-impr | change
                    ms        % |     ms        % |      % ||     ms        % |     ms        % |      % ||     ms        % |     ms        % |      %
sync_step():
  eye_c           0.00    0.00% |   0.00    0.00% |    --- ||   0.04    0.85% |   0.04    1.19% |    --- ||   0.00    0.00% |   0.02    0.21% |    ---
  way_eye_c       0.06    0.34% |   0.06    0.69% |    --- ||   0.02    0.43% |      0    0.00% |    --- ||   0.00    0.00% |      0    0.00% |    ---
  syn            17.35   99.66% |   8.67   99.31% |    50% ||   4.62   98.72% |   3.32   98.81% |    28% ||  12.07  100.00% |   9.69   99.79% |    20%
  total          17.41    0.00% |   8.73    0.00% |    50% ||   4.68    0.00% |   3.36    0.00% |    28% ||  12.07    0.00% |   9.71    0.00% |    20%
step():
  month           0.01    0.11% |   0.01    0.12% |    --- ||   0.03    0.47% |      0    0.00% |    --- ||   0.01    0.04% |   0.01    0.02% |    ---
  time            0.00    0.00% |   0.00    0.00% |    --- ||   0.00    0.00% |   0.00    0.00% |    --- ||   0.00    0.00% |   0.00    0.00% |    ---
  season          0.01    0.11% |   0.03    0.35% |  -200% ||   0.01    0.16% |   0.04    0.45% |  -300% ||   0.05    0.21% |   0.13    0.32% |  -160%
  path_explorer   2.07   22.87% |   2.51   29.02% |   -21% ||   3.09   48.89% |   2.65   30.01% |    14% ||   0.14    0.60% |   0.15    0.37% |    -7%
  convoi          5.85   64.64% |   3.31   38.27% |    43% ||   2.65   41.93% |   5.44   61.61% |  -105% ||  20.10   86.30% |  19.20   46.80% |     4%
  private_cars    0.00    0.00% |   0.00    0.00% |    --- ||   0.13    2.06% |   0.12    1.36% |     8% ||   0.25    1.07% |   0.50    1.22% |  -100%
  cities          0.74    8.18% |   2.04   23.58% |  -215% ||   0.36    5.70% |   0.11    1.25% |    11% ||   1.83    7.86% |  19.61   47.79% |  -993%
  pax              ---      --- |   0.28    3.24% |    --- ||    ---      --- |   0.19    2.15% |    --- ||    ---      --- |   0.39    0.95% |    ---
  inhab            ---      --- |   0.01    0.12% |    --- ||    ---      --- |   0.02    0.23% |    --- ||    ---      --- |   0.01    0.02% |    ---
  factories       0.15    1.66% |   0.16    1.85% |    -7% ||   0.01    0.16% |   0.01    0.11% |     0% ||   0.79    3.39% |   0.82    2.00% |    -4%
  power           0.16    1.77% |   0.14    1.62% |    13% ||   0.00    0.00% |   0.00    0.00% |    --- ||   0.00    0.00% |   0.00    0.00% |    ---
  player          0.00    0.00% |   0.00    0.00% |    --- ||   0.00    0.00% |   0.00    0.00% |    --- ||   0.00    0.00% |   0.00    0.00% |    ---
  stations        0.05    0.55% |   0.05    0.58% |     0% ||   0.04    0.63% |   0.14    1.59% |  -250% ||   0.12    0.52% |   0.11    0.27% |     8%
  total           9.05    0.11% |   8.65    1.27% |     4% ||   6.32    0.00% |   8.83    1.25% |   -40% ||  23.29    0.00% |  41.03    0.24% |   -76%
cities::step():
  factories       0.01    1.28% |    ---      --- |    --- ||   0.03   11.22% |    ---      --- |    --- ||   0.01    0.45% |    ---      --- |    ---
  growth          0.06    8.17% |    ---      --- |    --- ||   0.14   54.16% |    ---      --- |    --- ||   0.26   14.76% |    ---      --- |    ---
  pax             0.62   88.69% |    ---      --- |    --- ||   0.05   19.75% |    ---      --- |    --- ||   1.51   84.12% |    ---      --- |    ---
  total           0.70    1.87% |    ---      --- |    --- ||   0.26   14.87% |    ---      --- |    --- ||   1.79    0.67% |    ---      --- |    ---
convoi::step():
  loading         3.49   61.01% |   1.01   31.76% |    71% ||   0.86   32.95% |   3.68   68.53% |  -328% ||   2.92   14.64% |   2.50   13.16% |    14%
  routing         0.89   15.56% |   0.88   27.67% |     1% ||   0.99   37.93% |   0.70   13.04% |    29% ||  16.46   82.55% |  15.89   83.63% |     3%
  waiting         0.63   11.01% |   0.62   19.50% |     2% ||   0.51   19.54% |   0.67   12.48% |   -31% ||   0.05    0.25% |   0.06    0.32% |   -20%
  total           5.72   12.41% |   3.18   21.07% |    44% ||   2.61    9.58% |   5.37    5.96% |  -106% ||  19.94    2.56% |  19.00    2.89% |     5%

                game-1956                                  || game1-final
                           11.x |        way-impr | change ||            11.x |        way-impr | change
                    ms        % |     ms        % |      % ||     ms        % |     ms        % |      %
sync_step():
  eye_c           0.00    0.00% |   0.00    0.00% |    --- ||   0.00    0.00% |   0.00    0.00% |    ---
  way_eye_c       0.11    0.72% |   0.12    0.88% |    -9% ||   0.06    0.86% |   0.05    0.85% |    17%
  syn            15.08   99.21% |  13.57   99.12% |    10% ||   6.88   99.14% |   5.82   99.15% |    15%
  total          15.20    0.07% |  13.69    0.00% |    10% ||   6.94    0.00% |   5.87    0.00% |    15%
step():
  month           0.01    0.07% |   0.01    0.08% |     0% ||   0.02    0.07% |   0.01    0.03% |    50%
  time            0.00    0.00% |   0.00    0.00% |    --- ||   0.00    0.00% |   0.00    0.00% |    ---
  season          0.02    0.14% |   0.04    0.33% |  -100% ||   0.06    0.20% |   0.13    0.43% |  -117%
  path_explorer   3.90   27.56% |   2.79   23.33% |    28% ||  22.58   75.24% |  23.69   78.78% |    -5%
  convoi          9.35   66.08% |   7.00   58.53% |    25% ||   5.99   19.96% |   4.82   16.03% |    20%
  private_cars    0.00    0.00% |   0.00    0.00% |    --- ||   0.57    1.90% |   0.70    2.33% |   -23%
  cities          0.49    3.46% |   1.40   11.71% |  -245% ||   0.56    1.87% |   0.11    0.37% |    23%
  pax              ---      --- |   0.28    2.34% |    --- ||    ---      --- |   0.31    1.03% |    ---
  inhab            ---      --- |   0.01    0.08% |    --- ||    ---      --- |   0.01    0.03% |    ---
  factories       0.13    0.92% |   0.12    1.00% |     8% ||   0.02    0.07% |   0.02    0.07% |     0%
  power           0.12    0.85% |   0.10    0.84% |    17% ||   0.14    0.47% |   0.11    0.37% |    21%
  player          0.00    0.00% |   0.00    0.00% |    --- ||   0.00    0.00% |   0.00    0.00% |    ---
  stations        0.14    0.99% |   0.10    0.84% |    29% ||   0.07    0.23% |   0.06    0.20% |    14%
  total          14.15   -0.07% |  11.96    0.92% |    15% ||  30.01    0.00% |  30.07    0.33% |     0%
cities::step():
  factories       0.01    2.04% |    ---      --- |    --- ||   0.01    2.67% |    ---      --- |    ---
  growth          0.08   17.23% |    ---      --- |    --- ||   0.07   13.94% |    ---      --- |    ---
  pax             0.35   77.82% |    ---      --- |    --- ||   0.41   79.34% |    ---      --- |    ---
  total           0.45    2.91% |    ---      --- |    --- ||   0.52    4.05% |    ---      --- |    ---
convoi::step():
  loading         6.07   69.13% |   3.55   55.21% |    42% ||   1.52   41.42% |   0.61   19.61% |    60%
  routing         1.00   11.39% |   1.03   16.02% |    -3% ||   1.10   29.97% |   1.41   45.34% |   -28%
  waiting         0.54    6.15% |   0.74   11.51% |   -37% ||   0.39   10.63% |   0.45   14.47% |   -15%
  total           8.78   13.33% |   6.43   17.26% |    27% ||   3.67   17.98% |   3.11   20.58% |    15%
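
The third figure in each game's group of columns appears to be the relative change from 11.x to way-improvements, positive when way-improvements is faster. That reading is an inference from the numbers rather than something stated in the post; the hypothetical helper below reproduces a few of the entries. The cities row seems to compare the 11.x cities time against the way-improvements cities, pax and inhab rows added together.

```cpp
#include <cassert>
#include <cmath>

// Relative change in percent between two average times.
// Positive means the new time is lower than the old one.
double percent_change(double old_ms, double new_ms) {
    assert(old_ms > 0.0);
    return (old_ms - new_ms) / old_ms * 100.0;
}
```

For example, `percent_change(17.35, 8.67)` rounds to 50, matching the syn row for game2-final, and `percent_change(1.83, 19.61 + 0.39 + 0.01)` rounds to -993, matching the cities row for Client6-network.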

Quote from: prissi on April 03, 2014, 10:05:19 PM
Season change is multithreaded for single player. Only for multiplayer these needs to be in sync for tree spawning.
There's no multithreading in step() ? ? ?

jamespetts

Thank you very much - that is extremely helpful. I see that there are a number of improvements, but a great reduction in the performance of cities. Are you sure that this relates to growth? I have not made the changes to city growth that I was planning to make. I notice that you were not able to give a breakdown of the individual functions in cities::step() for the way-improvements branch, so I cannot see where this is coming from. I know that Neroden has changed a few things relating to clustering, but on the face of it, it is hard to see how that has such a huge impact. Do you think that this is the culprit?

Edit: It occurs to me that two factors might skew the way-improvement results somewhat. Firstly, the city growth: in the latest game particularly (I assume "Client6-network" in this chart), the city growth is far too high - if the issue with cities is indeed the growth (and this seems to be a probable candidate), then the impact of the growth routine would be disproportionate compared to a state in which city growth was at normal levels - assuming, that is, that the code in question that is taking the extra time is indeed relating to the calculation of where growth should go, rather than how much of it that there should be, although the latter is currently unchanged from Standard.

Secondly, the figure for the load on passenger generation is probably underrepresented, since the alternative destinations would need to be recalibrated manually to get a genuinely equivalent result. This is because the 11.x branch uses distance ranges, ensuring that passenger journeys that are likely to have a lower journey time tolerance are more likely to be given destinations where the journey time is within that tolerance. The new passenger generation code abolishes distance ranges, but the number of alternative destinations needs to be increased to compensate (the best way to do this is with the settings that are based on the population size rather than using fixed numbers), and this can greatly increase the number of times that the passenger generation algorithm is run.

TurfIt

The city growth is the problem. cities::step() isn't broken down for way-improvement branch since growth is the only function left. It appears code was added to try expanding cities when adding buildings fail. It's all those extra loops from the retries that are killing it, especially when it still fails. Client6-network is from the current online game, and it seems to be hit the hardest by the extra retries since all the cities are hemmed in. When you do your rewrite of growth, I suggest taking a good look at the structure of the code here - just like convoi loading, it's setting up/tearing down large structures repeatedly - needlessly.
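
To illustrate the setup/teardown point in general terms (this is not the actual city growth code, and `try_place` and `place_with_retries` are invented for the example): a large scratch structure can be built once outside the retry loop and reused, instead of being constructed and destroyed on every attempt.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one placement attempt. It needs a large scratch
// buffer and, for the sake of the example, succeeds only on a given attempt.
bool try_place(std::vector<int> &scratch, std::size_t cells, int attempt, int succeeds_on) {
    scratch.clear();           // drops the contents, keeps the allocation
    scratch.resize(cells, 0);  // reuses the existing capacity after attempt 0
    return attempt == succeeds_on;
}

// Returns the attempt number that succeeded, or -1 after max_attempts.
int place_with_retries(std::size_t cells, int max_attempts, int succeeds_on) {
    assert(max_attempts > 0);
    std::vector<int> scratch;  // created once, outside the retry loop
    scratch.reserve(cells);
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        if (try_place(scratch, cells, attempt, succeeds_on)) {
            return attempt;
        }
    }
    return -1;
}
```

Whether this applies to the growth routine depends on what the retries actually rebuild; profiling the allocation counts would show it.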

Speaking of convoi loading, look at the time in the UK game. Not sure where that's from. Perhaps some change in the convoi spacing routine? That game does use that feature disproportionately to the rest.

jamespetts

Ahh, I see: thank you. That is very helpful. I will bear in mind your comments when looking at renovating city growth.

As to convoy spacing, I do not think that there has been any change to this code from 11.x. Thank you for spotting that. Happily, it is a relatively small proportion of the overall time, but is still not optimal. I am not entirely clear on what the problem could be there.

TurfIt

The convoi loading time jump in the UK game appears to be due to it not being a 'real' game. 11.x has convois leaving stops while mostly empty. Way-improvements is generating far more passengers, so there's actually more loading going on. However, there are code anomalies in simvehikel.cc::load_freight_internal():

    // New system: only merges if origins are alike.
    // @author: jamespetts

    if(ware.can_merge_with(tmp))
    {
        tmp.menge += ware.menge;
        cnv->invalidate_weight_summary();
        total_freight += ware.menge;
        ware.menge = 0;
        break;
    }

This entire section of code just vanished on 2013.08.01 between 1fb2e7b718e3fd10abc8b3fee19ea38a8370672c and 550a1e17edc49d7b64f3f008de25a45b4a820ade. Neither commit shows the change, yet the code is there in one, and gone in the next.

jamespetts

I am not sure why it does not show in the commit, but the removal is deliberate: testing showed that this code was literally never executed (I put a breakpoint inside the "if" statement and ran a complicated map on fast forward for a very long time, and the breakpoint was not hit once).

The reason is that the revenue system needs to keep track of the origins of passengers, mail and goods, and, for passengers, of whether they are visiting or commuting. The test for merging therefore became so strict that nothing ever actually passed it, so the code was disused (and CPU time was wasted on redundant checks to see whether it should be called).

TurfIt

With 11.x, the UK map is merging ~0.8% of packets. Not a lot, but any reduction in the number of ware packets floating around is good. And fixing the loop to actually iterate, 1.2% merge. I see to_factory was changed to is_commuting_trip in way_improvements - just a rename? or functional change? I suggest you try the merging again with way-improv, and if still none, then completely remove the loop since it does nothing as it currently sits...
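
For reference, a merge loop that does iterate over every packet already on board might look like the sketch below. The struct and its fields are assumptions made for the example (only `can_merge_with` and `is_commuting_trip` are names taken from this thread), and the merge criteria shown are a guess at what "origins are alike" plus the trip type would mean.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-in for a ware packet (hypothetical fields).
struct ware_packet_t {
    int goods_type;
    int origin_stop;
    int destination_stop;
    bool is_commuting_trip;
    int amount;

    // Two packets may merge only if everything except the amount matches.
    bool can_merge_with(const ware_packet_t &other) const {
        return goods_type == other.goods_type
            && origin_stop == other.origin_stop
            && destination_stop == other.destination_stop
            && is_commuting_trip == other.is_commuting_trip;
    }
};

// Adds `incoming` to `cargo`, folding it into a matching packet if one
// exists. Returns true if a merge happened, so the caller can count them.
bool add_or_merge(std::vector<ware_packet_t> &cargo, const ware_packet_t &incoming) {
    assert(incoming.amount > 0);
    for (ware_packet_t &existing : cargo) {  // visits every packet, not just the first
        if (existing.can_merge_with(incoming)) {
            existing.amount += incoming.amount;
            return true;
        }
    }
    cargo.push_back(incoming);
    return false;
}
```

Counting the `true` returns against the total number of calls gives the merge percentage, which is the figure that decides whether the extra comparisons pay for themselves.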

jamespetts

Quote from: TurfIt on April 08, 2014, 01:29:14 AM
With 11.x, the UK map is merging ~0.8% of packets. Not a lot, but any reduction in the number of ware packets floating around is good. And fixing the loop to actually iterate, 1.2% merge. I see to_factory was changed to is_commuting_trip in way_improvements - just a rename? or functional change? I suggest you try the merging again with way-improv, and if still none, then completely remove the loop since it does nothing as it currently sits...

Hmm, interesting - my tests had shown no use at all. Are you sure that it is worth checking 98.8% of ware packets uselessly in order to merge the remaining 1.2%?

As to "to_factory" and "is_commuting", this is indeed a substantial functional change: commuting passengers can be heading to any sort of building that has jobs, including commercial and industrial city buildings, town halls, attractions, depots and station extension buildings.