Performance analysis

jamespetts · November 18, 2020, 05:32:07 PM

I note that some people have reported performance problems on the Bridgewater-Brunel server. I have considered whether excessive population growth is the issue, and collected the following data.

In 1777, the population of the Bridgewater-Brunel region was recoded as 1,629,866.
In 1856, the population was 2,932,966.
In 1943, the population is 4,062,888.

Between 1777 and 1856, the population therefore grew by ~80%. Between 1856 and 1943, the population grew by a further ~40%. In 1943, the population was ~2.5 times what it was in 1777.

However, according to this website, this level of population growth is actually less than the historical population growth of England during this time period; so curtailing growth is not the way forward.

Testing myself, logging in seems to take a while, but, once logged in, the performance seems reasonable, albeit subject to the unevenness described by TurfIt above. I have carried out tests in the form of building public roads and replacing the old Kirkington Mount Hackney carriage service with a modern minibus, and everything seemed to be reasonably responsive.

Can I ask people to elaborate on the nature of the performance issues that people are experiencing?

Mariculous · November 18, 2020, 07:12:16 PM

When I first had a look at the performance, it turned out that the vast majority of performance is spent on destination lookup of precalculated routes.
Looking at the code, this indicates that the global population as well as the number of stops in the world have a huge impact on this section of code.
More population will increase the number of lookups in two ways (more journey attempts, more max retries per journey attempt.)
More stops will mainly blow the memory by square, but will also significiantly increase lookup times once those tables grow large.

Anyways, that observation was made when my 11 year old PC struggled in keeping up with the server, which was somewhere around the 1800s. Things might have changed in the meantime, but I won't spend any time on such observations before February.

It might be worth to have a detailled look at this, however.

In any case, these structures make a huge part of the memory consumption, so optimising these structures is mandatory to maintain acceptable performace in huge worlds (in terms of number of stops)

jamespetts · November 18, 2020, 07:57:19 PM

That is an interesting observation - thank you. As discussed earlier, I believe, I did try to replace this with std::unordered_map, but found performance to be significantly worse. I really am not skilled at low-level programming, and I believe that I should struggle to optimise the in-built Simutrans structures by any significant change to the design.

However, I notice reference to a fixed number of buckets. I wonder whether anything might be gained by increasing this fixed number in respect of hashtables that we can predict at the time of their creation in memory are likely to be very large? If there is a real chance of this strategy being successful, it might be simple enough for me to implement this.

Mariculous · November 19, 2020, 12:54:13 AM

In principle yes, but I suspect I confused it with private car route lookup

The precalculated journey times are not stored in a hastable but in a matrix (2d array), which is very fast and efficient.

Anyays, the other parts remain intact.
The number of inhabitants will increase journey attempts linearly and the number of stops will increase the memory consumption of the precalculated routing data by square.
On top of that, a dense stop placement will drastically increase the number of origin-destination stop pair candidates. Each of these pairs needs to be observed to find the fastest route from a specific origin to a specific destination.

There are still some options to decreasing the number of lookups, thus increase the performance, but these are not straight-forard.
Just the draft of an idea I had posted when discussing about poor passenger success rates:
Define a heuristics that systematically underestimates journey times and use it to filter out destination candidates.

Specifically, in the path explorer run, extract the best-case (manhattan) distances that can be reached from a specific town within a small set of times, using the origin-destination time matrix. For example store the best-case distances that can be reached in 8, 16, 32, 64, 128, 256, 512 and 1024 minutes.

Store that information somewhere (a few bytes per town really doesn't hurt anyone)

When a passenger makes a journey attempt, randomly select destination candidates.
Select the best-case distance of the smallest journey time greater than the time budget.
Compare that distance against the (manhattan) distance from origin to destination. If the best-case distance is smaller than our actual distance, we can immediately reject that destination, otherwise we will have to look up all origin-destination pairs as-is.

Sorry about the confusion in the very last section. By "these structures" I was not refering to the hashtable implementation, but to the way how routes from one stop to all stops is stored, which is the reason why the memory consumption of the routing data grows by square with the number of stops.
In principle, it is not neccessary to store all origin-destination pairs. If we cut the world into some kind of areas, only all pairs within a single area as well as pairs between all hubs need to be stored explicitly, where a hub is any stop that has at least one service (includes walking) which leaves the area.
That way, the memory consumption could be drastically reduced at cost of a two-level lookup. This won't increase the performance though.
How exactly those areas are defined is a different thing. Choosing these well will lead to better results, but even a rather simple approach of fixed rectangles, let's say 1/10 of maps size should already do a good job.

Matthew · November 19, 2020, 06:13:07 AM

QuoteCan I ask people to elaborate on the nature of the performance issues that people are experiencing?

Client lag

This was the issue that troubled me earlier this year. For example, I would click on a carriage in the depot and there would be a delay before it was added to the train. Once lag started, usually when someone else joined, it accelerated until either the game became unplayable (which I define as lag >= 2 minutes) or I desynced. This issue has dramatically improved for me since then. I don't often have such lags and the client does sometimes recover from them. They are still correlated with other players joining, so I don't think it's just because my VM is slow (though that is obviously a factor), but is also somehow affected by the save-unload-load process.

While the situation 'works for me' at the moment, I have three further thoughts.

Firstly, I think one of the changes in the summer was to increase the frame time, making the graphical experience worse for everybody. I am grateful to all the other players for putting up with this so that I can play. Thank you!

Secondly, Freahk's experience suggests that this may become an issue again as the savegame continues to grow, but this has not happened yet. I thought that I might reach a tipping point when the game could not longer fit into RAM, but recently I've had up to 3GB of the game stored in the page file and the client still runs.

Thirdly, I can often work around this issue by quitting the client and rejoining the game. But I think that other players are sometimes desynced when I do this. And this option is only possible for me because I have fast Internet. I am conscious that this is impractical for anyone with a slow Internet connection (g'day Australia!). This leads us to the next issue.

Server preparation time

Transferring games and loading them is not a big issue for me (especially now I know to disable Windows' antivirus package, which was doubling load times). But the "Server preparing game..." element seems to be growing longer. It took eight and then ten minutes during my last two joins. It seems to be very low when the server restarts, but then basically inexplicable thereafter (the preparation time does not scale consistently with server uptime).

This issue can make the server difficult to play at times when others are joining. If you sit down to play for an hour and two others join during that time, then you may lose half of that time to server prep. And probably one of those joins will fail and have to be restarted. So you spend more time waiting than playing. #firstworldproblems

Server lag

This is the phenomenon listed in the server log as "server lagging by", which appears to correlate with desyncs and crashes. I don't fully understand what it means for the server to lag. It can't mean that the server is behind the client(s), because then it correlate with the client lag, which is does not. Perhaps it means that the server is behind the target introduced by the changes in the summer?

I wonder whether the variability of the last two issues is related to the state of the Path Explorer at the time when the game is saved. But that is just a wild guess. There are many other possibilities.

Quote from: jamespetts on November 18, 2020, 05:32:07 PM
I note that some people have reported performance problems on the Bridgewater-Brunel server. I have considered whether excessive population growth is the issue, and collected the following data.

In 1777, the population of the Bridgewater-Brunel region was recoded as 1,629,866.
In 1856, the population was 2,932,966.
In 1943, the population is 4,062,888.

Between 1777 and 1856, the population therefore grew by ~80%. Between 1856 and 1943, the population grew by a further ~40%. In 1943, the population was ~2.5 times what it was in 1777.

However, according to this website, this level of population growth is actually less than the historical population growth of England during this time period; so curtailing growth is not the way forward.

I will respond to this in a separate thread.

jamespetts · November 21, 2020, 12:33:58 PM

Thank you both for your responses. The server preparation time is the time that the server spends saving the game; I do note that this time seems to be excessive for reasons which are not immediately clear at present, but investigating this will be a truly gargantuan task, especially as the conditions in which the excessive time occurs are not readily reproducible other than on a server, which makes getting a reliable reproduction case that can be reproduced many times over in quick succession for performance profiling extremely difficult.

One might try running a local server (start Simutrans-Extended with the "-server" command line parameter) to test this, but having enough memory to run two instances of the current Bridgewater-Brunel saved game is likely to be a challenge for many. Alternatively, one might simply test whether saving this game several times over increases save time dramatically; if it does, this would give a much easier reproduction case for performance profiling. As this latter test would be simple but time-consuming, it would be very helpful if someone could try this and report back the results.

Quote from: Freahk on November 19, 2020, 12:54:13 AM
In principle yes, but I suspect I confused it with private car route lookup

The precalculated journey times are not stored in a hastable but in a matrix (2d array), which is very fast and efficient

Thank you for clarifying: that is helpful. The difference, I think, comes from the fact that the path explorer was written by an especially skilled developer (Knightly), whereas I wrote the private car routing system, and I have less skill in programming, especially the sort of lower-level programming involved in creating a reliable matrix.

Quote

There are still some options to decreasing the number of lookups, thus increase the performance, but these are not straight-forard.
Just the draft of an idea I had posted when discussing about poor passenger success rates:
Define a heuristics that systematically underestimates journey times and use it to filter out destination candidates.

Concretely, in the path explorer run, extract the best-case (manhattan) distances that can be reached from a specific town within a small set of times, using the origin-destination time matrix. For example store the best-case distances that can be reached in 8, 16, 32, 64, 128, 256, 512 and 1024 minutes.

Store that information somewhere (a few bytes per town really don't hurt anyone)

When a passenger makes a journey attempt, randomly select destination candidates.
Select the best-case distance of the smallest journey time greater than the time budget.
Compare that distance against the (manhattan) distance from origin to destination. If the best-case distance is smaller than our actual distance, we can immediately reject that destination, otherwise we will have to look up all origin-destination pairs as-is.

This an interesting idea, but the heuristic implicitly assumes a maximum speed, and it is not clear how one would calculate this. One could in theory simply use the maximum speed of the fastest vehicle in the game, but this would then mean that, after 1977 when Concorde is introduced, performance would greatly worsen as a larger number of candidates. Also, I am not entirely sure that I understand why these data are stored per town (as the distance reachable in any given time is universal), unless I have misunderstood how this algorithm is intended to work?

Quote
Sorry about the confusion in the very last section. By "these structures" I was not refering to the hashtable implementation, but to the way how routes from one stop to all stops is stored, which is the reason why the memory consumption of the routing data grows by square with the number of stops.
In principle, it is not neccessary to store all origin-destination pairs. If we cut the world into some kind of areas, only all pairs within a single area as well as pairs between all hubs need to be stored explicitly, where a hub is any stop that has at least one service (includes walking) which leaves the area.
That way, the memory consumption could be drastically reduced at cost of a two-level lookup. This won't increase the performance though.
How exactly those areas are defined is a different thing. Choosing these well will lead to better results, but even a rather simple approach of fixed rectangles, let's say 1/10 of maps size should already do a good job.

Reducing memory consumption would, all things being equal, be a good thing, but the real problem is not memory consumption but memory bandwidth; if this would not improve memory bandwidth, it is unlikely ultimately to be worthwhile (unless this would reduce loading/saving times greatly, perhaps? But this would take a very, very large amount of work to implement for something the magnitude of benefit of which would be unknown until after all the work had been done).

Mariculous · November 21, 2020, 02:28:16 PM

Quote from: jamespetts on November 21, 2020, 12:33:58 PMThis an interesting idea, but the heuristic implicitly assumes a maximum speed, and it is not clear how one would calculate this.

We already got all the information we need, the only thing we need to do is to extract it from the matrix in the path explorer run.
The path explorer calculates all origin-destination stop pair routes, including the journey time.

All we need to do now is:
- identify which area the origin of such a pair belongs to.
- calculate the manhattan distance of that pair.
- we already calculated the journey time of that origin-destination pair, so we get it fror free.
- finally write this to the best_case table of the area.

One thing we might need to keep in mind is minimal anomalies caused by stops that can be reached from two different areas.
We might argue this is acceptable, or alternatively ensure that we consider all stops that can be reached (by walking) from each area.

Quote from: jamespetts on November 21, 2020, 12:33:58 PMif this would not improve memory bandwidth, it is unlikely ultimately to be worthwhile (unless this would reduce loading/saving times greatly, perhaps?

Memory consumption of large simutrans-ex games is an issue in many ways:
Savegame size, thus loading times and server connection times. It can take a huge amount of time to connect to a huge simutrans-extended server, depending on the internet connection.

Purely memory consumption. There is a saying "ram can only be replaced by more ram".
We have quite often observed servers and clients running out of ram in the past.
From recent observations on the pak192.comic for extended test server, this causes a long pause after the client successfully loaded the game. Assinging more ram to the server solved this.
The same was repeatedly reported on Bridgewater, we just cannot increase the memory to test if this solves the issue.
Swapping memory out will drastically decrease performance, whenever memory not located in ram is accessed, so reducing memory consumption is at least worth considering.

Anyways, I agree this is not a trivial change, so it needs some considerations if it's worth the effort.

jamespetts · November 21, 2020, 03:02:19 PM

Thank you for your thoughts on this. I am not clear at present on the conceptualisation of areas for these purposes. I can understand that there may well be considerable advantage in a system that runs a distance and speed based sanity check to see whether it is even possible for any given journey time to be within the passengers' tolerance before checking this in the database. In a simple implementation, one would first check the walking time to each possible origin stop plus the transfer time of that origin stop: if that figure exceeds the passengers' maximum journey time tolerance, there is then no need to check any routes from that stop.

The next layer would be for each stop to store its lowest journey+waiting time pair for a direct connexion. That can then be added to the above timings and, again, if the minimum possible journey time from a stop plus the walking time to the stop and the transfer time within a stop already equals or exceeds the passengers' maximum journey time, there is no need to check the database for routes from that stop.

The next layer would be to calculate a maximum implicit speed for any given point to point timing in the whole map when undertaking one of the phases of the path explorer (and, probably, do this per class). If the journey distance from the stop at this speed plus the walking time to the stop and the stop transfer time is in excess of passengers' maximum journey time tolerance, then there is again no need to check the database of routes from that stop.

It is in respect of the last layer that some more localised system might in principle allow for more precise computation (i.e. different stops having different implicit maximum speeds), but I am currently unclear on how this would actually work in your proposed system. How would areas be defined, and how would the maximum implicit speeds for those areas be calculated? I am not clear at present how using areas would be an improvement on a global calculation. I can understand that it would be good if there were a way of doing this per stop, but I cannot see how this can be done without consuming more performance than it will save.

As to memory consumption, I can see that reducing the size of the database of path origin/destination points would be very helpful. However, the technical complexities of changing the existing path explorer system fundamentally are almost beyond imagination: it was written by somebody whose ability with lower-level programming far exceeds mine and I am extremely doubtful that I should be able to produce a workable, reliable alternative to this. There would have to be a vast amount of testing and analysis to determine how best to segment the dataset (I am sceptical that doing it by geographical position would be optimum; it is likely to be better to do it, if this is workable at all, in a more abstract fashion based on some properties of the nodal map, one basic possibility being by player, but quite how best to do this would be extremely complex to determine), and also whether a multi-segmented database would actually be more efficient, since what would occur is that it would be necessary to make multiple calls to smaller databases instead of a single call to a larger database, and it is not at all clear whether this would be significantly slower than the current system or not.

In short, the database segmentation idea is possibly workable in theory, but whether it would be beneficial overall is uncertain and is probably beyond my ability and beyond what is possible for me to do in terms of time. The sanity check to determine whether the database needs to be accessed at all for any given stop in passenger generation is probably achievable and probably beneficial (although to what extent remains uncertain), but I am unclear as to the advantages, implementation or workability of an area system.

Mariculous · November 21, 2020, 05:38:08 PM

The point is that very many passengers are generated with a rather low journey time budget, whilst destinations are picked globally, so it's rather unlikely for these to find a match, especially when the map is huge or the service is bad.

If we store best-case data to areas, for example towns, we can immediately reject many of these attempts without looking at any stop at all, which does especially mean not checking for all origin-destination stop pair candidates.

All that needs to be done is to calculate
best_case_travel_time_budget = journey_time_budget - walking_time_to_closest_origin station - walking_time_to_closest_destination_station.
Then select the best_case_distance in the origin area according to the best_case_travel_time_budget.
Finally calculate the manhattan distance in between origin and destination house and compare it against the selected best_case_distance.

If and how much gain in performance this will result in greatly depends on the map size and the service and nothing explicit can be said about that optimisation without testing it on a specific map.

If I understand currectly, the implementation you describe is different.
Using a per-stop-based heuristics will indeed result in better prediction, but at a much higher cost.
First, we need to check all origin stop candidates and we will need to perform a fair amount of calculations for each of these.
Those calculations might already be more expensive than looking up the origin-destination pair in the matrix.
The code itself is not pretty complex already. The reason why it causes such a huge load is that it is calles very frequently and has to check all origin-destination stop candidates, which in towns can very quickly be something like 100 pairs.

jamespetts · November 21, 2020, 05:47:16 PM

I understand the basic principle of eliminating unnecessary database calls: in principle this is a good idea and not too difficult to implement, at least in a basic sense.

What I am struggling to understand (forgive me if this is my error) is what it means to compute a best case distance specifically for an area (as opposed to calculating a global maximum point to point speed and then using it to calculate the lowest possible journey time to the passengers' chosen destination), and how an area specific calculation would apply to an inter-area journey.

Also, if towns are used as areas, I am not clear on how journeys starting or ending outside a town would work with this system. Can you assist with this?

Thank you again for your thoughts on this.

Mariculous · November 21, 2020, 10:26:19 PM

I don't know if it's your error that you're struggling to understand it or my error that I'm describing badly.
The question seems to be why I think a global best-case prediction is not a good idea.

The short answer is: I'd expect a global system to be near-useless, especially after the introduction of (hypersonic) aviation and maglevs. Somewhere in the world, there will be a connection that allows passengers to travel any distance within, let's say, an hour.

The long answer:
I have chosen towns as areas, because it's simple, that information can be retrieved quite quickly and it should be sufficient.

Usually, the biggest towns are quite well served and they are quite often the center of a hub-based network with commuter and long-distance trains as well as high-speed rail, (supersonic) jet aviation and maglev in later years.
That means, those few towns will be able to reach some destination at any distance within less than an hour, but the majority of towns won't.

Using the same heuristics everywhere on the map, that would effectively make the heuristics useless.
Using different heuristics on different areas

Specifically, computing a best-case distance for a specific area means:
1. Determine all stops that could serve as the origin stop for a housing in the origin town. That means any stop within city borders+stop coverage. I'll call these "areas stops".
2. Check the journey times from those stops to all destinations and update current areas best-case data accordingly.

Interurban journeys would work just the same as any other journey: Pick the best_case data of the origin area, pick the best_case distance according to the time budget from that data, compare it against the origin-destination distance.

Quote from: jamespetts on November 21, 2020, 05:47:16 PMAlso, if towns are used as areas, I am not clear on how journeys starting or ending outside a town would work with this system. Can you assist with this?

I am not sure about this. From my understanding, the vast majority of journeys are generated from housings.
Usually housings belong to a town and any other buildings do not initiate journeys, except from rather rare onward trips. Return trips know their original origin anyways and it to me it doesn't make sense to reject return trips or to send them somewhere else, but I don't know if the code actually works like this.
Assuming that's how it actually works, it's not worth to cover that rather rare case. (onward) Journey attempt performance from outside of towns will simply remain the same as-is now.

Journeys ending inside or outside of areas doesn't matter in the above approach.
All that matters is where they originate.

Building an area-to-area best-case journey time matrix is not what the above is about, but it's also a promising approach.
The resulting heuristics will be better than the one described above at cost of more computations and memory.
Memorywise on BB, the size of those matrices will be negligible compared to the path matrices, even storing eight of those matrices and including the better heuristics, thus more immediately rejected attempts, could outperform the above anyways.
I'm not sure which one is to prefer. Both should come with a decent performance boost.

jamespetts · November 22, 2020, 02:20:58 AM

Quote from: Freahk on November 21, 2020, 10:26:19 PM
Specifically, computing a best-case distance for a specific area means:
1. Determine all stops that could serve as the origin stop for a housing in the origin town. That means any stop within city borders+stop coverage. I'll call these "areas stops".
2. Check the journey times from those stops to all destinations and update current areas best-case data accordingly.

Thank you for this clarification: this is helpful. I think that this summarises where the approach that I had imagined differs from the approach that you describe.

If I understand correctly, your approach involves, for each town, taking each stop in that town and computing the minimum journey time for a set of journeys each with a minimum distance (although what the origin point of that distance would be it not entirely clear) to any ultimate end point.

The approach that I had imagined involves calculating using per stop data for journey times to the next transfer, and using an implicit maximum speed (i.e. journey distance divided by journey time) and extrapolating from that to reach a minimum possible journey time from a given stop to anywhere. When overseas journeys are introduced (i.e. to off-map destinations), the maximum speeds for these journeys would be stored separately.

Using speed in this way would prevent the computations from being useless in the way that they would be for global journey times, would be considerably simpler to implement and would reduce the computational overhead of the background calculations necessary for this.

It is not currently clear to me that such advantages as there might be from the system that you envisage would have sufficient advantage to outweigh those disadvantages.

jamespetts · November 22, 2020, 01:47:29 PM

I have now implemented what I described in an earlier post on this thread as the first layer of this suggestion, viz. not hitting the journey time database with a request if the walking time to the stop in question is greater than the journey time tolerance. I have also added a check to make sure that it is also not greater than a total journey time already found.

Performance testing with this shows a very significant improvement with this alone. Before the changes, running the Bridgewater-Brunel saved game from March 1944 shows path_explorer_t::compartment_t::get_path_between taking 20.54% of CPU time. After this change, it took 4.52% of CPU time in my test. step_passengers_and_mail_threaded took 55.14% before and 26.41% after, making it no longer the most CPU intensive part of the code (albeit only graphics parts of the code are higher).

Further testing with the simplest possible implementation of layer 2 (implicit speed checking), based on checking whether the implicit minimum average journey speed is less than 2,180km/h (the cruising speed of Concorde) shows path_explorer_t::compartment_t::get_path_between taking 4.08% of CPU time and step_passengers_and_mail_threaded taking 21.47% of time. The condition of a journey time exceeding this implicit speed was hit in many cases when testing on the Bridgewater-Brunel saved game with a debugger (being hit so frequently when running at normal speed that, when using a debugger, it would be immediately hit again each time after continuing), suggesting that this is making a real saving. It will be interesting to see how much that this saves when using more aggressive data generated from the game itself.

This suggests that the greatest savings can be made using simpler implementations and that diminishing returns are likely with more complex implementations.

However, the principle of this checking is a very good one and has made a great deal of difference to the performance. Thank you for suggesting this.

I have now pushed the very crude version of the second layer described above, and I should be interested to see whether this makes a difference to players' experiences of the game on Monday.

Mariculous · November 22, 2020, 08:36:00 PM

Quote from: jamespetts on November 22, 2020, 02:20:58 AM
If I understand correctly, your approach involves, for each town, taking each stop in that town and computing the minimum journey time for a set of journeys each with a minimum distance to any ultimate end point.

Yes.
There's a detail that still differs: I'd store the maximum possible distance to a set of predefined times, instead of the other way round.
This allows to make use of some iformation we have about the journey time distribution and saves a few lookups as the origin and the tolerable time remain constant.

Quote(although what the origin point of that distance would be it not entirely clear)

When calculating the best-case data, the distance in between stops is taken into account.
When validating a journey, the distance in between the housings is taken into account.
This makes sense, because the very best starting position we can get is directly on the stop, where the route of recorded best-case distance started.
This can only violate the heuristics property of not rejecting anything that can be reached, if the recorded best-case distance decreases when the journey time increases, which doesn't make sense.

Quote from: jamespetts on November 22, 2020, 02:20:58 AMUsing speed in this way would prevent the computations from being useless in the way that they would be for global journey times, would be considerably simpler to implement and would reduce the computational overhead of the background calculations necessary for this.

I do not agree with the last two points.
Currently this seems to be true, as you simply use the concorde speed as a placeholder, but if you want to use a per-stop-best-case-average-speed (and I highly expect this to be worthwile), the calculations to gather that information won't differ pretty much from the calculations to gather per-area-best-case-distances in both, computational load, as well as code complexity.

As mentioned, per-origin-stop data involves iterating all possible origin stops, which means it scales linearly with the number of stops per area.
I aimed at a heuristics that performs in constant time and makes use of simple calculations to rejects journeys very quickly.
The area-based heuristics itself will be much faster than the stop-based heuristics.
Anyways, it's hard to tell which of these two will result in better overall performance, as a slower heuristics rejecting more lookups can be much better than an extremely fast heuristics that rejects fewer lookups.

In any case,either of these heuristics should be much better than no heuristics at all, which is impressivley indicated by the numbers you provided.

Thank you for the first very basic implementation. The results seem quite promising on the BB server game.
I want to note that the high impact by only the walking time consideration is expected to be due to rather large stop coverage areas, thus long walking times to a fair amount of stops, as well as the tolerable journey time figure, which generates very many passengers with very low journey times.
It will be very interessting to see how performance improves even further when
a) using per-stop best-case speed extracted from the path matrix
b) using multiple per-stop best-case speeds, accoring to the distance, as a one hour journey, where half an hour is spent in a concorde and half an hour is spent in a bus, will still reult in a best-case-average-speed of roughly 1000 km/h stored in many bus stops, around that airport, whilst the best-case-average speed at local distances is much lower.

jamespetts · November 23, 2020, 02:21:37 PM

I have now implemented the next layer of this, just pushed to the master branch. In this layer, we take the maximum vehicle speed for all vehicles in the game (recalculated on loading a game and monthly), albeit with a minimum of one third of the fastest available vehicle up to a speed of 250km/h for ground vehicles and up to 895km/h for air vehicles, and using the maximum speed of available ground vehicles where there are no vehicles in the game, and use that to check the implicit speed required. Further, the maximum speed of aircraft are stored separately, and are excluded wherever the passengers' journey time tolerance is less than the minimum transfer time for airports, the ground vehicle timings being used instead.

The requisite implicit speed is calculated now twice: once overall to see whether even to consider looking for origin stops, and again individually for each origin stop, adjusted to take into account the walking time to the particular origin stop and the distance from that origin stop to the destination.

Testing shows a further considerable improvement. Using the Bridgewater-Brunel saved game from March 1944, karte_t::generate_passengers_or_mail() now records a total of 7.45% CPU time, and path_explorer_t::compartment_t::get_catg_path_between now takes only 0.93% of CPU time.

These savings are significant enough that the much greater effort of implementing any more sophisticated system than this is unlikely to be worthwhile owing to diminishing returns.

jamespetts · November 23, 2020, 02:37:49 PM

Briefly investigating memory consumption on the Bridgewater-Brunel saved game from March 1944, we find that it takes circa 9.6GB in RAM.

Of that, the greatest amount by far is allocated by planquadrat_rdwr() (i.e. the method for loading the individual game tiles), which allocates ~5.6GB. Next comes the path explorer, which allocates 1.065GB. Then comes karte_t::init_tiles, which allocates a further 880MB, and then haltestelle_t::create (stop creation code after loading), which allocates a further 553MB. Initialising the path explorer takes another ~162MB, and the convoys take a further ~101MB. Everything else is <100MB.

Thus, if we wish to improve memory consumption, we need to focus on ways of reducing per tile memory consumption.

jamespetts · November 23, 2020, 03:51:51 PM

Incidentally, in relation to memory consumption and load times, last year, with the previous Bridgewater-Brunel saved game, it was found that, at a certain point of network complexity, the time that it took the server to save the game became extreme and rendered the game unplayable. Investigation discovered that this was caused by the compression algorithm for some reason not working well with the path explorer data and processing it extremely slowly.

TurfIt investigated replacing this algorithm with a newer (open source) algorithm, and found this to be much faster. I looked into implementing this, but this was not followed through because (1) the server map density was going to be downsized significantly for the next (this current) game in any event, so it was not clear whether this would be necessary; and (2) this has now, I believe, been implemented into Standard, so it was considered better simply to wait for the feature to be ported from Standard when whoever was working on that project got that far.

Given that we are now having this problem again, it seems, I wonder whether there might be some merit in looking into expediting porting this specific feature from Standard (if I recalled correctly that it was integrated into Standard in the first place)?

freddyhayward · November 23, 2020, 10:29:26 PM

Although it isn't strictly a problem, public transport passenger numbers on bridgewater-brunel have globally declined ~10% since this change.

jamespetts · November 23, 2020, 10:40:25 PM

Quote from: freddyhayward on November 23, 2020, 10:29:26 PM
Although it isn't strictly a problem, public transport passenger numbers on bridgewater-brunel have globally declined ~10% since this change.

I discovered possible anomalies in the implementation in the current nightly build which I believe that I have rectified in the current version, which will be deployed in to-morrow's nightly build. These are a possible (but not certain) cause of what you report.

jamespetts · December 02, 2020, 07:33:53 PM

On the question of memory consumption, some profiling reveals that ground tiles, then the path explorer, then road tiles are the greatest users of memory, ground tiles taking ~2GB in the Bridgewater-Brunel game from the in-game year 1940 (and another 1.2GB being consumed by freelist_t::gimme_node() in the constructor for the boden_t object), the path explorer data taking ~1GB and road tiles taking ~858MB.

Other substantial users of memory include objects on tiles (~379MB), railways (~14MB), bridges (~132MB), halt information stored on ground tiles (~96MB), tunnels (~65MB), foundations (~55MB), halts (~551MB), convoys (~102MB), karte_t::init_tiles (~880MB), and factories (~54Mb).

jamespetts · December 13, 2020, 12:38:28 PM

I have adjusted some settings on the server's simuconf.tab, reducing the framerate to 10 from 15 and also reducing server_frames_between_steps. The latter should hopefully improve smoothness.

Given the recent improvements in performance resulting from the passenger route finding heuristics, I am hoping that the server will be able to keep up with this rate. I should be grateful if people could post their findings when these changes become effective from to-morrow morning.

Matthew · December 14, 2020, 08:05:49 PM

Quote from: jamespetts on December 13, 2020, 12:38:28 PM
I have adjusted some settings on the server's simuconf.tab, reducing the framerate to 10 from 15 and also reducing server_frames_between_steps. The latter should hopefully improve smoothness.

Given the recent improvements in performance resulting from the passenger route finding heuristics, I am hoping that the server will be able to keep up with this rate. I should be grateful if people could post their findings when these changes become effective from to-morrow morning.

My subjective first impression is that the online game is faster to join and has fewer freezes now than last week. It's good to make progress! But last week it sometimes took 45 minutes to begin play, so it's against a low bar. My first join took 'only' 10 minutes but that is obviously only one data point.

I notice that the server now saves promptly on the hour. That is a hopeful sign that the server is keeping up with real time.

jamespetts · December 14, 2020, 09:44:10 PM

Quote from: Matthew on December 14, 2020, 08:05:49 PM
My subjective first impression is that the online game is faster to join and has fewer freezes now than last week. It's good to make progress! But last week it sometimes took 45 minutes to begin play, so it's against a low bar. My first join took 'only' 10 minutes but that is obviously only one data point.

I notice that the server now saves promptly on the hour. That is a hopeful sign that the server is keeping up with real time.

Excellent, that is very helpful, thank you: it is good to know that here is at least some progress being made.

Matthew · December 21, 2020, 01:46:36 PM

After another week of play I can confirm that the Bridgewater-Brunel game is definitely faster to join than it was a couple of weeks ago.

There is also a definite pattern that the longer the server runs, the longer joining takes. I always have a countdown timer running anyway so I can measure it precisely. If I join around 0700 GMT, then "server preparing"+transfer+pause takes as little as 2 minutes 40 seconds. If I join about 1300 GMT, it takes ten minutes. I have not joined recently in the evening, but in the past that was the time when joining took longest. The occasional server restarts after crashes or manual interventions reduce the join times again, so it's not an external (Internet/data centre) problem; it's internal to B-B.

I don't know the reason, but I speculate that something is clogging up the server's RAM so it's having to use more swap space. Maybe a memory leak in the game or maybe some log file is being stored in RAM for some reason.

jamespetts · December 21, 2020, 02:28:13 PM

Thank you for this. It is odd that the server's run time should affect the joining time. The memory usage is not notably different, and, running an analysis using Visual Studio's performance analysis tools, I was unable to detect any memory leak. For example, yesterday evening at perhaps just before midnight, memory usage was 94.7%; now, it is 92.9% (but someone is in the process of joining; checking again after the person had finished joining, it was 95.3%). It seems very unlikely that such a relatively small difference in memory consumption (even if we extrapolate over a longer period) would be capable of causing a five-fold increase in server join times, and the memory consumption varies sufficiently over time when people join/leave to make it seem unlikely that there is any significant difference between memory consumption at different times of day in any event.

It is thus most odd that there should be this difference. May I ask whether anyone else was online at any of these joining times? If someone else is connected and that other person has a slow download speed from this server, that might well delay that person's joining and thus delay everyone; if that person happens to be online more at some times of day than others, this may be causative of the difference that you report.

Matthew · December 21, 2020, 04:49:38 PM

Quote from: jamespetts on December 21, 2020, 02:28:13 PM
Thank you for this. It is odd that the server's run time should affect the joining time. The memory usage is not notably different, and, running an analysis using Visual Studio's performance analysis tools, I was unable to detect any memory leak. For example, yesterday evening at perhaps just before midnight, memory usage was 94.7%; now, it is 92.9% (but someone is in the process of joining; checking again after the person had finished joining, it was 95.3%). It seems very unlikely that such a relatively small difference in memory consumption (even if we extrapolate over a longer period) would be capable of causing a five-fold increase in server join times, and the memory consumption varies sufficiently over time when people join/leave to make it seem unlikely that there is any significant difference between memory consumption at different times of day in any event.

Thank you for looking into this.

QuoteIt is thus most odd that there should be this difference. May I ask whether anyone else was online at any of these joining times? If someone else is connected and that other person has a slow download speed from this server, that might well delay that person's joining and thus delay everyone; if that person happens to be online more at some times of day than others, this may be causative of the difference that you report.

All the examples given were from me joining, so other people's download times should not be relevant (as they are loading the game that their client has saved locally) and my VM has a very fast download speed. The difference is always the "server preparing" stage. I will try to keep more detailed notes though.

prissi · December 22, 2020, 12:51:46 AM

The memory usage on a non-broken Linux will always hover at above 90% if you start swapping. I strongly suspect the memory size exceeds the main memory and the server starts swapping. Use htop or so and see how much swapping usage you have. That will dramatically influence performance.

freddyhayward · December 22, 2020, 02:11:59 AM

Quote from: prissi on December 22, 2020, 12:51:46 AM
The memory usage on a non-broken Linux will always hover at above 90% if you start swapping. I strongly suspect the memory size exceeds the main memory and the server starts swapping. Use htop or so and see how much swapping usage you have. That will dramatically influence performance.

At one point, the server was using ~2.5GB of swap space, meaning it is almost certainly the single greatest bottleneck in the server's performance.

Matthew · January 18, 2021, 05:13:57 PM

I collected some data on loading times over the past few days.

This afternoon it took 22 minutes from join to unpause at 1448 and then 25m at 1523. On neither occasion did the game consistently run faster than real time; there was twice a noticeable freeze+slowdown when the Path Explorer reached "Passengers (p_class_4) / explore", perhaps because the game is moving a large table/matrix out of swap.

These figures are actually worse than yesterday (#d25.....), when it took 8 minutes to join at 1536 and the game did reach real time (albeit after ten minutes).

This is really disappointing given that Freddy had got join times down to practically nothing just a few weeks ago and already all that progress has been lost.

The frustration is not only caused by the fact that you have to start connecting half-an-hour before you want to play (which is not practical for everybody), but also by the fact that the game seems more sensitive to dropped connections during the loading phase, and by the fact that if someone else tries to join while the game is paused or speeding up you are back to square one. You can easily spend half an hour waiting and still not have a usable connection.

The Linux graphical client's RAM usage hovered around 9GB of physical RAM plus 42MB of swap.

Appendix: Loading times
(Columns: Build — GMT time started — join requested — save prepared on server — save transferred — save loaded — game unpaused — game runs faster than real time)

jamespetts · January 18, 2021, 07:54:42 PM

Memory consumption is still the issue. The Linux server currently reports as follows:

Code Select


              total        used        free      shared  buff/cache   available
Mem:           7961        7655         143           1         162          92
Swap:         18047        4005       14042

When the game can be accommodated entirely in RAM, much better performance ensues. It is not immediately clear how long that one has to wait to see whether the latest changes have assisted to reduce memory consumption: two full refreshes of the private car routes are necessary before a proper comparison can be made.

Matthew · January 23, 2021, 05:59:52 AM

A couple of posts ago I was complaining about how slow multiplayer join times were. What a difference a few days (and a lot of effort) makes!

For the benefit of posterity, here is what good performance looks like. On recent builds, it has taken Bridgewater-Brunel about 40 seconds to prepare a very large map. Transferring the map took as little two minutes (and my Internet was slow because of poor weather). Three days ago the game was unpausing and began progressing faster than real time in no more than two minutes, and I think that was a transitional phase; yesterday it was as little as six seconds. The whole join process from requesting a join to running faster than real time can now be completed in about three minutes. If you are lucky enough to have an SSD, a client crash is not such a big deal anymore.

Thank you very much to Freddy for the improved code and to James for code review!

Appendix: Loading times
(Columns: Build — GMT time started — join requested — save prepared on server — save transferred — save loaded — game unpaused — game runs faster than real time)

News:

Performance analysis