News:

Simutrans Wiki Manual
The official on-line manual for Simutrans. Read and contribute.

Path explorer development

Started by jamespetts, March 22, 2019, 01:58:44 AM

Previous topic - Next topic

0 Members and 1 Guest are viewing this topic.

jamespetts

A number of aspects of the path explorer (the code that looks for routes through players' networks for passengers, mail and goods) are likely to need some work in the future, some in the nearer future than others.

I thought that it might be helpful to start a discussion of these issues as I know that A. Carlotti has expressed an interest in doing some work on the path explorer, and it would seem sensible to discuss the development before making any changes. This discussion is somewhat techincal in nature and principally aimed at those with some knowledge of the Simutrans-Extended codebase.

Saving the routing table

Currently, when loading very large saved games (such as that on the Bridgewater-Brunel server), calculating the paths takes a very long time - many times longer than all the other parts of the server game join cycle combined.

This is because the path explorer does not save the actual pathing data to file when it saves a game: instead, it reacalculates these data afresh. On a large saved game, this is a very large computational task. This is also single threaded, which makes it inefficient.

In theory, it would be possible to multi-thread this loading cycle. However, it seems to me that it is likely to be more efficient simply to save the data. Saving, transmitting and loading these data is likely to be much less time consuming than calculating them afresh, even if the calculation were made more efficient with multi-threading.

This may well increase the saved game size (by how much I know not), but it may make a very significant difference to the time taken to join a large network game, making the experience for players on these large network games smoother and more pleasant.
I suspect that it would be sensible to make this optional and allow the existing behaviour to be preserved in case of difficulties (e.g., if the file sizes prove too large with this method).

Storing class data in the connexions

Following investigations into this bug, it is apparent that not storing the class data in connexions causes difficulties. The interim fix for this bug does not fully address the issue, as routes for higher class passengers are still lost temporarily during a refresh. This has recently been reported as an issue on the Bridgewater-Brunel server at Ves's Halford airport. The problem is evidently intermittent as my 'buses do show very high wealth passengers heading to the airport by coach, and the aircraft do carry passengers.

Not storing the class data in the connexions was the chosen method originally to reduce memory consumption. Memory consumption on the server is currently a critical issue, with >95% memory usage at times, and server crashes and performance issues caused by loss of memory. I am considering upgrading to a plan with 8Gb rather than 6Gb of RAM available as a result of this, but want to confirm first that all of the crashes are memory related, as not all show up in the system log as an out of memory termination.

I do not know how much more memory that doing this would take: the difference may not be very great in comparison to the overall amount of memory taken; but I am currently not entirely sure how to test or predict this.

This is likely to involve a significant amount of work.

(Further) multi-threading

The path explorer is currently multi-threaded to a limited extent: when the game is running (but not when loading), a single path explorer thread can run concurrently with other threads. This makes the game more responsive than it would be if the path explorer ran on the main thread (I have tested this), but it does not make the path explorer refreshing any faster.

Since the 2017 addition of multiple classes, and doubly so if comfort based routing (see below) is to be introduced, the path explorer has had to calculate an increasing number of passenger routes (5 sets instead of 1, plus 2 sets of mail routes instead of 1; with the proposed comfort routing feature, this would further double the number of passenger routes that would need to be calculated). This means that the path explorer takes considerably longer to refresh then was hitherto the case, which in turn means that it takes longer for changes to player networks to affect how passengers and goods are actually routed in the game.

It may be worth looking into whether it is worthwhile allowing multiple instanes of the path explorer to run simultaneously in different threads (e.g., one thread for passenger class 0, one for passenger class 1, one for mail, etc.). This has the potential greatly to reduce the amount of time that it takes for the paths to refresh.
However, this would be a difficult exercise that would take a large amount of time to code and debug. It would also have very uncertain benefits. There are only so many threads available at any one time (4 is fairly common on modern systems), and the other threads are already being used for other things while the path explorer is running. I am currently sceptical that this would make much of a difference to path explorer performance in the real world until such time as CPUs with very high core counts (~16 or more) become common. My current inclination is not to pursue this further at this stage, but to keep the matter under review.

Using a more efficient hashtable

On the current Bridgewater-Brunel game, over 80% of the CPUs' time is spent finding routes for passengers, mail and goods between stops. Because these data are cached, most of that time is spent simply in hashtable lookups. Both A. Carlotti and I have examined the code for the Simutrans hashtable class and have found it to run in O(n) time (A. Carlotti's investigation is probably more rigorous than mine). In principle, a hashtable should run in O(1) (although this is not achieved if there are any clashing hashes). Thus, using a better hashtable would be much more efficient. Using a hashtable from the std library, for example, could significantly boost performance.
However, I wonder whether a hashtable is even the best data type for this purpose. The hashtable key is a pair of 16 integers, each representing the ID of a stop in the game. I wonder whether lookups would be even faster if some sort of multi-dimensional vector were used instead, although I admit that this is somewhat outside my field of expertise, so any informed comment on this would be welcomed.

Simply switching to the std hashtable might be relatively straightforward, although there might be unforeseen complexities. Using a different data structure entirely is likely to take considerably more work, so it is best to know that there will be an commensurate benefit if such additional work is to be undertaken. If anyone has any thoughts on this issue, that would be very useful.

Note: On closer inspection, it seems that the path data are not stored in a hashtable after all: see this post for details.

(Partial) comfort based routing

A longer term project would be to include comfort in routing calculations. This is a very complex issue, and the only sensible way that I have come up with doing it is to use a very great simplification of what would happen in reality, but which simplification I hope is elegant enough to be workable to produce something that is approximately realistic (more so than the current system) and internally consistent without giving rise to anomalies.

The principal difficulty that a comfort based routing system would have to overcome is that, in a cached routing system, there is only one route from each origin to each destination. The secondary difficulty that would have to be overcome is that, if one is combining two numbers (speed and comfort) to produce a routing calculation, this increases computational load by requiring a divide operation, which may be significant when dealing with very large datasets (there are >5,000 stops in the Bridgewater-Brunel saved game, giving rise to over 25 million pairs of routes between them).

The solution to the first has the potential to make the second more of an issue. The solution to the first of the issues is to have two types of passenger: one who always prefers the fastest route (as at present) and one who prefers a route which has a balance of comfort and speed. Which type of passenger that each passenger generated is would be determined using a weighted random selection at the point of generation. This would be independent of wealth level.

There would be two routing tables for each class of passenger: one for speed alone and one for a balance between speed and comfort. The latter would be calculated using the same basic path explorer algorithm as at present and stored in the same data structure, save that the edge weights of each connexion would be calculated using an algorithm producing a factor of both comfort and journey time. Some parameters of that algorithm would be set in simuconf.tab. This factor would be calculated at the path explorer phase of compiling connexions, so would need to be done only thousands, rather than millions, of times for a single refresh.

Comfort would be calculated both for nodes and edges: walking would have a comfort level, interchanges would have a comfort penalty, and stops would also have a comfort rating: the longer that passengers have to wait at a stop, the more comfortable that it would have to be (using the same algorithm for the relationship between the two as for vehicles). The greater the disproportion of the comfort to the waiting time, the greater comfort cost that each interchange would have.

The speed passengers would route in exactly the same way as they do now. The balanced passengers would first check the balanced routing table for their class to see whether any comfortable routes were available to each alternative destination within their journey time tolerance, and then, if a route is available that is too slow, check the speed routes to see whether that route is within the passenger's journey time tolerance, and, if it is, convert to a speed type passenger.

Two complexities that remain to be considered are: (1) how a journey time can be given where the edge cost is diluted with comfort data; and (2) whether balanced passengers (or indeed all passengers) should also have a comfort budget/maximum discomfort tolerance for a journey and how this would work. One possible solution to both of these issues is to store (1) journey time; (2) comfort; and (3) the factor of the two separately for each connexion, and store the combined value of all three for each route even if only one of these figures is used for the routing calculations.

Incidentally, what would be stored as far as comfort is concerned is not the base comfort value, but rather the extent to which the edge/node is greater or less than the minimum level of comfort for the amount of time that the passenger spends there. Doing it in this way could mean that the comfort is actually stored in values of time (e.g., edge A has a comfort of +10 minutes, meaning that the journey is 10 minutes faster than the maximum comfortable journey time for the comfort of the vehicle, or node B has a comfort of - 60 minutes, meaning that the stop has a waiting time of 60 minutes longer than the maximum comfortable waiting time at that stop), which might make it possible to combine these with the speed data without any division at all, although I am not sure whether this would then give too much prominence to comfort in the data.

Conclusion
Some of these projects are more long-term than others, and I am not sure whether all of them are feasible, especially increasing multi-threading and the comfort based routing, the main problem with the latter of which is likely to be the increase in computational load. The comfort based routing proposal has been included in here in part so that discussion of the other proposed features can progress bearing in mind the desire for this feature in the future.

I should in any event be grateful for feedback on the above plans so that consideration can be given to how best to implement and prioritise the above tasks; the earlier on the list are potentially of a fairly high priority given the current difficulties on the Bridgewater-Brunel server.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

prissi

about hashtables:

If the time is really o(n), you can have an array of sorted arrays.

The first array carries the index of the first stop, and the second array is sorted by the index of the the second stop. Then retrieve the right sorted array and sort for second stop with a binary search. The means 1+10 access even if the target stations has 1000 connections. O(1+log(n))

But I think o(n) just means the bagging factor (79?) is way too small. Since the hashtables are using a prime for bagging, maybe just using a larger prime to shorten the respective lists would help. Or combine bagging with sorted arrays as indicated above.

jamespetts

Prissi - thank you for that useful advice.

Looking at this more closely, I do not believe that the pathing data are actually stored in a hashtable. Running a profiling analysis on the current Bridgewater-Brunel game, I get the following path explorer related results:

karte_t::generate_passengers_or_mail - 81.14% total CPU; 3.34% self CPU
-> haltestelle_t::find_route - 49.78% total CPU; 5.02% self CPU
--> path_explorer_t::get_catg_path_between - 18.84% total CPU; 3.75% self CPU
---> path_explorer_t::compartment_t::get_path_between - 13.18% total CPU; 4.88% self CPU
----> quickstone_tpl<haltestelle_t>::is_bound - 7.79% total CPU; 7.78% self CPU
----> quickstone_tpl<haltestelle_t>::get_id - 0.49% total CPU; 0.49% self CPU
---> quickstone_t<haltestelle_t>::quickstone_tpl<haltestelle_t> - 1.90% total CPU; 1.90% self CPU
--> shortest_distance - 6.53% total CPU; 3.67% self CPU
..
-> haltestelle_t::find_destination - 12.30% total CPU; 0.54% self CPU
--> pick_any_weighted<gebeaude_t* __ptr64,weighted_vector_tpl> - 9.48% total CPU, 0.06% self CPU
---> weighted_vector_tpl<gebeaude_t* __ptr64>::at_weight - 8.82% total CPU; 1.82% self CPU
----> weighted_vector_tpl<gebeaude_t* __ptr64>::weight_at - 7.00% total CPU; 7.00% self CPU
--> obj_t::get_pos - 1.58% total CPU; 1.58% self CPU
...

The greatest CPU consumption appears to come from, in order:

(1) quickstone_tpl<haltestelle_t>::is_bound
(2) weighted_vector_tpl<gebeaude_t* __ptr64>::weight_at
(3) haltestelle_t::find_route
(4) path_explorer_t::compartment_t::get_path_between
(5) path_explorer_t::compartment_t::get_path_between

The get_catg_path_between code is as follows:


static bool get_catg_path_between(const uint8 category, const halthandle_t origin_halt, const halthandle_t target_halt,
  uint32 &aggregate_time, halthandle_t &next_transfer, uint8 g_class = 0)
{
return goods_compartment[category][g_class].get_path_between(origin_halt, target_halt, aggregate_time, next_transfer);
}


The get_path_between code is as follows:


bool path_explorer_t::compartment_t::get_path_between(const halthandle_t origin_halt, const halthandle_t target_halt,
  uint32 &aggregate_time, halthandle_t &next_transfer)
{
uint16 origin_index, target_index;

// check if origin and target halts are both present in matrix; if yes, check the validity of the next transfer
if ( paths_available && origin_halt.is_bound() && target_halt.is_bound()
&& ( origin_index = finished_halt_index_map[ origin_halt.get_id() ] ) != 65535
&& ( target_index = finished_halt_index_map[ target_halt.get_id() ] ) != 65535
&& finished_matrix[origin_index][target_index].next_transfer.is_bound() )
{
aggregate_time = finished_matrix[origin_index][target_index].aggregate_time;
next_transfer = finished_matrix[origin_index][target_index].next_transfer;
return true;
}

// requested path not found
aggregate_time = UINT32_MAX_VALUE;
next_transfer = halthandle_t();
return false;
}


The goods compartment is actually a basic array without even any sort of collection class wrapper:


public:
static compartment_t **goods_compartment;


Likewise finished_matrix:


path_element_t **finished_matrix;


It is notable that the bounds checking takes so much time. Both the origin and desintation halts are bounds checked in get_catg_path_between, yet the only two places in the code that call get_catg_path_between always call it with "self" as the origin, so the bounds check on the origin is superfluous.

One of the two places that calls it (and the place that calls it by far the most often) has the following code:


if (self == *destination_halt)
{
continue;
}

path_explorer_t::get_catg_path_between(ware_catg, self, *destination_halt, test_time, test_transfer, ware.g_class);

if(!destination_halt->is_bound())
{
// This halt has been deleted recently.  Don't go there.
continue;
}


This appears to bounds check the destination halt after invoking the path retrieval, when this bounds check is already done in the path explorer. The path retrieval is otiose if the bounds check fails here, so this is wasteful. Altering the code thus will remove this redundancy:


if (self == *destination_halt)
{
continue;
}

if(!destination_halt->is_bound())
{
// This halt has been deleted recently.  Don't go there.
continue;
}

path_explorer_t::get_catg_path_between(ware_catg, self, *destination_halt, test_time, test_transfer, ware.g_class);


The only other place that the path retrieval is called is in haltestelle_t::fetch_goods:


// Check to see whether this is the convoy departing from this stop that will arrive at the next transfer or ultimate destination the soonest.
convoihandle_t fast_convoy;
sint64 best_arrival_time;
if (bound_for_next_transfer)
{
best_arrival_time = calc_earliest_arrival_time_at(next_transfer, fast_convoy, catg_index, next_to_load->g_class);
}
else
{
// This should be called only relatively rarely, when a convoy bound for this packet's ultimate destination arrives,
// but this convoy is routed via an intermediate halt. Check whether it is sensible to stick to the planned route
// here.
uint32 test_time = 0;
halthandle_t test_transfer;
path_explorer_t::get_catg_path_between(catg_index, self, destination, test_time, test_transfer, next_to_load->g_class);
const sint64 test_time_in_ticks = welt->get_seconds_to_ticks(test_time * 6);
best_arrival_time = test_time_in_ticks + welt->get_ticks();
}


This is much more rarely called than the above instance, and is only called contingently. Given that no memory associated with "destination" is actually accessed here, nor in get_path_between (just the ID), it is probably safe to omit the bounds check on the destination halt, too. Thus, we can amend get_path_between as follows:


bool path_explorer_t::compartment_t::get_path_between(const halthandle_t origin_halt, const halthandle_t target_halt,
  uint32 &aggregate_time, halthandle_t &next_transfer)
{
uint16 origin_index, target_index;

// check if origin and target halts are both present in matrix; if yes, check the validity of the next transfer
if ( paths_available /*&& origin_halt.is_bound() && target_halt.is_bound()*/
&& ( origin_index = finished_halt_index_map[ origin_halt.get_id() ] ) != 65535
&& ( target_index = finished_halt_index_map[ target_halt.get_id() ] ) != 65535
&& finished_matrix[origin_index][target_index].next_transfer.is_bound() )
{
aggregate_time = finished_matrix[origin_index][target_index].aggregate_time;
next_transfer = finished_matrix[origin_index][target_index].next_transfer;
return true;
}

// requested path not found
aggregate_time = UINT32_MAX_VALUE;
next_transfer = halthandle_t();
return false;
}


Further, as get_id() seems to take so much processing time, I have added an explicit "inline" to this, as the code is nothing other than:


/**
* @return the index into the tombstone table. May be used as
* an ID for the referenced object.
*/
uint16 get_id() const { return entry; }


Whether this will make a difference remains to be seen, but it certainly cannot hurt.

One of the other parts of the code that is very computationally intensive is weighted_vector_tpl<gebeaude_t* __ptr64>::weight_at. This is fairly low level code that I did not write and with which I am familiar. It is as follows:


/** returns the weight at a position */
uint32 weight_at(uint32 pos) const
{
return (pos < count) ? nodes[pos].weight : total_weight + 1;
}


It looks rather innocuous, so it is curious that it should be capable of consuming, by itself, 7% of all CPU time in the Bridgewater-Brunel current map. If anyone has any ideas as to how to optimise the weighted vector (which I notice is not something that exists in any of the standard C++ libraries, at least so far as I can find), that would be most welcome.

In any event, I am about to commit the optimisations. I should be grateful for any views on these matters or on any of the other matters discussed in the original post.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

TurfIt

Quote from: jamespetts on March 28, 2019, 12:48:11 AM
Looking at this more closely, I do not believe that the pathing data are actually stored in a hashtable.  Running a profiling analysis on the current Bridgewater-Brunel game, I get the following path explorer related results:
Nope, but the creation of the pathing data does use hashtables...  might want to profile that instead since the complaint in the original post was the initial path calc after loading takes too long.

You need to run dump_stats() on each hashtable to see the distribution of items. If the items are all in a few buckets, you need a better hash function. If the buckets all have more than a few items each, you need more buckets. Current simutrans hashtable is fixed at 101 buckets, would need a more flexible implementation so each hashtable can be properly customized to the data it contains.

Ideally each item hashes into it's own bucket; That gives you the O(1). If all items end up in the same bucket, you're linearly searching a linked list. performance --.



Quote from: jamespetts on March 28, 2019, 12:48:11 AM
One of the other parts of the code that is very computationally intensive is weighted_vector_tpl<gebeaude_t* __ptr64>::weight_at. This is fairly low level code that I did not write and with which I am familiar. It is as follows:

It looks rather innocuous, so it is curious that it should be capable of consuming, by itself, 7% of all CPU time in the Bridgewater-Brunel current map. If anyone has any ideas as to how to optimise the weighted vector (which I notice is not something that exists in any of the standard C++ libraries, at least so far as I can find), that would be most welcome.
The only optimization here is upstream - not calling pick_any_weighted so often. Perhaps a weighted vector is not the correct data structure for the data and it's use. Or perhaps it is, it must be called so often, and hence the game has simply reached its maximum size/complexity for current computers.

prissi

#4
Turfit said most of it.

is_bound() is called everywhere a zillion times, not only in the path explorer, and often in conditions. But any not braindead compiler will inline it, since it is just "halt.data[entry] != 0", i.e. one indirection and one test.

And the weighted vector is strongly optimised. One can only improve here by simplify citygrowth.

ACarlotti

I made a note to myself a while back that suggests that I was aware of inefficiencies relating to weighted vectors. Unfortunately I can't work out right now exactly what I was looking at, but I think it might have related to the running time of element update or deletion being O(n).

prissi

That again would be only at constructing time. Also in practice (assuming a vector of pointers) the inserting could be even faster than for lists, despite copying more data (but linear and in one go).

If you use these arrays of pointer to weighted vectors a lot, I would suggest some freelist like custom allocator for those, since allocation can take quite some time if there are really many to handle (and all with identical, predetermined sizes). It even saves some memory.

If this is an array of weighted vectors, I would strongly suggest to change it in an array of pointers to weighted vectors. Much more cache.friendly and faster to insert/delete.

jamespetts

Thank you both for your replies, that is helpful. I will reply under subheadings for clarity.

Optimising path retrieval

is_bound() is commonly called indeed, but it was definitely taking 7% of all CPU time in the Bridgewater-Brunel save just for when it was called with the path explorer's route retrieval. I suspect that this was not so much computational load as memory bandwidth, as the relevant data would have to be fetched from the heap a very great number of times. Also, I do not know how useful that it is profiling with a non-optimised debug build: it may be that this is not inlining where an optimised build would (but even if manual inlining helps only non-optimised debug builds, this is useful in that it will make debugging slightly more responsive).

Optimising the initial calculation

I have just re-run the profiling (on a smaller map than the Bridgewater-Brunel server to get a sane loading time, but a substantial map all the same). The initial, single-threaded refresh yields the following profiling results:

-> path_explorer_t::compartment_step - 81.75% total CPU; 37.40% self CPU
--> vector_tpl<unsigned_short>::operator[] - 25.75% total CPU; 25.71% self CPU
--> haltestelle_t::get_average_waiting_time - 10.15% total CPU; 0.10% self CPU
---> haltestelle_t::get_service_frequency - 9.50% total CPU; 0.04% self CPU
----> haltestelle_t::calc_service_frequency - 9.14% total CPU; 0.35% self CPU
-----> haltestelle_t::get_halt - 4.99% total CPU; 0.78% self CPU
-----> hashtable_tpl<id_pair, average_tpl<unsigned int,koordhash_tpl<id_pair> >::get 2.32% total CPU; 0.77% self CPU
...
---> hashtable_tpl<haltestelle_t::service_frequency_specifier,unsigned int,koordhash_tpl <haltestelle_t::service_freqency_specifier> >::set - 0.22% total CPU; 0.08% self CPU
---> hashtable_tpl<unsigned int,haltestelle_t::waiting_time_set,inthash_tpl<unsigned int> >::is_contained - total CPU 0.12%; self CPU 0.04%
...
--> path_explorer_t::compartment_t::connection_t::register_connection - 2.86% total CPU; 0.34% self CPU
---> hashtable_tpl<unsigned short,path_explorer_t::compartment_t::connection_t::connection_cluster_t* __ptr64,inthash_tpl<unsigned short> >::get 2.26% total CPU; 0.80% self CPU
... (going further down leads to more hashtable calls)
...
--> vector_tpl<unsigned short>::get_count - 2.72% total CPU; 2.71% self CPU
...

This again seems to me to bear the hallmarks of being memory bandwidth limited, especially vector_tpl::operator[] taking ~25% of all CPU time here. The hashtable operations, by contrast, take <3 % CPU time individually, which does not seem to suggest any serious problem in the hashtable algorithm or implementation.

This does not seem to be readily optimisable unless there are serious inefficiencies with the vector or hashtable. It seems to me that saving the data is likely to be a superior option to recalculating them on loading given these performance issues, although this will need testing.

Alternatives to optimising the path retrieval - map/population size and alternative destinations

Returning to the topic of the path explorer's performance when running, it may well be that this cannot be optimised much more. Thought will have to be given perhaps to how to reduce how much that it is called. In the Bridgewater-Brunel server game, there are 760 towns and a population of circa 6 million (at the current date of circa 1947). The combination of the large population and the high number of alternative destinations would be the reason for the path retrieval system being called with a high frequency. There are thus two options for reducing this: either play with smaller maps (or at least, less dense maps - the size itself does not much matter) or reduce the number of alternative destinations. The latter, if it can be done without unbalancing the game, is a better option than having to reduce the number of towns or the population size, as it requires no reduction in the intricacy and scope of the game.

The reason for having a large number of alternative destinations is to simulate the fact that passengers tend to travel to destinations that they can reach within a reasonable time in preference to those that they cannot reach within a reasonable time. Most people do not pick a shop anywhere in the country at random to do their grocery shopping and just not go shopping if they cannot get there and back in an hour. The larger the map, the more alternative destinations are needed, as the greater the chance of any given destination not being reachable within a reasonable time (the journey time tolerance of the passengers). The number of alternative destinations automatically scales with the population size because of this. In the current Bridgewater-Brunel server game, as of the game date of 1945, commuting passengers each had a minimum of 9,490 and a maximum of 9,491 alternative destinations, and visiting passengers between 4,523 and 12,063. There are considerably more visiting than commuting passengers generated.

I have for some time wondered whether some more sophisticated system relating to alternative destinations may be desirable. In particular, there are some destinations to which passengers will more readily select an alternative than others. A supermarket, for example, may be chosen because it is local; but a well known museum may be chosen because of its national reputation.

The system that I have long contemplated to address this whilst not adding much additional complexity or computational load is one in which each building is assigned a uniqueness value by which the number of alternative destinations is multiplied. For example, suppose that a passenger with 100 possible destinations alights first on a building with a 50% value: the number of destinations would be reduced to 50. If the next building also had a 50% value, the number of destinations would further be reduced to 25, and so forth. Whether it is better to do this for total destinations or remaining destinations is not immediately clear to me, although I tend to the latter to give buildings selected earlier a higher prominence. The effect of this would be to make buildings with a high uniqueness rating (or a low alternative destination factor, however this ends up being expressed for computational efficiency to minimise division operations in frequently called code) buildings to which passengers have fewer alternatives and to which passengers are therefore more likely to refuse to accept an alternative destination. This would not only potentially reduce the number of alternative destinations overall, but would do so in a way that gives certain sorts of buildings a greater network significance than other buildings and which will tend to attract passengers from farther away once connected well.

One might calibrate this so that the base number of alternative destinations is halved, and certain buildings (such as basic shops) would re-double it to the present levels, and certain high uniqueness buildings would reduce it even further. Quite how this would work on a large map remains to be seen. I should be grateful for any input (or mathematical modelling, even approximate or heuristic) on this proposed algorithm.

Further thoughts on comfort based routing

As noted, thought will have to be given to whether computing an additional five sets of passenger routes can be done within a reasonable time. Since it is not clear that the path explorer can be optimised further, this may be open to some doubt. Investigations will have to be carried out as to whether there is enough slack time in the existing concurrent but not parallel path explorer multi-threading to allow the path explorer to do more work each step (the amount of work per step is currently set manually in the code). Some thought may need to be given to making the amount of work per step for the path explorer a user configurable value.

Assuming that this can be done within the limits of computational performance and a reasonable refresh time, I have been considering in a little more detail how this might work. Even if it is a while before this ends up being implemented, I think it prudent to record these thoughts here for future reference.

As a reminder, data about the comfort of a journey is to be stored in minutes: that is, the number of minutes longer or shorter than the maximum comfortable journey in the mode of transport in question that the journey is.

Firstly, probably the best way of storing the comfort data in the connexions is to store a single (signed) value of comfort_adjustment_minutes alongside the base journey time for each connexion. The same can apply to each node (halt). The comfort_adjustment_minutes can then simply be added to the base journey time to get the comfort based edge weightings; the base journey time can be used alone to get the time only edge weightings, and the comfort_adjustment_minutes could be used alone to be used with a journey discomfort tolerance feature.

Secondly, all passengers would have, as now, a journey time tolerance, but also a journey discomfort tolerance. The latter would vary per packet of passengers, just as journey time tolerance does now. Where a packet of passengers assigned to route by comfort finds a comfort optimised route to the chosen destination that exceeds that passenger's journey time tolerance, the passenger will check whether the speed optimised route is within the journey time tolerance. However, in either case, the passenger will not travel if the journey falls below the passenger's journey discomfort tolerance (expressed in minutes as explained above).

Thirdly, although additional comfort would allow a negative value for comfort_adjustment_minutes, for each leg of the journey (edge), this value would never be more than minus half the total time for the leg of the journey. Conversely, there would be no cap on discomfort. (I am a little unsure of this. In the late 19th and early 20th century, it was popular among daytrippers to take the boat from London to Southend even though this took about 3-4 times longer than the train, as the boat was seen as part of the day out; this might not be able to be simulated by capping the comfort benefit; I am not sure whether anomalous results would occur if the comfort benefit were not capped, or whether anomalous results would be avoided by enforcing the journey time tolerance in real minutes as discussed above).

Fourthly, walking would have a comfort value assigned to it. This would be a fairly low comfort value, and would replace the current restrictions on walking time. Likewise, each interchange should have a comfort penalty to simulate the inconvenience of having to leave one conveyance and board another. This would help to simulate the reluctance of some passengers to change a great many times.

Fifthly, halts (nodes) would have a comfort value. This would affect the adjusted waiting time for comfort optimised routing in just the same way as the comfort of a vehicle would affect the adjusted travelling time in comfort optimised routing. Overcrowding would diminish this comfort in the same way as for vehicles. This, combined with increasing the transfer time of an overcrowded stop, would allow stops to become much more overcrowded than at present, perhaps allowing 3-4 times their true capacity before actually turning passengers away. This should perhaps be adjustable in the stops' .dat files just as overcrowding values are for vehicles.

Sixthly, the chance of passengers preferring comfort optimised routing over speed optimised routing should vary between visiting and commuting passengers. This should be able to be set in the pakset, but, generally, visiting passengers should have a higher likelihood of optimising for comfort than commuting passengers.

Seventhly, it may well be possible when and if this mechanism should be introduced to do away entirely with revenue adjustments based on comfort, in the same way as speed based routing has allowed revenue adjustments based on speed to be abolished entirely.

I suspect that this system has the potential to work quite well without much additional complexity (and to be more intuitive than the way in which comfort currently works), as well as having the side effect of dissipating passengers and thus preventing passengers from all taking the same route which can in some cases produce anomalous results.

I should be interested in feedback on this proposed algorithm.

A question for A. Carlotti

I understand from another post that you were interested in doing some work on the path explorer sometime soon. May I ask what sort of thing that you were thinking of doing so that I can co-ordinate what I plan to do with what you plan to do so as not to overlap or clash?
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

Vladki

#8
I like the idea of comfort value for stops.

Uniqueness of a given destination could be calculated using the number of real occurrence of the building in the game.

Some thoughts about pathfinder :

How about splitting the destinations more. First split visitors to shopping (com, ind) and leisure (res, cur). Then trips that are done every day, commuting and shopping could have lower time tolerance, then leisure trips which usually happen on weekends and holidays. (i. e. less frequently)

Would it help if the pathfinder could be limited to a smaller part of the map for daily trips?

I know that some people commute weekly on much longer distance, but that could be counted as visit as they usually have some temporary accommodation closer to the workplace for daily commuting.

And visiting town hall should count as shopping ?

jamespetts

The actual number of a particular type of building on a map does not helpfully tell us how far that people will travel to visit the building: people will not be determined to travel to the first supermarket to open in the 1960s, for example, from the other side of the map just because it is currently the only supermarket available. The uniqueness rating should measure an intrinsic property of the buildings rather than merely frequency which does not by itself show very much about how satisfied that passengers would be with an alternative to it.

Splitting the destinations further may be possible, but it would add significant work and overhead and should only be done it if is very clearly preferable to the current system and does not cause any anomalies. For example, splitting trips into "shopping" and "leisure" does not account for business travel (including onward trips from commuting trips), nor does it deal effectively with the fact that leisure travel was not a thing until relatively recently (unless one would then implement another algorithm to deal specifically with this, which would then be another whole thing to balance, viz. the proportion of leisure trips to other trips changing over time, which presumably would have to be different by class, and which would be difficult to calibrate effectively in conjunction with the per class visitor demand of buildings). There is then the difficulty of having to assign every single building in each pakset per class visitor demand values for each separate type of visiting trip. Unless a very elegant and easily balancable (i.e. highly realistic) algorithm for further separation can be thought of, further separation is likely to be more trouble than it is worth. In any event, it is not clear how this would actually make a significant difference to the number of alternative destinations needed.

As to limiting the pathfinder to an area of the map, this is not feasible. The buildings are all stored in one list. It would actually take more computational effort to filter that list. It would also give rise to serious anomalies in the game. Indeed, this is how the system used to work and it worked very badly, which is why the current system was introduced to replace it. The principle behind the current system is that journeys should always be measured in time, not distance, as this is what people do in real life.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

Vladki

Quote from: jamespetts on March 29, 2019, 11:52:17 AMpeople will not be determined to travel to the first supermarket to open in the 1960s, for example, from the other side of the map just because it is currently the only supermarket available.
Sooo wrong.... I was 12 when the iron curtain fell down, and shortly after that people really travelled from czechoslovakia to austria just to do shopping, or just to have a look at shops where you could buy virtually everything (if you had enough money, which we did not).   In real world, there is just one Stonehenge, and maybe a few similar things, just not so famous. But in simutrans world, several stonehenges can spawn. Simupeople should be just fine visiting any of them, if many have spawned, or travel around the world to see the only one existing.

The split I had in mind was more technically like this:
- commuting - (everything except res), lower time tolerance, higher frequency (daily travel), count as employees/jobs on destination (same as now)
- shopping - destination (com, ind, tow, factory; maybe also hq, signalbox, depot), lower time tolerance, higher frequency (daily travel), count as visitors
- leisure - destination (res, cur, mon), higher time tolerance, lower frequency (weekly or less), count as visitors
- i think there is not much space for very long distance commuting

Business trips, would fall under commuting if short, or leisure if longer. With long (more days) trip, you usually do not go directly to the company, but stay in hotel which could be counted as residential building, or visits to the big factories (with contracts) and hq, could be added to "leisure trips" to simulate business trips. Anyway a travelling businessman will count as visitor, not employee
Leisure travel was not common in early eras because not so many people could afford it, either due to lack of money or time. That is already simulated (time tolerance, classes). But e.g. pilgrimage could be considered as leisure travel for the needs of simulation.

But anyway, if this would not help to reduce the scope of pathfinding, then it would not be needed.

jamespetts

I have started work on this in the path-explorer-modifications branch on Github. I have so far implemented two changes:

(1) it is possible to customise how much work that the path explorer does every step in simuconf.tab and the advanced settings dialogue; and
(2) connexions now store class data.

I have carried out brief testing of both of these features but have not had time to test them in depth. Also, I have not written a full UI for the latter feature yet showing what classes are attached to what connexions.

If anyone who is handy with compiling code would like to help to test this, that would be very helpful. Likewise, if anyone would like to write the UI part of the second feature (showing which classes are served by each connexion in the halt detail window and possibly some way of showing/filtering these on the map), that would be very helpful.

My next task is to write code to load/save the path explorer data which, if successful, will greatly speed the loading of large saved games and therefore the joining of online games.

I should be grateful for any testing/comments/review of this new code if anyone has time to look over it to make sure that I have not made any obvious mistakes before I integrate this with the production version. 
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

jamespetts

I have tested this somewhat further, and have confirmed that this fully fixes the issue relating to temporary loss of connexions for higher class passengers (most commonly seen with aircraft), so I have removed the interim fix for this in the code.

As to the passenger generation aspects: the phenomenon of people travelling from miles around to see the first supermarket in the area after the fall of eastern European communism is not typical of what happens when the first of any given kind of building is generated.

Splitting visiting passengers into leisure/business and shopping would take a considerable amount of coding work, but it is not clear what functionality that this would actually add in real terms.

What is needed is simply a way for certain buildings to have a higher importance than others, i.e., passengers bound for those buildings to be less likely to be satisfied with any alternative destination than a more generic building. I do not think that any additional functionality is likely to be desirable, as it would be likely to break the already delicate balance between:

(1) generating the correct number of passengers overall, both on a well connected and poorly connected map;
(2) having enough passengers visiting consumer industries;
(3) allowing enough passenger trips to fail when any given location has less than perfect transport to create sufficient signal for the proposed new town growth system to be able to detect reliably which parts of the map are best connected by transport;
(4) not creating anomalous or inconsistent behaviour in any case that has a real possibility of ever arising in a real game;
(5) giving some buildings more importance than others distinct from their visitor demand; and
(6) not overloading the computer at map sizes that we want to be able to play (albeit this will slowly become easier as computers become faster and have more CPU cores).

I have deduced that the system that I had described above is not the most efficient way of achieving this. Instead, a simpler system is likely to suffice, in which each building has a number representing the number of chances in 1,000 that a packet of passengers will accept no alternative destinations once it has selected that building as its destination. By default for any given building, this would be zero, which would be identical to the current behaviour. Setting this to a non-zero value would give passengers a chance of halting their alternative destination search at that building. Setting the value to 1,000 would mean that passengers would always refuse to consider any alternatives once they had selected this building (albeit I would not recommend setting this to this value). The idea would be to set non-zero values only for buildings of the sort that, in reality, not every town would have, buildings such as sports stadia, museums, major visitor attractions (e.g. the "greenhouse" in Pak128.Britain-Ex) and high level commercial buildings (including player headquarters).

This would have the effect that towns connected well to towns containing buildings with a non-zero uniqueness rating would have better passenger success statistics than towns not so connected. This is likely to be very helpful in giving a meaningful signal to the proposed town growth mechanism to show which areas of the map are more well connected than which other areas.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

Ranran

I agree with the idea of Vladki.
Because I think their movement is done with different algorithms.

I think it's weird that the current situation where "no route" are unlikely to occur due to the impact of alternative destinations. They are too low in desire.
By separating the movement purpose, the problem will become clear. Currently, the problem is "alternative destination" will easily eliminate the problem of passenger flow. (There is no route, most are too slow)

For example, if there is no transportation for people who want to go sightseeing from Japan to London, is it "no problem" to give it up and go to near park?
In that case, I think that an error that the network is incomplete should be displayed.
(Of course some people will have an alternative plan to go to Paris or Las Vegas.)
If businessmen go on a business trip to a branch in London if they do not have access, will they go to a nearby branch?
They should complain to lazy transport companies.  ::(

But the movement linked to life should not be. They will desperately look for alternative routes. And it is desirable to be as close as possible.

I think it can reduce the extra alternative destination search.

Of course it would be nice if there was information on what trip was no route.

QuoteWhat is needed is simply a way for certain buildings to have a higher importance than others
I think that segmentation of moving objects will clarify the characteristics of the building.

- Attraction buildings attract long leisure trips
- Consumer industries and COM buildings attract shopping trips
- COM buildings attract short leisure trips
- RES buildings attract visiting trips

I think business trips and visits (eg friends and family) can be together - "visiting".
Tourist facilities that are not connected to the distance will not attract many people. Because the locals are getting bored with it. (´・ω・`)
So please bring a rich man from afar.
Long distance travel may make a short trip at the destination.
These mobility success rates will promote the development of buildings of that type around.

jamespetts

It is important to remember that, by design, passengers do not care about distance per se: only journey time. With that in mind, and without adding too high a degree of specificity that will fail or create unnecessary complexity in dealing with non-paradigm cases (e.g. is a visit to a solicitor's or insurance broker's office to deal with personal matter a shopping trip or a business trip?), a possible algorithm based on your suggestion would be subdividing visiting passengers into two: (1) routine trips; and (2) special trips. Routine trips would have a high number of alternative destinations and a low journey time tolerance, and special trips would have a low number of alternative destinations and a high journey time tolerance.

For this system to make any difference at all to the current system, it would further be necessary to separate all the buildings and have different weightings for routine and special trips for each building. That would be a very large amount of work, as, not only would new data structures have to be created in the code, which would need modifying the code in every place where visiting passengers might interact with a building in any way, but each and every building in every pakset would need to have a whole new set of weightings (one for each class) differentiated for routine and special visiting trips.

While this algorithm might be workable once that huge amount of work had been done, it is not clear what this would actually achieve in terms of emergent properties on larger maps compared with the existing algorithm as modified as I had suggested in the previous post.

Increasing the number of passengers marked as "no route" is not a worthwhile goal in itself: in reality, there is virtually nowhere in the world that people cannot reach by some or other means (eventually), and this has been so for a long time. In centuries past, when there may have been parts of the earth that were simply cut off from other parts of the planet, there were not myriads of people who were staying at home rather than going to some reachable destination because there was no way of getting to (for example) Australia.

It is important to remember that Simutrans-Extended is intended to be able, with the same algorithm, to simulate equally well transport conditions of 300 years ago and transport conditions in modern times. With that in mind, it is necessary, I think, to understand the passenger generation algorithm at a higher level of abstraction. It is intended to simulate, not just that people will go to a local shop rather than a shop on the other side of the country to do their grocery shopping, but that people will tend to support local football teams, will make friends with people whom they can readily contact in the first place, and do business with people who are reachable. A place that is genuinely totally cut off from the rest of the world is not something that many people would have a desire to visit because they would not know of anything there worth visiting - unless it had some unique feature which made it worth visiting in its own right over and above a reachable attraction (presumably discovered by explorers); but this can be simulated by the uniqueness rating system described in my previous post.

Of course, barring barriers created by the sea, anywhere is reachable by walking given a sufficiently long time and the ability to carry provisions or purchase them en route. Thus, for destinations that do not involve a sea voyage, it is entirely correct that there should be no "no route" and all "too slow". Currently, there is a fixed limit on walking time. However, once comfort tolerances are introduced, even this can be abolished, since walking will have a fixed comfort level and thus a long walking trip may exceed some passengers' comfort tolerance. In reality, people did walk a very long way indeed in times past when there was no alternative form of transport.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

Vladki

The main reason why I started to think about more categories than commuters/visitors was, that I thought it would reduce the complexity of path finding if it is limited to smaller part of map (for commuting and shopping). But james said it will not help. So it can be dismissed completely. Just it can be simulated by smart distribution of time tolerance. Commuters should generally have lower time tolerance, while visitors higher. However the peak (most frequent journeys), should be similar in both categories (shopping is much more frequent then leisure trips).

Also for short commuting and shopping, there is generally higher willingness to go to alternative destination, than for long sightseenig trip. But still I would count the real occurence in the game. If there would be a stone circle in central europe, many people would be willing to see it instead of Stonehenge...  Of course there could be an option that sets higher time tolerance for curiosities and lower for commercial/industrial, ... oh hey, that's exactly what I wanted to show with commuter/consumer/leisure. Just set that the mean time tolerance is lower for ind/com destination, and higher for res/cur destinations.

jamespetts

Quote from: Vladki on April 03, 2019, 02:53:44 PM
...Just set that the mean time tolerance is lower for ind/com destination, and higher for res/cur destinations.

This is not easy to do with the present system. The way that it works is that passengers are generated at an origin (residential) building, they are (weightedly) randomised to commuting or visiting, their journey time tolerance and number of alternative destinations are set, and then they iterate through an algorithm which, in very simple terms, (1) finds a destination building at random in the map-wide list of commuting or visiting targets (as applicable); and (2) tries to find a route to that target within the passengers' journey time tolerance until either a route is successfully found or the number of alternative destinations reaches zero. A return and (possibly) onward journeys are then generated.

To treat passengers to different destinations differently would require further subdividing the lists beyond the existing commuting/visiting, and this, in turn, would require adding weighting/class data for each of the subtypes to every single building in every single pakset - a huge amount of work. There would therefore have to be a correspondingly huge benefit to make doing this worthwhile. It is not clear at present what such a benefit would be.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

jamespetts

I have just added code to load/save the pathing data with the saved game, which, on brief testing, appears to be working correctly: even small single player games load more quickly with this new system.

Before I integrate this into the production code on the master branch, would anyone who is able to compile the code care to test this in case I have missed anything? I should be very grateful.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

jamespetts

I have now incorporated the changes from the path-explorer-modifications branch to the master branch. In summary, the changes are:

(1) it is now possible to customise (in simuconf.tab and the advanced settings dialogue) how much work that the path explorer does every step;
(2) connexions data are stored per class, so the issue with very high class routes being temporarily lost is now fully fixed in place of the previous workaround; and
(3) the path explorer data are saved with the saved game; this greatly reduces loading time by eliminating the "calculating paths..." phase, but this greatly increases the saved game size, so this is optional (it can be enabled/disabled in simuconf.tab and the advanced settings dialogue).

To give an indication, the Bridgewater-Brunel saved game is currently circa 97MB - with the path explorer data saving, this increases to well over 400MB.

This will be rolled out in the next nightly build. Path explorer data saving will be enabled, and I will monitor how well that this works on the server and whether the larger saved game size causes problems.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

prissi

In clients, the work in a step must be the same, I presume, and is set by the server?

jamespetts

Quote from: prissi on April 07, 2019, 12:21:24 PM
In clients, the work in a step must be the same, I presume, and is set by the server?

Yes, indeed: it is loaded with the saved game, and the advanced settings dialogue does not function when connected to a server as a client.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

Ves

Once a couple of years ago we talked about that it did not make any sense to have waiting time coupled with the first stop on a travelers route nor the last stop. Perhaps this would be worth considering again when (and if) the path explorer gets reworked?

jamespetts

Quote from: Ves on April 11, 2019, 09:07:36 AM
Once a couple of years ago we talked about that it did not make any sense to have waiting time coupled with the first stop on a travelers route nor the last stop. Perhaps this would be worth considering again when (and if) the path explorer gets reworked?

The problem with this, as discussed at the time, is that there is no reasonable algorithm that will make any sense of this. The issue of low frequency, long distance services and waiting times cannot easily be solved, as there are an unknown and theoretically unlimited number of short distance trips, each with their own waiting time, before and/or after the long distance trip, and there is no algorithm that can, with any reasonable degree of computational effort in large maps, plan in advance when passengers should travel depending on their route. (Perhaps in 10 years' time if and when computational speed, or even simply the number of parallel threads available, increases vastly, it may be possible for each passenger to compute its trip from scratch based on actual current timings rather than generalised timings, but that is a long way off).

In reality in any event, even if passengers can wait at home rather than at a stop, this is still, in a sense, waiting time: passengers are not where they want to be. This is, I think, recognised in transport planning theory, and was discussed in a video about the 'bus service in Dublin that I posted recently in the randomness lounge board. The complexity that we cannot handle without a massive upgrade in processing capacity and a ground up rewrite of the pathing code is that this waiting time, although significant, is in reality weighed less than time actually waiting at a stop.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

ACarlotti

Quote from: jamespetts on March 29, 2019, 01:53:04 AMA question for A. Carlotti

I understand from another post that you were interested in doing some work on the path explorer sometime soon. May I ask what sort of thing that you were thinking of doing so that I can co-ordinate what I plan to do with what you plan to do so as not to overlap or clash?

The basic idea is that at some point I want to study the path explorer thoroughly and work out how to make it more efficient, either by improving critical points or through substantial changes to the algorithm. So far I haven't looked at it much, and I'm unlikely to be able to study it in-depth for a while, so don't wait for me.

I could, however, try to make some specific local improvements (e.g. to the hash tables).

Incidentally, I hadn't realised how bad the loading time was until just now - I don't think I ever actually tried loading the full Bridgewater-Brunel save before.

jamespetts

That is most helpful, thank you. My short-term work on the development of this, bar any bug fixes, is complete for the time being in any event now, although in the long term the partially comfort based routing system described above would add much to the realism of the simulation and the meaningfulness of the choice between vehicles.

It would be helpful at some point to work out what the most efficient way of storing the data for the five extra sets of paths (the speed/comfort balanced paths for each of the five classes of passengers, although note that the number of classes is pakset dependent). If, having examined the path explorer's data structures, any ideas for this should become apparent to you, then this would be most helpful.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

ACarlotti

So I've done some profiling of the path explorer, and had a look at the code. Unsurprisingly, most of the time is spent in the innermost loop of the path explorer. With an October version of the Bridgewater-Brunel server, I find that over 80% of the loading time is spent inside this loop. Using perf on an optimised build (but still with debugging symbols), I find get the following profile (with annotations in [[]] to try to clarify what each bit of assembly is doing).

       │1649                                                            for ( target_member_index = 0; target_member_index < target_halt_list.get_count(); ++target_member_index )
  0.64 │3825d2:   add    $0x1,%ecx [[Increment %ecx=target_member_index]]
  1.31 │3825d5:   cmp    0x14(%r9),%ecx    [[Compare target_halt_list.get_count() to target_member_index]]
  0.02 │3825d9: ↓ jae    382650 <path_explorer_t::compartment_t::step()+0x39fe>  [[Jump if condition failed]]
       │1653                                                                    if ( ( combined_time = working_matrix[origin][via].aggregate_time
  5.31 │3825db:   mov    0x20(%r14),%r12    [[Set %r12 to working_matrix ]]
  8.72 │3825df:   mov    (%r12,%rsi,1),%rax   [[Set %rax to working_matrix[origin] ]] [[%rsi is loop variable]]
       │1656    _ZNK10vector_tplItEixEj():
       │284                             return data[i];
  0.54 │3825e3:   mov    %ecx,%edx        [[Set %edx to target_member_index]]
       │286     _ZN15path_explorer_t13compartment_t4stepEv():
       │1651                                                                    const uint16 target = target_halt_list[target_member_index];
  1.26 │3825e5:   mov    0x8(%r9),%r8     [[Set %r8 to target_halt_list.data]]
       │1654                                                                                                             + working_matrix[via][target].aggregate_time )
  5.47 │3825e9:   movzwl (%r8,%rdx,2),%r8d   [[Set %r8d to target = target_halt_list[target_member_index = %rdx]
  8.73 │3825ee:   lea    0x0(,%r8,8),%rbp    [[Set %rbp to target*8]]
  0.53 │3825f6:   mov    (%r12,%rdi,1),%r12  [[Set %r12 to working_matrix[via] ]]
       │1653                                                                    if ( ( combined_time = working_matrix[origin][via].aggregate_time
  4.36 │3825fa:   mov    (%rax,%rdi,1),%edx  [[Set %edx to working_matrix[origin][via] ]]
17.27 │3825fd:   add    (%r12,%r8,8),%edx   [[Add working_matrix[via][target].aggregate_time to combined_time=%edx]]
       │1655                                                                                            < working_matrix[origin][target].aggregate_time                    )
  6.67 │382601:   add    %rbp,%rax   [[Set %rax = & working_matrix[origin][target].aggregate_time]]
       │1653                                                                    if ( ( combined_time = working_matrix[origin][via].aggregate_time
  1.16 │382604:   cmp    %edx,(%rax)
33.27 │382606: ↑ jbe    3825d2 <path_explorer_t::compartment_t::step()+0x3980>   [[Condition is false - go to next loop iteration]]


It looks like one of the more computationally expensive bits of the code is the jump back to the beginning of the loop - this is probably due to branch misprediction interrupting the instruction pipeline. I suspect that most of that 33% (of total load time) could be saved by a combination of unrolling the loop, reordering the code and employing Duff's device. I reckon that we could achieve savings of around 25% of total load time by such optimisations.

I also took some counts of how long was spent at each level of the loop, as well as a count of how many iterations of the inner loop are saved by the check for origin and destination groups that would use the save line/convoy at a via point.

catg: 0
g_class: 4
via_index: 4328
origin_cluster_index: 8305
target_cluster_index: 20647
not skipped: 14956
origin_member_index: 13084700
target_member_index: 10934293135
Updates: 84474290
Total skipped: 40940390830
working_halt_count: 5012


catg: 1
g_class: 1
via_index: 428
origin_cluster_index: 1205
target_cluster_index: 4596
not skipped: 3404
origin_member_index: 677711
target_member_index: 73413270
Updates: 12201296
Total skipped: 477734938
working_halt_count: 2212

In doing so, I discovered that the path explorer is being run (on startup, at least) 5 times for every goods category. This is clearly inefficient, but since the inner loop when searching for mail routing has less than 1% of the iterations used by the inner loop for passenger routing, this does not make a significant difference.

My purpose in producing the above statistics was to use it to consider whether alternative path explorer algorithms might be more efficient (specifically, how would an optimsed variant of Dijkstra compare to the current variant of Floyd-Warshall). I'll give that more consideration at some later point.