Modelling private car traffic without breaking the CPU bank

jamespetts · April 04, 2012, 11:47:26 PM

Thank you all for your prolific input - this is most welcome! I don't have time to reply fully, but one or two additional points of consideration/questions for Merry. Firstly, what of the situation where the "ports" as you describe them change through players building extra roads, or deleting roads, or doing some combination of both near city boundaries? Exactly what would trigger a recalculation of ports, and exactly how (in abstract algorithmic terms) would a city check for ports?

Secondly, how would a system like this evolve to be able to have the in-game representations of citycars actually following a route from their origin to their destination?

Thirdly (this, I think, has been touched on by others), how would the principle of using Manhattan distances in cities accommodate the possible differing levels of congestion in cities (and on different roads in cities) and the different speed limits of different roads in cities?

sdog · April 05, 2012, 12:19:08 AM

QuoteSecondly, how would a system like this evolve to be able to have the in-game representations of citycars actually following a route from their origin to their destination?

the best way? by not having vehicle follow it, but generate random city cars based on the trafic density at that spot.
Normalize the city population to the largest city. For every connection you get add the sum of the normalized city populations of both ends of the journey. That way you get the distribution of vehicles on the different ways. (This also provides you with a congestion index, if you want so) From the distribution of trafic, you can get traffic densities you can base the number of private cars represented on. This is however, i think, still way too much for something very cosmetic. Actually routing private cars would be a complete waste of resources.

In general i think, a much much simplified model for this whole problem has to be found. Most approaches are quite overkill. Most likely for any "well behaved" scenario a global statistical approach would provide the same result for reduced ridership.*

What you can do there, is to just test this. Use the routines very heavy on computing on a model system and compare the car rider preference to the public transport use.

(You remember the discussion on complex systems we had a while ago on google or facebook? This is one of those examples, where one would try to get the simple mechanism describing the overall effect of a system which is microscopically highly complex.)

*This gets even more problematic, as we don't have any idea if the basic assumption, that a part of the car owners prefer to take the train when it is faster than their car journey is even viable. (i have some doubts there) But we had this discussion already a year or two ago, i think.

A different approach, in line with what i said before. The idea is to get a global factor for ridership reduction based on car ownership or for a more refined approach a time the public transport has to compete with.

Pick randomly, weighted with relative city size, two cities.
Calculate the distance.
Put it into distance classes, from your pax destination generation.
Run routefinder and get the time it would take.
provide the avg speed for the class.
do this a few dozen times and average the speed for each distance class.

Compare avg speed of a line with the car owner average speed. If that is too small, recalculate.

This might be even more realistic than the more refined method. If i should consider to go from Munich to Hamburg by car, i'd divide 1000 km by 130 km/h and get 7.7 hours. The former is the distance i assume and the latter my expected value for the speed.
I am not a fully informed market participant. If i was more careful i'd go to google maps, this tells me it is actually 780 km and takes 7.3 hours.

(fyi the train takes 5.6 hours, does anyone have ridership numbers?)

jamespetts · April 05, 2012, 12:32:22 AM

Sdog,

again, quickly for the moment, a normalised model will not work, as we need absolute numbers to generate realistic traffic congestion levels: the number of private cars for any given level of usage should have a realistic and determinate relationship to the number of 'buses required. Further, following routes is important because only that model will generate the right amount of traffic congestion (measured in terms of stationary vehicles) in the right places: if vehicles do not go to the right places, they will not block up the right roads. The public player's task in designing a good road network must be meaningful: the idea is for this to be as much a part of the game and transport design as the private players' 'bus or train networks. This is why I don't agree with the idea of radical simplification of this: certainly, we don't need a model that is any more simple than Sim City 4, for example.

As to the *ed point, there is no general assumption that private car owners take the train when it is faster: the decision as to whether to use a private car or public transport is a multi-faceted decision, of which only one element is the relative journey times of both modes. This is realistic.

sdog · April 05, 2012, 12:57:45 AM

Quoteagain, quickly for the moment, a normalised model will not work, as we need absolute numbers to generate realistic traffic congestion levels: the number of private cars for any given level of usage should have a realistic and determinate relationship to the number of 'buses required. [...]

James, that is exactly what the system i described provides. It does not matter for this if the car went all the way or was generated around the corner. Only important for this is that there are eg. 20 cars per minute on the road between Shrewsbury and Telford.

Same holds for the congestion caused. If you spawn vehicles for each section of road with the same rate as the utilization would be if they would go straigth, you get exactly the same number of vehicles on the road as if they would drive the whole way. It would matter if there was only one car, as soon as we have plenty of cars, they become indistinguished objects.

QuoteAs to the *ed point, there is no general assumption that private car owners take the train when it is faster: the decision as to whether to use a private car or public transport is a multi-faceted decision, of which only one element is the relative journey times of both modes. This is realistic.

I know you are not doing the comparison right away. But we are doing this whole thing in a rather arbitrary way, this is absolutely ok ... But having a very simple and rough theoretical model on one side, and having massive effort to calculate something for it in a second step is usually not a very good practice in simulation.

ps.: i've added quite a bit to my previous post.

ӔO · April 05, 2012, 01:27:33 AM

Sorry if I missed it, but wouldn't it be simpler to calculate average speed over roads if there is a table of values that the calculator can refer to?

There is only so much a road that is X wide with Y speed will be able to throughput before its top speed comes to a crawl due to congestion. Faster roads will have more capacity and slower roads will have less, so it shouldn't be too hard to come up with a chart that the route calculator can use to tally up total times between two points, instead of calculating from scratch.

With my GPS, it gives three options. Shortest, fastest and economical(?). Obviously, we're only concerned with the first two. I should say, most people don't take the fastest route and instead just take the most direct that uses major roads.

I live on such a street and there may be something like 1 person who takes a bypass through residential street for every 15~20 that will just stay there, stuck in traffic. However, on some major highways with major roads running parallel, or another major highway running parallel, it's around 50:50 with a preference to highway.

ӔO · April 12, 2012, 05:01:53 AM

what I'm finding, is that when the game is freshly loaded, it is quite responsive. It only eats around 60~80% of 1.86Ghz and only around 600MB of memory.

It then slowly becomes sluggish and unresponsive, as it starts to eat up more memory and more CPU power. At the end of the first month, it's not too bad, but at the end of the second month, it's quite bad. It seems to max out at around 660MB

Carl · April 12, 2012, 06:44:02 AM

That kind of change in memory usage is probably to be expected as point-to-point journey time hashtables (etc) are populated. But it shouldn't make for a difference between responsive and unresponsive, which is strange...

jamespetts · May 04, 2012, 12:51:26 AM

Having left this topic for a while, a few thoughts on the way forward.

Firstly, the problem is sufficiently complicated that it cannot be dealt with by the next major release. For the time being, server games or other large maps will have to have assume_everywhere_connected_by_road=1.

Secondly, any solution to this issue needs to take into account the eventual plan to have city cars actually driving along defined routes, and congestion being calculated by the proportion of a month that vehicles are stationary on any given road tile. That would enable per-tile congestion measuring, which would enable a more sophisticated means of measuring congestion in cities. It would also mean that the actual time that it takes to drive a particular route could be measured rather than guessed at, and that that measurement would take full account of congestion. The idea would be that all journeys originating in City A starting from a radius of X tiles around point Y and ending in city B use one and the same route (to a specific point in city B) for journey time per tile calculation and citycar routing purposes, but calculate their actual journey time using the existing system of journey time per straight line tile from the actual exact origin point to the actual exact end point.

Thirdly, this leads to a consideration of how to achieve this. There have been a large number of suggestions. I am currently leaning towards using Dijkstra's algorithm to find all routes from a given point in each city to a randomly selected point in each reachable destination city simultaneously, combined possibly with concurrent multithreading. This would take more computational effort than finding only a single route between cities (from a single point in each city to a single point in each city, rather than from multiple points in each city to a single point in each city), but that increase will be linear and, if the radius of the origin points is wide enough, then the increase will be relatively modest (perhaps on average 4 or 5 times the number of routes that need to be calculated for having a single route for each city pair).

However, for that, a number of things need to be considered, preferably before implementation. Firstly, just how computationally intensive will 5 or 6 one point to many Dijkstra's algorithm searches be compared to the existing city to city A* set of searches? I know that the use of a one to many Dijkstra's algorithm search will scale in its computational intensity linearly rather than exponentially with the number of cities, but does anyone have any idea whether even this will be manageable (especially with multiple origin points per city, and therefore multiple searches per city) for, say, 675 cities?

Further, there is the issue of network synchronisation. For concurrent multi-threading to work in network mode, as indicated above, the only workable solution is to have the server do all the work, and push the results to the clients in the same way that player interactions are pushed from server to client. However, this gives rise to the issue of whether this will be too bandwidth intensive. For the full private car routing, using multiple start points in a city and having private cars driving along defined routes to their destinations, the whole route would have to be transmitted from the server to the clients. This would be 32 bytes (2 x 16 for a koord object) per route tile, plus some overhead (including an 8 byte control character for the command, 32 bytes for each start and end tile, and, say, 192 bytes for general information about the number of vehicles using the route, when it was created, and, the journey time per tile and so forth, plus probably another 16 bytes worth of general overhead for each command. For a 256 tile route, this would be something in the order of 8,456 bytes per route. If each city, on average, generated 6 routes to each other city, a refresh for each city would amount to 50,736 bytes per destination city, or 34,246,800 bytes (32MiB) per full city refresh. Multiplied for all 675 cities, this would amount to 22,045MiB or 21.5GiB, which seems to be an awful lot, even spread over time. Do these calculations appear correct on the face of them? If so, is there any way, does anyone think, to mitigate this large amount of data? Or alternatively, will a single threaded Dijkstra's search suffice for performance? That will, I suppose, depend on just how much of a saving a one to many search is over many one to one searches.

Incidentally, running the calculations again for a Dijekstra's algorithm version of the current model, that is, without an actual route being transmitted (resulting in routes with an overhead of only about 264 bytes), the numbers are as follows: 1,584 bytes per destination city, 1,069,200 (approx 1MiB) bytes per refresh of a single city, or 721,710,000 bytes (approx. 688MiB) for a refresh of all cities. Even with the simpler system, therefore, this is far from ideal - a CD's worth of data would need to be sent for each time that every city on the map was refreshed (if my calculations are correct).

On reflection, perhaps this will only be practicable if the performance can work well single-threaded (or if there is some way of achieving parallel, rather than concurrent, multi-threading in the route-finding algorithm that achieves a substantial saving, but I am doubtful of that).

Any thoughts on this topic (both in relation to multi-threading and generally) would be most welcome.

chicken · May 04, 2012, 01:49:25 AM

Surely a route could be encoded in a much smaller space. How about an initial koord, and then a sequence of bits, representing each step? 32 bytes + 2 bits * 256 + 16 bytes = 112 bytes.

jamespetts · May 04, 2012, 09:44:39 AM

I'm not sure that I understand - how would that sequence of bits represent a tile in two dimensional space?

Combuijs · May 04, 2012, 10:41:59 AM

You start with a tile with full coordinates (might be 32 bytes?). Then you need 4 values (2 bits) that tell which tile is the next one: left, right, up or down, for each tile in your route (256*2) for a route that is 256 tiles long. Don't know what these 16 bytes are doing there at the end.

jamespetts · May 04, 2012, 11:46:07 AM

Ahh, I see. However, given that even without any route information, refreshing a map's worth of cities will still require the transfer of a CD's worth of data to each client, this may not be a realistic solution in any case (if my calculations are correct).

jamespetts · May 04, 2012, 10:34:04 PM

I think that I have devised a way to multi-thread this without the server having to do all the work and push the results to the clients, by using a parallel multi-threading algorithm rather than a concurrent algorithm (although some consideraiton of the implementation details is still necessary, any thoughts on which would be much appreciated).

The basic idea would be for each city to begin a refresh of all of its routes in a separate thread, but keep the results (including interim results as all the various routes are found) in a separate set of hash-tables (or some sort of collection object) not actually used by the game. Only when all the routes for that city has been calculated on all clients do all clients simultaneously replace the live data with the fresh data in waiting generated by the latest refresh.

The tricky part would be how and where to check whether all clients have in fact completed the refresh. Perhaps something could be put in stadt_t::step() (and perhaps a mechanism to cause it to check only every 256 steps to reduce network overhead) to perform the checks. In a non-network game, the game would simply check whether the local route checking thread(s) had finished, which would be indicated by those threads having set some or other variable. (What might work is one thread from each origin point in the city, the number of threads being recorded, and each thread incrementing a counter when it has finished, which could be compared with a counter keeping track of the number of threads started). In a network game, each worker thread on the client would have to send a message to the server when it had finished, and the server would keep a record of all client threads and check to see whether all threads from all clients in a given city have completed, and, if so, send a command to the clients to swap the live data with the provisional data. The clients, in turn, would only swap the data in network mode when a command is received from the server, rather than checking for local threads having finished.

One complexity is how to handle the situation in which a client leaves or joins between a city refresh starting and all threads completing. A simple counter would be difficult, as it would not determine whether an exiting client was one that had or had not finished the refresh. It would not be possible to rely on the quitting client to tell the server whether it was one that had completed a refresh of any given city or not (so affecting whether the server subtracts 1 from the count of completed clients or not), as the client might disconnect due to a network error, local power outage, etc., and suddenly stop sending the server information at all. Any thoughts on how to deal with disconnecting clients would be appreciated.

As to joining clients, one option is for them to download the completed interim information from the server if the server itself has completed the refresh that is in progress on other clients (in which case, all the clients might as well download that data, too), but even this might be quite a download overhead. Alternatively, connecting clients could be told to start a refresh on those cities straight away, but in the meantime use the current live data, which is probably the best way of proceeding. I should be interested in whether anyone has any better ideas on this question, however.

Overall, this is quite a complicated if possibly powerful solution to the problem, and is a lower priority than economic balance, so might not be implemented for a while unless somebody would like to have a go at it. However, I think that it is very helpful to have these things planned. In the meantime, I should be very interested in any thoughts on this particular implementation, especially as to whether there might be a more efficient way of doing things, or whether there are likely to be problems which I have not foreseen in the above outline.

kierongreen · May 05, 2012, 08:43:49 AM

A few points:
How will you deal with stale routes?
You might need checks and/or thread locking to ensure that grounds are not deleted by one thread while another is calculating a route across it.
If you have one thread continuously updating routes this means that simutrans will use significantly more CPU time - which may mean that on laptops for example battery life is adversely affected.

jamespetts · May 05, 2012, 11:23:00 AM

Kieron,

thank you for your thoughts. To respond individually:

(1) stale routes will be marked as such, but will continue in use until the threads for replacing them (or, in a network game, all clients' threads for replacing them) have finished;
(2) I shall have to give this one further thought - thank you for pointing it out (do you or anyone have any suggestions?);
(3) this feature will inevitably be computationally expensive, although, if it works the goal will be most worthwhile - however, it will be able to be disabled by the always_assume_everywhere_connected_by_road=1 setting, as at present.

jamespetts · December 10, 2012, 10:34:50 AM

I have not had the chance to work on this rather complicated issue since it was last discussed back in May of this year. However, I have recently come up with an idea which would allow a much quicker and simpler solution to be applied to network games, relying on multithreading. To recap, the problem with multithreading has been that, in online games, there is no reliable way to keep a multithreaded game synchronous if the threads are run in parallel on the client and on the server, and, if the background threads are run on the server alone, pushing data to the clients as they go, those data will quickly saturate the whole bandwidth.

The solution, I think, is a modified version of the second model - the threads to create the private car routes running only on the server. However, instead of pushing the data to clients as it goes, the server would store the updated data in unused shadow vectors until the game is next loaded/saved (for example, when a client joins). Then, the server would swap in the shadow data to the main running data immediately before saving, so that, on loading, those updated data are available to the clients and active on the server. The bandwidth would be used far more efficiently, as the data would be compressed, and only sent at intervals. Although this would result in some delay between a change to the road network and its recognition by the game for private car routes, this delay would not be great in proportion to the slow pace of game time, and, with the frequency of players joining/leaving, would not cause noticeable problems. Indeed, there is something to be said for autosaving every hour or two in any event by means of a command line shell script to prevent any server crashes from losing too much players' work.

On a single player game, multithreading could update the data immediately without delay.

The implementation of this would have to wait until the merge with 112.x has been finalised, as that is the version with the multithreading code, but I should be most interested in any views on this subject in advance of attempting to to implement. Also, does anyone know whether it is possible to set thread priorities using pthread? It would be useful to set the priority of the thread that finds private car routes lower than the priority of other threads on the system.

prissi · December 10, 2012, 12:36:18 PM

But currently only one map is sent to the clients. You would need to transfer the map then to all clients connected. You would also need two independent random counter.

Additionally server usually have much weaker CPU than clients as CPUs are shared. Even if you have a single CPU, it could mean that this is rather a single core out of 16, and hence features like turbo boost and similar are disabled. If you can do a hard reset of your server, then the chances are high this is really and independent dedicated server. But those come still with a 100 EUR/$/GBP tag per month.

The display is anyway threaded and is not the largeast CPU consumer for developed games is standard and likely also exp.

jamespetts · December 10, 2012, 07:37:19 PM

Quote from: prissi on December 10, 2012, 12:36:18 PM
But currently only one map is sent to the clients. You would need to transfer the map then to all clients connected. You would also need two independent random counter.

Hmm - I am not sure what you mean about one map being sent to clients. Obviously, only one saved game (including the map, etc.) is sent to each client; what I am proposing here does not need this to change. The idea is that the server would update its active private car routing table from the shadow private car routing table updated by the background thread(s) immediately before saving (indeed, called from each town's rdwr(file) method), and then transmit the whole saved game, including this updated table, to all the clients.

Or do you mean that the presently connected clients do not actually receive data from the server when a save/load is performed? If so, that is an interesting point. I wonder how easy that it would be to serialise and compress just the private car routing tables...?

As for the random counter, I don't think that this is affected, since the calculation of the private car routing tables does not need any random number generation.

QuoteAdditionally server usually have much weaker CPU than clients as CPUs are shared. Even if you have a single CPU, it could mean that this is rather a single core out of 16, and hence features like turbo boost and similar are disabled. If you can do a hard reset of your server, then the chances are high this is really and independent dedicated server. But those come still with a 100 EUR/$/GBP tag per month.

The display is anyway threaded and is not the largeast CPU consumer for developed games is standard and likely also exp.

I am not sure what most Simutrans servers are run on, but the Bridgewater-Brunel server currently runs on one of these. I currently have it set up with one dedicated core, 50Gb of hard drive space and 3Gb of RAM, but I could add a second dedicated core for an extra £6/month.

Incidentally, do you know whether pthread allows one to set the priority of a background thread to a lower priority than the main thread? It would be useful to be able to have the private car route population thread(s) run in a lower priority.

prissi · December 13, 2012, 01:25:04 PM

When another player joins, the map is only sent to the new player. The other clients just do a local save/load operation. If you have some hidden data on the server, this is no loanger possible, and all clients need to recieve at least this hidden data.

The VPS guaranteed performce is usually something around a pentium 600 or so. Just remember, current server processor have 16 core (when you count hyperthreading) per unit. Often those cores are not even exclusive, studies show that on avarage 40-60 virtual servers are typically joining one core, since most of them do nothing. (The hoster has some clever management to pair highly loaded with less loaded cores.)

But the VPS performance can strongly vary. In any case, it will not exceed an Intel Atom processor very much. At best relayable netbook perfomance.

jamespetts · December 14, 2012, 01:20:59 AM

Thank you for this. I see the logic behind that way of doing things, but it does mean that my latest idea won't work. Hm - back to the drawing board on that one, I fear. If only Knightly were still working on Experimental...

As to VPS performance, I think that I get a dedicated core on a 16-core rack server. The server seems to cope very well with my very CPU-intensive network game with vast numbers of convoys each with unoptimised physics equations, so a second core could usefully do a lot of work calculating routes in the background. The difficulty would be how to transfer them to the client without clogging bandwidth excessively.

jamespetts · January 02, 2013, 05:05:37 PM

A somewhat fresh perspective on this set of problems occurs to me, although it provides no easy solution. I was looking at a book on transport economics last night, and was struck by the importance of transport for the pattern of developments, and the very precise relationship between density, urban development, passenger transport and distance between homes and workplaces (for reference, the book is "Applied transport economics" by Stuart Cole). It would be good to be able to model some of this in the town growth algorithm. One way of modelling this is by associating particular residential buildings with particular destination buildings as workplaces. We can calculate whether to build a workplace at a given point by whether there are spare workers living a transportable distance away, and calculate whether to build a home at a given point by whether there are workplaces needing workers a transportable distance away.

Putting aside the details of how we calculate what is within a transportable distance or not, what counts as "spare" jobs/workers and how we generate "spare" ones, modelling things in this way suggests a possible change in the basis of passenger generation from randomly assigning them destinations when the buildings are stepped to having each building with a pre-assigned list of destinations, a workplace, a leisure activity and a miscellaneous (shopping, going to the bank, etc.). Something similar is done in Sim City 4, where each building is assigned one or more destinations and the routes to/from it recalculated only periodically. If we make all trips return trips, we would potentially only need to step residential buildings for passenger generation (mail generation would be different, but would not need private car route calculation; query however, whether this would miss the realistic simulation of commercial to commercial passenger transport and, if so, whether that would be significant).

Using this model, we could in principle bypass the calculation of routes between cities entirely (which, I think, is an unhelpful abstraction in this context) and focus on calculating routes between buildings. Because we do not need a set of journeys for every possible pair of buildings to be calculated regularly, but just, say, two to five journeys from each residential building to its destination and a return journey (the return journey, in the case of private car routes, can simply be a mirror of the outward journey and not recalculated) per recalculation.

In the current Bridgewater-Brunel online game, I have estimated the current number of buildings overall as something in the region of 107,000 (by taking the population of one city, dividing it by the number of buildings, to get a figure close to 21.8, and then dividing the region population by that number). This game is still in the 1860s, so I should expect a greater number of buildings by the time that the game reaches the modern era. If we assume that we need to be working with a set of 250,000 buildings, we should have a solid basis for the calculation. Of those, perhaps half are residential buildings (they seem to number more than commercial or industrial buildings), giving a total of 125,000 buildings from which routes need to be calculated every calculation cycle. If we work on the basis that private car ownership, like destination, is pre-assigned per building, only circa 80% of those will, in the modern era, require private car route calculation, or something in the region of 100,000. This compares with a figure of in excess of 225,000 sets of private car route calculations required for the existing method on the previous Bridgewater-Brunel saved game. (The two are not exact comparisons by any means, as the latter had reached the 1970s at the time of writing the original post and had more cities, but it is significant that towns in later eras will increase their population in substantial part by increasing the level as well as the number of buildings, so the number of buildings increases less than the population as the game progresses, and the number of cities was far short of an order of magnitude different).

So, the fixed destination per building system has the potential to yield some significant reduction in the calculation overhead required whilst at the same time making the routes far more accurate and able to be followed by actual city car graphics, allowing per tile computation of congestion. Also, this method means that the increase in the number or size of cities will only increase the amount of private car route computation in a linear fashion rather than the exponential increase of the present all city pairs system.

However, this reduction is not by itself enough. 100,000 routes will still take an average of 5,000 seconds (100,000 x 50ms), or 83 1/3 minutes for a whole set recalculation, compared with 189 minutes for a full set recalculation with the number of pairs quoted in the original post (being 227,812). Using figures comparable with those given in the first post, if we assume an annual recalculation cycle, this equates to a computational time requirement of just under 7 minutes per game month, compared with around 15 for the original set - and this assumes a fast computer such as my i7. Furthermore, this is assuming only one route per building: if we want to have multiple routes (one to work, one for leisure, one for miscellaneous, for example), this will treble to 300,000 routes or 21 or so minutes - worse than the original example.

The task can be parallelised quite easily, as it happens, as the calculation of one private car route does not affect the calculation of any other private car route. So, every x number of steps (or perhaps every x number of ticks), the route recalculator could be called. It would calculate a pre-determined number of private car routes marked to be stale, then stop. During recalculation, it would set a number of threads running (perhaps 4 or so), each of which would calculate x number of routes each. The main thread would also be calculating routes during this time. Only when all running threads have finished would the main route recalculator method (itself running in the main thread) complete, allowing other parts of the program to execute. This would have the effect of preventing the recalculation mechanism from having an adverse performance impact on the game: taking an unacceptably long time would just make the private car routes all out of date rather than making the game unresponsive. However, the possibility of having out of date private car routes does not sit well with the system of having growth based on whether a site for a building can be found that is within reach of a workplace: indeed, I cannot see how that latter system can be made to work without always being able to have up to date private car routes (which would have to be calculated theoretically from an undeveloped tile before any development takes place, further increasing the number needing to be computed).

How often, then, in game time would routes end up being recalculated if we worked in this way? With 21 bits per month, about two game years tend to pass for every one real time day. 300,000 x 50ms = 4.17 hours: for every complete recalculation cycle, therefore, four hours and ten minutes would have to be spent by the server (and all clients connected at the relevant time), spread out over the whole day, or about 35% of the total computational time spent by the server if the routes were to be refreshed once per game year. Switching to 22 bits per month would reduce this to 17%, and having a second core in the server would reduce this further to 9%, but even this amount still seems somewhat excessive.

More efficient code is still necessary, therefore. This is where it gets tricky, as more efficient in this particular context means much more complicated. If we are storing routes in any event, we might as well add a reference back from a road tile to the route(s) that pass over it. This will add a vector of pointers per road tile worth of memory. This would make it possible to check at any given tile on an A* search whether there is already a route to the destination from that tile, preventing duplicating routes. However, I do not know how well that this works with A*, as, from what I understand, the algorithm can iterate over tiles on a sub-optimal route before finding the optimal route.

In any event, if the destination is a specific building, this will not save much in terms of execution time, as the chances of hitting another route for a specific building are quite slim. However, it is quite possible that the path finder would hit a route for a neighbouring building early in its search. The trouble is that it would be very difficult to find a reliable algorithm that could make use of this fact. What should count as "neighbouring"? How do we know whether the neighbouring building is anywhere near, in transport terms, our destination building (they may be opposite sides of a river, for instance)?

I cannot think in principle of any other type of optimisation over and above A* that will work for a building to building based search than, as mentioned above, a system that checks each tile over which it iterates for already calculated route(s) to the destination. (This also throws up the possibility that calculating some routes would be much shorter, whereas calculating others would take no less time, which would potentially reduce the possible gains by parallel multi-threading, as at least one of the threads could be stuck with a set of unoptimisable routes; this could only partially be solved by giving all threads more routes to find per parallel run). A Floyd-Warshall or other network explorer centralised all points to all points system would not, in my limited understanding, assist, since we need to calculate the paths between a far reduced set of nodes (unless running Floyd-Warshall (etc.) for the entire set of buildings for road connexions, giving a total of 11,449,000,000 (eleven point four American billion) total connexion pairs, is faster than running A* for a set of 300,000 connexions).

Can anyone suggest anything that may assist here? I am very keen to be able one day to have realistic traffic and development simulation in Simutrans, and I should like to know whether it can be achieved before we all have desktop computers powerful enough to run A* between hundreds of thousands of points within a reasonable time on a Simutrans map or whether it can be achieved through more efficient coding sooner rather than later. Any thoughts on (1) how much that a search for existing routes on each tile would help; (2) whether any other optimisations of which I have not thought might assist (and, if so, by how much); (3) whether the memory usage for storing around 300,000 private car routes would be excessive in any event; and any other connected matter that might be of assistance would very much be appreciated.

Edit: I have come up with a possible optimisation simple enough for even me to code, although I do not know whether it would provide enough of a saving when taking into account its own overhead.

For each route needing to be calculated, each tile would be checked to see whether it also contains a route to the destination building. If so, the routes (the current incomplete route to the tile with a route to the destination, and that part of the route from that tile to the destination) would be concatenated (by copying tiles, rather than by pointing from one to another). This would require storing on each tile not only a pointer to the route, but an integer representing which tile number in the route that that particular tile is, allowing for easy concatenation.

If there is no route to the destination building, check to see whether there is a route to the city in which the destination building is located if the straight line distance between the current tile and the town hall of the city in which the destination building is located is more than X times (perhaps 2) the distance from the town hall in which the destination building is located and the destination building. If these conditions are fulfilled and there is a route to the city in which the destination building is located, copy-concatenate the found route, iterating over each tile to check whether (1) that tile contains a route to the destination building; or (2) that tile is within the destination city. If it has a route to the destination building, stop copy-concatenating the first route, and apply the same procedure from there as in any other case where a route to the destination building is found. In the second case, if a tile within the city containing the destination building is reached, calculate a route from there to the destination building, again checking on each tile to see whether a route to the destination building can be found on that tile and copy-concatenating if so.

This would work in principle, I think, but what I don't know, and have no way of taking a sensible guess about, is whether the optimisation would suffice. Would all that checking for routes on each tile (iterating over all the routes on that tile in the process) take more CPU power than saved? Would the whole thing take too much memory?

I should be extremely keen to hear from anyone with more programming experience than I as to whether this method would work to produce the optimisations sufficient to make the building to building private car route finding model described above workable.

jayem · January 02, 2013, 11:02:46 PM

Would it be possible to statistically hide the journeys for the lots of car journeys. It would mean there might be some odd effects on a single journey but you should get congestion at the right places, and behaviour that looks right over several tiles.

You could model Factories/Shops and Attractions as absorbing cars (valleys of car potential), and Houses as (hills) emitting them to neighbouring roads.
In each cycle the 'car potential' of a road tile would be something between the previous-average of the surrounding tiles and it's previous value
(slight care needed as cars don't get destroyed mid journey but road type should have some effect).

A graphical car would then move 'down hill' making a random choice based proportional to the gradients (and hence always getting nearer a destination).
An anti car (representing a return journey) would move uphill.
If a hill gets too high you need to build another factory, (or lose a house), if a valley gets too low you need to build another house (or lose a factory)

That would be O(N) time-wise and data-wise which would be good. Although the constant factors might well be higher to spoil that.

I think it ought to behave moderately sensibly over simple complexes, although Factory-Town-Town-Factory would lead to little traffic between the towns. You could possibly sort that by having houses and factories emmiting into random piles, so there's a chance one town has a surfeit of Traveller-A's and a deficit of Traveller-B's.

jamespetts · January 02, 2013, 11:19:53 PM

Hmm, this would model congestion in a very similar way to the original Sim City (now open sourced as Micropolis). This is fine for a congestion modelling algorithm, and also suffices to check whether residential areas are connected to somewhere useful, but in Simutrans, we need to check whether people can get to the specific places that they want to go, rather than just anywhere. The private car routing system has to fulfil the function of determining "can the people from building A get to building B within their journey time tolerance by private car?", as well as the function, "how much congestion is there on road tile X?", and have a realistic set of feedback mechanisms/relationships between the two.

Bear in mind also that the passenger generation algorithm in Simutrans-Experimental needs to be able to run a check to see whether passengers can get to the specific place that they want to go by private car within their tolerance. Currently, on larger games, one has to enable the "assume_everywhere_connected_by_road" setting to have adequate performance, which will just check whether the particular packet of passengers has access to a private car and make assumptions about the journey time based on the speed of the private cars currently available in the game and the speed limit of the current default inter-city road (or intra-city road if the journey is entirely within a town). To be a step forward, a private car routing algorithm would have to give an accurate network based answer to the question, "can packet of passengers X get to its destination within its journey time tolerance?" rather than make assumptions. What is really desired is a system that will model private car transport in as much detail as transport by other means.

What I am most interested in at present is some idea of whether the model that I proposed in the thread above has a chance of doing this within reasonable parameters of performance. If you or anyone else has any ideas about that question, I should be most interested indeed.

Thank you very much for your input, though: it is most appreciated.

inkelyad · January 03, 2013, 04:47:01 AM

Quote from: jayem on January 02, 2013, 11:02:46 PM
You could model Factories/Shops and Attractions as absorbing cars (valleys of car potential), and Houses as (hills) emitting them to neighbouring roads.

In other words:
Each attraction/factory is 'heat source', houses are (small) heat sink. Now we can use numerical heat transfer model. Such models are build for multiple processors.

Quote from: jamespetts on January 02, 2013, 11:19:53 PM
This is fine for a congestion modelling algorithm, and also suffices to check whether residential areas are connected to somewhere useful, but in Simutrans, we need to check whether people can get to the specific places that they want to go, rather than just anywhere.

Quote from: jamespetts on January 02, 2013, 11:19:53 PM
To be a step forward, a private car routing algorithm would have to give an accurate network based answer to the question, "can packet of passengers X get to its destination within its journey time tolerance?" rather than make assumptions.

It is quite possible. You need to model separate 'heat plate' for each attraction/factory.
As time go catchment area for factory will grow => more calculations.
But number of factores/attractions should go down too (factories tends to be bigger over time).

jamespetts · January 03, 2013, 10:49:14 AM

Quote from: inkelyad on January 03, 2013, 04:47:01 AM
It is quite possible. You need to model separate 'heat plate' for each attraction/factory.
As time go catchment area for factory will grow => more calculations.
But number of factores/attractions should go down too (factories tends to be bigger over time).

I am not sure that I fully understand. How would this integrate with the current passenger generation system which generates specific packets of passengers at specific buildings wanting to go to other specific buildings (often in other towns) and needing to know whether they can find a route there either by private car or player transport, and, if by both, which is preferable? We need to be able to simulate the exact same journeys that passengers make by player transport being made by private car if the passengers make the private car modal choice.

inkelyad · January 03, 2013, 01:43:56 PM

Quote from: jamespetts on January 03, 2013, 10:49:14 AM
I am not sure that I fully understand. How would this integrate with the current passenger generation system which generates specific packets of passengers at specific buildings wanting to go to other specific buildings (often in other towns) and needing to know whether they can find a route there either by private car or player transport

Each cell has value 'how many(in month) passengers have reached to here from heat source(factory) by private car'. If it zero -> there is no path via private car or said path is too long.

It is same as 'can the people from building A get to building B within their journey time tolerance by private car?', where building B is factory.

Quote
, and, if by both, which is preferable? We need to be able to simulate the exact same journeys that passengers make by player transport being made by private car if the passengers make the private car modal choice.

Uhh... sorry, I was thinking about another decision function. Not 'which transport is faster?', but 'which transport give me better probability to reach destination in time?' It is not same.

jamespetts · January 03, 2013, 02:28:07 PM

Quote from: inkelyad on January 03, 2013, 01:43:56 PM
Uhh... sorry, I was thinking about another decision function. Not 'which transport is faster?', but 'which transport give me better probability to reach destination in time?' It is not same.

Ahh, no, we need the former if the system is to integrate, as it must, into the passengers' decision making process about whether to take the car or player transport.

Do you have any thoughts on whether my posited optimisation of route-finding might work within acceptable limits of performance?

inkelyad · January 03, 2013, 06:38:27 PM

Quote from: jamespetts on January 03, 2013, 02:28:07 PM
Do you have any thoughts on whether my posited optimisation of route-finding might work within acceptable limits of performance?

It might. But copy-paste of inter-city routes seems excessive and really memory-hungry. Just pre-calculate distance beetwen cities and use it in A* heuristic.

And it seems you are trying re-invent something like this: http://davis.wpi.edu/dsrg/Old/UMICH/t-gis2000.pdf (City == cluster inside network graph).

jamespetts · January 03, 2013, 06:49:54 PM

That paper looks interesting, having read the abstract. Is there any way of knowing (without incurring the incalculably vast amount of time required actually to code both versions from the ground up), or at least, making a sensibly educated guess as to, how pre-calculating the distance between cities per se will work as an A* heuristic? In all cases when using cities as an abstraction, we have to bear in mind what is in Simutrans (especially in the later game when cities have grown considerably) the not at all uncommon case of an origin point in one city being nearer the destination point in another city than either is to its own city hall (we must remember that measuring the distance between "cities" involves measuring the distance between a specific point in a city and a like point in another city); is this even an admissible heuristic given the possibility for these conditions? Will improving the heuristic really help in any event when we are still trying to perform hundreds of thousands of A* searches; and how will an improved heuristic help with memory consumption when we in any event have to store all the separate routes in order to run actual private cars along them and track how much that each of them is being used?

As to the paper, have you looked at it in any detail? The abstract mentions a number of different techniques; do you think any of them suitable for what we need for this purpose?

Edit: Hmm - I am not sure that this paper addresses the same problem:

Quote from: The paper
Since the size of the link table is often larger than the capacity of the main memory
bffer of a given GIS system, the link table may need to be stored on a secondary storage device,
typically on disk. While state-of-the-art database engines may attempt to cache the link table into
main memory during path evaluation, this will generally not be feasible due to size constraints.
3
In this case, many tuples links in the link table may need to be retrieved over and over again
from secondary storage and placed into the main memory buffer for evaluation. Given that such
I O operations on most modern computers are typically several 100-fold more expensive than CPU
operations, the I O costs are the dominant factor of path computation costs.

The high processing costs are thus incurred by the recursive nature of the graph traversal
component of path query computation. Resolving embedded constraints may further increase I O
costs significantly. For example, in a related e ort 17, 18, 22, 23, 19, 20 , we found that processing
spatial constraints see Q3 path query is very I O intensive. Thus such constraint resolution
competes with the path finding component of the search process for computational resources such
as the buffer space. This further motivates our research presented in this paper on optimizing the
path computation process by reducing I O activities.

The issue being discussed here is that the complete set of nodes and links for the sort of system discussed in the paper cannot be stored in memory at once, hence requiring frequent disc accesses. It also assumes a model of nodes being intersections and links being the roads between them, which is different to the Simutrans tile based model, in which every road tile is a node, and the connexions to its neighbouring tiles are the links.

In Simutrans-Experimental, the issue is not that each individual query takes a long time (what is described as the "spacial query" at Q3 is not relevant here because the links themselves do not have any special characteristics relative to one another) due to IO access, but that there are vast numbers of queries that have to be processed in a short time. The issue is CPU-intensiveness rather than memory size or bandwidth.

inkelyad · January 03, 2013, 07:05:51 PM

Quote from: jamespetts on January 03, 2013, 06:49:54 PM
Is there any way of knowing (without incurring the incalculably vast amount of time required actually to code both versions from the ground up), or at least, making a sensibly educated guess as to, how pre-calculating the distance between cities per se will work as an A* heuristic?

In current implementation each road cell know about its distance to all cities halls?

Quote
not at all uncommon case of an origin point in one city being nearer the destination point in another city than either is to its own city hall

Yes, but 'distance to city hall' heuristic usually points A* to right direction.

Quote
As to the paper, have you looked at it in any detail? The abstract mentions a number of different techniques; do you think any of them suitable for what we need for this purpose?

It is hard work. Usually I use such kind of papers for 'read referenced literature first'.

Edit:

Quote
The issue being discussed here is that the complete set of nodes and links for the sort of system discussed in the paper cannot be stored in memory at once, hence requiring frequent disc accesses. It also assumes a model of nodes being intersections and links being the roads between them, which is different to the Simutrans tile based model, in which every road tile is a node, and the connexions to its neighbouring tiles are the links.

In Simutrans-Experimental, the issue is not that each individual query takes a long time (what is described as the "spacial query" at Q3 is not relevant here because the links themselves do not have any special characteristics relative to one another) due to IO access, but that there are vast numbers of queries that have to be processed in a short time. The issue is CPU-intensiveness rather than memory size or bandwidth.

Yes, but we can't store all paths 'cell A' -- 'cell B' into memory and must recalculate it.
So effect is same: path query is long operation.

jamespetts · January 03, 2013, 07:35:00 PM

Ahh - do you imagine calculating and storing the distance to each city hall from each road tile? With 500 cities, that would be a very large amount of calculation - are you sure that this would help?

Indeed, perhaps I am missing something, but does a distance to the destination city hall heuristic actually add anything to the distance to the actual destination heuristic? Is it any better? Or is the point that the calculating of the distance to X from each tile is the part that takes most of the computational effort such that pre-calculating this once for every road tile would reduce the work of doing 300,000 A* searches enough to bring it within reasonable performance limits? If that is so, then would my system not do the same?

inkelyad · January 03, 2013, 07:54:15 PM

Quote from: jamespetts on January 03, 2013, 07:35:00 PM
Ahh - do you imagine calculating and storing the distance to each city hall from each road tile? With 500 cities, that would be a very large amount of calculation - are you sure that this would help?

I can be wrong, but my memory tell me we already do this. It is one fill from Dijkstra's algorithm for each city hall.

Quote
Indeed, perhaps I am missing something, but does a distance to the destination city hall heuristic actually add anything to the distance to the actual destination heuristic? Is it any better? Or is the point that the calculating of the distance to X from each tile is the part that takes most of the computational effort such that pre-calculating this once for every road tile would reduce the work of doing 300,000 A* searches enough to bring it within reasonable performance limits?

Pre-calculated distances to halls will lead A* straight to destination city ( first to to source city boundary, then to destination city via shortest path). There will be no backtracking there.

Quote
If that is so, then would my system not do the same?

You need to find and copy stored inter-city path. It almost same as reconstruct it via A* with right heuristic.

jamespetts · January 03, 2013, 07:58:57 PM

Quote from: inkelyad on January 03, 2013, 07:54:15 PM
I can be wrong, but my memory tell me we already do this.

I really don't think that this is done in Simutrans - where are these data stored, do you think?

QuoteIt is one fill from Dijkstra's algorithm for each city hall.

Might this not make placing new roads unresponsive on occasions if this has to be done every time that road tiles are laid, for every single tile?

QuotePre-calculated distances to halls will lead A* straight to destination city ( first to to source city boundary, then to destination city via shortest path). There will be no backtracking there.You need to find and copy stored inter-city path. It almost same as reconstruct it via A* with right heuristic.

But, in this case, we have to keep in memory all the road to city hall distances on top of the complete set of routes, whereas, in the system that I proposed, we only need to keep the complete set of routes. Does your system not take more memory than mine...?

inkelyad · January 03, 2013, 08:42:41 PM

Quote from: jamespetts on January 03, 2013, 07:58:57 PM
I really don't think that this is done in Simutrans - where are these data stored, do you think?

It just my memory. I remember that I used said distance one time.

Quote
Might this not make placing new roads unresponsive on occasions if this has to be done every time that road tiles are laid, for every single tile?

If we do it for every single tile, yes. But do we really need to?

Quote
But, in this case, we have to keep in memory all the road to city hall distances on top of the complete set of routes, whereas, in the system that I proposed, we only need to keep the complete set of routes. Does your system not take more memory than mine...?

It depends.
'each tile would be checked to see whether it also contains a route to the destination building.'
Each tile will be storing routes. It is not clear what will take more memory.

jamespetts · January 03, 2013, 09:09:08 PM

Quote from: inkelyad on January 03, 2013, 08:42:41 PM
If we do it for every single tile, yes. But do we really need to?

What's the alternative?

Quote
It depends.
'each tile would be checked to see whether it also contains a route to the destination building.'
Each tile will be storing routes. It is not clear what will take more memory.

The tiles will store pointers to the routes together with an integer representing which step of the route that they are on - the routes need to be stored in any event. Indeed, it would be helpful for tiles to store pointers to the routes in any event in order to give statistics to users. That is, at worst, 96 bytes per tile on a 64-bit system and 64 bits per tile on a 32 bit system. This compares favourably with having to store pointers to and distances to each of 500 or so cities per road tile.

News:

Modelling private car traffic without breaking the CPU bank

chicken

jayem