Modelling private car traffic without breaking the CPU bank

kierongreen · March 26, 2012, 05:58:59 AM

That algorithm should update the directions whenever a faster route is built. With all directions across the map known it should then be quick to calculate route weightings when required.

inkelyad · March 26, 2012, 06:51:21 AM

Quote from: jamespetts on March 25, 2012, 11:13:35 PM
the actual routes are calculated between a specific road tile in one city and a specific road tile in another city (the road tile immediately outside the town hall).

Quote from: jamespetts on March 25, 2012, 11:13:35 PM
but not go through the town hall road tile in City C. In some cases, this difference might be trivial, but in other cases it might be very great - by orders of magnitude of journey time in the right circumstances.

Do we really need to be that precise?
We have all this troubles only for 'find private car traffic density', right?

isidoro · March 26, 2012, 12:55:10 PM

@James: yes. Nodes=clients. Clients would desync if you don't mark the same simulation time in which the information generated by the server may be used by the clients.

Vladki · March 26, 2012, 07:57:34 PM

jamespetts: Could you please enlighten me why is this calculation done in the first place? What are the results used for? Maybe then I can come up with something smart

jamespetts · March 26, 2012, 09:16:58 PM

Kieron - ahh, perhaps I had misunderstood exactly how what you were proposing was to work. Might I ask you to elaborate a little, in that case? I can't immediately visualise how things would fit together at present - apologies if I am being dim.

Isidoro - this could be solved fairly easily by treating results from the private car pathing thread in much the same way as the (even more undeterministic) player input, could it not, and simply using the standard method in simwerkz.cc to issue a queued command to clients to update their private car routing tables? The possible problem from this is that it might involve a lot more data transmission over the network - some calculations would be required as to how much that this would generate to see whether ordinary home ADSL connexions could cope.

Inkelyad and Vladki - the purpose of the system is to simulate as accurately as possible within the constraints of what can be achieved in Simutrans the competition between private cars and public (player) transport. The system reads from a configuration file the proportion of people in any given year with access to a private car, randomly assigns (based on the weighting factor given by the percentage of car ownership at the present time) packets of passengers generated to having or not having a private car, and, for those who do, compare the private car journey (if the destination is reachable by private car) with the public transport journey (if that is available) to determine which of the two options that passengers take. That decision is based on part on the relative journey times of the two methods. (The full method of selection can be found in stadt_t::step_passigere() in simcity.cc). If the passengers can use public transport, but either do not have access to a private car or have access but the journey time of the private car is outside their tolerance, then, provided that the journey time for the public transport is within their tolerance, they will always take the public transport. Conversely, if players have access to a private car and the private car journey time is within their tolerance and there is no available public transport route (or no available public transport route within their time tolerance), they will always take the private car. If there is no journey to the destination available at all, or all journeys exceed the passengers' journey time tolerance, they will not travel.

Each private car trip is logged in the origin and destination city (or, if travel is within a city, in that city), and these numbers go into making the "traffic" figure seen in the city and world graphs (world statistics can be accessed from the city list window). The city version of the statistics are used to determine congestion (see the algorithm in simcity.cc for more details: congestion is calculated based on the number of private car trips and the size of the city). Congestion has two effects in the game: (1) it slows the growth of towns (stopping growth completely if congestion is at or above 100); and (2) makes passengers travelling to, from or within congested cities more likely to use public transport instead of private cars in proportion to the congestion.

I hope that this clarifies matters! Thank you all for your very helpful input.

kierongreen · March 26, 2012, 10:28:49 PM

The idea is that each roadtile stores the direction of the neighbouring roadtile which is closer to a particular destination. By looping over this you can construct a route to the destination (but much quicker than search for a route). Think of it as storing a signpost on each tile with the direction to every city (but no distance).

isidoro · March 26, 2012, 10:45:58 PM

Quote from: jamespetts on March 26, 2012, 09:16:58 PM
[...]
Isidoro - this could be solved fairly easily by treating results from the private car pathing thread in much the same way as the (even more undeterministic) player input, could it not,
[...]

Sure. They are one and the same problem. So, if that is solved in the present code, I have no objection to your approach.

jamespetts · March 26, 2012, 10:53:47 PM

Quote from: kierongreen on March 26, 2012, 10:28:49 PM
The idea is that each roadtile stores the direction of the neighbouring roadtile which is closer to a particular destination. By looping over this you can construct a route to the destination (but much quicker than search for a route). Think of it as storing a signpost on each tile with the direction to every city (but no distance).

Hmm - how would this interact with updating directions whenever a faster route is built? In other words, I cannot imagine any circumstances in which a route would need to be rebuilt, but the existing directions data are still valid: the only circumstances that I can think of that would trigger a route rebuild are ones that would also invalidate the directions data - does that make sense? Have I missed something?

Edit: Incidentally, has any consideration been given to putting the sync_step() methods into a different thread to the step() methods? Was it decided that this was not possible because sync_step() and step() methods both need to alter the same variables (if this is so)?

inkelyad · March 27, 2012, 05:50:33 AM

Quote from: jamespetts on March 26, 2012, 09:16:58 PM
... compare the private car journey (if the destination is reachable by private car) with the public transport journey (if that is available) to determine which of the two options that passengers take.

So we don't need routes. Only distance (in minutes).

Once per month run flood from each city to find distance and reachability for each road tile. Average that to 8x8 tiles grid. Store and use in passenger decision.

sdog · March 27, 2012, 05:57:01 AM

assumptions:
- with time road infrastructure improves if there is not a major geographical barrier preventing it.
- player action reducing interconnectedness is limited

form groups of connected cities while checking

build first database:
- start with a random ungrouped city and assign it to the group n
- check ungrouped city if connected add to same group, repeat for all cities

During game only check
- if a random city in group n is connected to group m, if yes join groups
- if cities are connected to at least k>1 cities of their group, if not create new group (choose k very small)

this does not cover groups of connected cities breaking up again, but single cities becoming isolated

jamespetts · March 27, 2012, 06:34:08 AM

Briefly, and for the moment not responding to the above as I am short on time, I omitted one important aspect of the description of the system above, which is this: journey times are calculated by taking the journey time for the distance from the city hall in one city to the city hall in another city (the exact method of calculating the journey time can be found in simcity.cc, but it is based on the average speed of available cars and the maximum speed of each road tile through which the route passes), then using that combined with the straight line distance between one city hall and the next to get a journey time per straight line tile value, which is extrapolated to calculate the journey times of all other journeys between any given specific point in the origin city and any given specific point in the destination city, as, in Simutrans, journeys are always made from and to specific buildings. Attractions and industries are also included in this.

isidoro · March 27, 2012, 01:03:39 PM

Quote from: jamespetts on March 26, 2012, 10:53:47 PM
[...]
Edit: Incidentally, has any consideration been given to putting the sync_step() methods into a different thread to the step() methods? Was it decided that this was not possible because sync_step() and step() methods both need to alter the same variables (if this is so)?

Yes. If you are not careful enough you can get race conditions, deadlocks, etc.

kierongreen · March 27, 2012, 01:37:49 PM

sync_step and step cannot run at the same time, I'm playing around at the moment with multithreading in step, but it is quite tricky!

isidoro · March 27, 2012, 01:47:35 PM

Small OT: Before going multithreading, I would try to solve this issue that puzzles me a lot: what on earth is ST doing while in pause mode with a rather small map (320x256) to eat up 25% of my CPU time on a quite decent machine?

I would expect a consumption near 0% in a paused game, wouldn't I?

jamespetts · March 27, 2012, 05:03:29 PM

Kieron - may I ask: what multithreading architecture are you using? OpenMP or something else?

Isidoro - is this just in Experimental, or is this also in Standard?

isidoro · March 27, 2012, 06:11:06 PM

Sorry, OT then. In Linux, Standard, I observe that... What do you observe in Windows Experimental? What is the CPU consumption in pause mode?

Carl · March 27, 2012, 07:22:09 PM

(I too find that pausing in Experimental only halves the CPU load. Doesn't eliminate it entirely.)

kierongreen · March 27, 2012, 09:44:08 PM

pthread

jamespetts · March 28, 2012, 10:34:13 PM

Kieron - may I ask, was there a particular reason for using pthread?

Inkelyad (above - that I did not have time to answer before): I am not very familiar with the flood algorithms, and so am having trouble understanding how what you propose would work. Is this much different to finding a route from everywhere to everywhere (which would take more time, not less)? How would this produce a journey time figure (or, at least, a journey time per straight line tile figure) from a start point to an end point? Apologies if I am being dim...

Sdog,

if I understand you correctly, your system would produce a binary answer to the question "Is city A connected to city B", whereas what we really need is a journey time from any arbitrary point within city A to any arbitrary point within city B (which can be abstracted, as now, by using journey time per straight line tile figures for each city connexion). It is important that the extent to which people use cars between particular cities is affected by the quality of the road network between those cities.

inkelyad · March 29, 2012, 05:41:45 AM

Quote from: jamespetts on March 28, 2012, 10:34:13 PM
Inkelyad (above - that I did not have time to answer before): I am not very familiar with the flood algorithms

I have habit to to use 'Dijkstra's algorithm' and 'flood algorithm' as synonymes.

Quote
Is this much different to finding a route from everywhere to everywhere (which would take more time, not less)? How would this produce a journey time figure (or, at least, a journey time per straight line tile figure) from a start point to an end point? Apologies if I am being dim...

There is difference between "find distance from this point to everywhere" for each destination point (that is, running pathfinding algorithm for each destination point) and finding it in one big sweep.

Unfortunately, we are trying to solve problem on flow network. This is hard and cpu-hungry. You will find a lot of computer science if you just google "flow algorithms".

prissi · March 29, 2012, 07:25:23 AM

In those cases, a regional pathfinder may help. I.e. the pathfinder finds first a path between citys and then from those points in the city to the place which where actually in question. This may result on bad routing for vehicles (i.e. driving forward and backward the same way for a short time) but provides much faster answers. I did a regional pathfinder for ships once, but due to this problems it was never included. You find the remains in experimental-hierachical-patchfinder.diff in the patches subdirectory of the SVN.

kierongreen · March 29, 2012, 05:59:24 PM

QuoteKieron - may I ask, was there a particular reason for using pthread?

Because it works, seems quite simple and doesn't add any dependencies!

sdog · March 29, 2012, 06:05:31 PM

James, you only need to run the actual path finding for cities that are connected, and only do it once until this state changes. You can also do test calculations between two groups of cities from time to time, if the time changes fundamentally, you can re-calculate.

The idea here is to reduce the time you need to run this algorithm as much as possible to get enough data necessary for a good enough model of car ownership through the assumption of a well behaved game. (that is players trying to break it artificially)

chicken · March 30, 2012, 02:01:02 AM

pthreads isn't a library for parallelism, it's an API for creating and managing threads. You would have to implement the desired notion of parallelism using the low-level functionality of pthreads. GNU OpenMP, for example, may be implemented using pthreads on Linux. pthreads is extremely low level, whereas something like OpenMP provides abstractions that are relatively high level.

If you are going to work at the level of pthreads, there isn't a notion of "choice" involved here. POSIX threads are simply part of the operating system interface (hence, no dependencies). Sure, you could probably get by using Linux-specific functions like "__clone" but who'd want to?

I still think you'll get more mileage from caching paths and avoiding redundant work.

Vladki · March 30, 2012, 06:43:24 PM

I still think about the city distance calculations and private/public transport decision. If I understand it correctly the calculated distance is used only as a factor in deciding whether to use private car or public transport. In real world you also decide on approximate information from road map (google maps are here only last few years). So for initial guess it would be enough to use distance between city halls of neighbor cities, and ignore the fact that there may be a shortcut through the suburbs. I suppose that if a private car trip is chosen, its routing is done independently of the discussed calculation. But it would be worth to record the real trip time in some table to continually improve the data. So then you would have tho tables of trip times between cities based on real times. But like in real world you would make decision on sometimes obsolete data - roads get repaired, closed, new highways are built, bus lines change, and so on. So sometimes you decide badly, simutrans citizens would also make such mistakes. So, would this be feasible?

merry · April 03, 2012, 12:47:00 PM

HI Guys,

Been reading through the thread, and i'd like to suggest some algorithmic ideas which may - or may not - be of use. I'll include some from other posters because it's good stuff, and fits in. Yes, it's a condensation of much of the above discussion, but I hope it helps clarity.

1. Calculation of routes & timings for citycars is taking too much time.
In common with many posters, I'd expect brute-force routing to all places from all places will be very (prohibitively!?) intensive.
So the idea of maintaining connectivity charts seems highly sensible, and only making changes to the table when the route network changes.
What seems to be the most likely to work is to maintain in each city, or nodal point (if that works better), a table of directly connected destinations. For each of these (by cascade/tree link) the indirect links possible through these can be obtained. This will yield a list with all possible destinations (and distance/time) but would not require brute-force calculation of anything but direct links for time & distance.
The result would likely be a number of mini-tables which contain the same destination with differing distance/time values. The list could be re-sorted into a single table to indicate the best route from one city to another city. Although a sort takes time, it's worth it because it does not happen often, and can be consulted often. And maintaining mini-tables for each city exit adds value: only the mini-table is recalculated when a route changes, and then the mini-tables collated into the master table as a combination-and-sort (not calculate) operation. Other mini-tables remain unchanged and if the master table contains tags of the mini-table a route came out of, only those particular entries need be updated (althoguh whether that would save time is questionable!)

2. The high rate of change in city road nets is a problem, as is cross-city transit time.
I suggest that each link out of a city (ie a place where a road crosses the city limits) be considered the point where routing is calculated from / to, not the city hall. For each of these boundary points, there is a (probably small) table of directly and indirectly connected cities, and an 'add-on' distance to the local city hall, and maybe a table of cross-city distances to other intercity routes to allow better through-traffic data. Thus the inter-city times are affected only by inter-city route net changes, and intra-city changes only affect local data. If cross-city distances/times only change a bit, there is no need to update other connected cities with the transit time as it's inconsequential on a long journey.
One benefit of this is that we will know the shortest route from the city boundary to the next city, which will generally be on the closest side, and avoids the problem of the city hall being much further from or closer to the starting point than the route out of the city!

3. Only update those links that change
And cascade the data up the tree at a controlled rate, as a background task, ie in a part of the program that is not time critical and can be freely interrupted. After all, signage to other cities will not be changed instantly in the real world when a distant route changes.

As James points out, connected industries, attractions, etc are similarly treated as cities.
this does much increase the number of conneted nodes, and I hope that the 'tree' approach above would reduce the amount of calculation made for routing.

As i think was noted earlier, a citycar probably has to calculate it's own way out of the city. This is consistent with finding one's way to the main road out of town, or to the long-distance public transport stop/station. My one concern is that this will possibly add much time to routefinding, but i suppose the approximate time/distance can be taken from the city hall for decision-making purposes. Doesn't work so well for a starting point on the outskirts of a large city, but maybe we can live with that? Lots of people don't work out the distance very accurately before they start a journey! Perhaps this aspect of the problem deserves some more attention, but I would suggest it is a small part of the total result, and not be allowed to drag the complute time out very far.
Actually, here's a simple solution to the issue: calculate the distance to the city hall and to the exit route point (straight line/euclidian) and route to the nearer of the two from your start point (and do the same from the destiantoin city entry to the destination point). This might be a tolerable approximatoin for timing, if you use the mean city traffic speed.

Hope this helps, afraid I can't do the detail code stuff (never got my head round C let alone C++, let alone having time to do it) but this looks more like algorithmic problem-solving so far. Respect to those who can actually code the algorithms to make it work!
TTFN
Merry.

jamespetts · April 03, 2012, 11:27:15 PM

Merry,

thank you for your contribution - some interesting ideas. I don't have time presently to respond fully, but one or two brief questions, if I may: firstly, what exactly do you intend counting as a "direct" connexion - one that does not pass through another city at all (even peripherally), or one that does not pass through a certain point in another city? Secondly, there is much to be said for the idea of separating intra-city and inter-city routing, but this also has drawbacks, the most prominent of which is the frequency with which city borders change. Can you think of a sensible way to deal with the rapidity of the change in city boundaries in a growing city?

Finally, a more general point: currently, the route finding system is quite basic in its functionality, as it just enables educated guesses as to the journey time between two given points on the map based on an average journey time per tile value. What I'd ideally like to do at some point (and have wanted to do for some time) is to map actual routes for private cars, and have the private car objects in the game actually follow these routes, and calculate congestion, not at present using a somewhat approximate model based on counting the number of cars in a city and considering the size and density of the city, but actually measuring how long that any given road tile has stationary traffic on it in any given month, and calculating congestion by individual road tile, as is done in Sim City, then using that congestion data as a cost in routing calculations for both private cars and player vehicles. Those congestion data could also be used to calculate a city's overall congestion level to use when determining a city's growth, as well as being used to determine the particular likelihood of passengers using their car for the particular trip, given the overall congestion level of the best route.

Obviously, it would not be possible to calculate entire routes from every building on the map to every other building on the map (as already demonstrated, this is not even possible for whole cities), but a system that calculated routes within cities separately to calculating routes between cities, and then glued together three routes for every inter-city trip (from the origin point within the origin city to the boundary, from the boundary of the origin city to the boundary of the destination city, and from the boundary of the destination city to the destination point) might be feasible.

The difficulty is that this would be a very, very large amount of work to implement overall. (I also worry a little about memory usage). Ideally, what would be good is a system that I could implement now to do the same work as the current system does, and to which I could add later a system that deals with individual routes. Suggestions that go to that aim would be particularly welcomed.

merry · April 04, 2012, 11:34:24 AM

HI all,

Well, let's address the points in turn:
What's a direct connection?
One which starts at a city boundary, and ends at another city boundary.
Any route which passes through a city is 'indirect' in that the route has more hops.
An indirect route is made up of multiple direct routes plus cross-city links. The links are approximated as a euclidian distance (at least initially) and stored in each port's data.
Each city maintains a list of the 'ports' (i.e. exit tile) by which it links to other cities, and a list of the destinations reachable, with the port to use. Rather like an ethernet switch I suppose! The beauty of this is that the city route table can refer to the ports by some kind of pointer - meaning that if a port location moves, the central table does not need update unless distances change.

What about city growth?
Well, if a city expands, then the ports move - but only a few tiles outwards (usually). So a minor recalculation would be in order, probably by deducting the change from routes through that port. A full route recalculation would only be needed if the new city boundary subsumed a divergence of routes. Interestingly, (and this is quite helpful i think) that would not affect the total distance from city to city, as the distance deducted from the direct link is added to the intra-city link, so the routing table stays constant, still pointing to the port, and the port data is updated to note the new location and distances. Some computation is needed, but not much I think. Occasionally, there will be a total distance change of over 1 tile and that should trigger a 'route changed' mechanism but this won't be often.

How about the rapid change of intra-city routes?
This method is probably best initially implemented by calculating a rough distance from city hall to each boundary port, as a Manhattan distance (thinking about it that seems most likely to be accurate given city roads). There is no need to calculate exact intra-city routes.
Later, the same method could be used to calculate distance from origin to the relevant boundary port. More time, but still quick enough.
For accuracy, perhaps the best method is for a citycar to report the distance travelled from origin to port when it goes that way, and find some algorithm to aggregate such data by city area? Not sure how to handle the really detailed solution per vehicle - but if you chose to do this calculation exactly, at least it would only be from origin to boundary port and a repeat at the destination city. To be honest, for decision making, I do not think it ever necessary to go to that level of detail; it seems just too much.

How to calculate congestion?
Can you record the number of vehicles (city or player) passing over a tile per month? That gives traffic rate data, and is an extra task only when entering a tile so not every step. For the true congestion , recording the 'stopped time' for a vehicle and adding this to a counter on tile exit seems a limited-overhead solution, although both of these could be an awful lot of extra data to store.
If a city has lots of congested or busy tiles, then it would be 'felt' congested/busy by the passengers and they can be expected to use private cars less.

What about the added complexity?
This is intended to simplify the routing method. Initially, the system can build the city connection tree iteratively.
let's see, the build method might be :
For each boundary port, identify the directly connected cities & other destinations (call them all 'cities' for now). Record the port to use, distance and/or time.
Do this for all cities.
For each connected city, identify the other connected cities reachable via that one. Add to the table and include cross-city links at that stage.
Ignore cities already linked unless the distance is less.
Repeat until destination tree limits reached, i.e. no new destinations found.
Sort the table by destination city and order results by distance/time to that city.

This should be easier than the brute force method used now. It is also more easily and quickly updated.
When a route changes, update only the affected city ports and resort the routing table if the sort order becomes wrong. Cascade the changes upstream if the table changes order, until the table ceases to change in a city.
Do all these updates in non-time-critical routines; it does not matter if routing tables are only updated a little behind reality.
Compared to calculating every route directly, this has to be less computational effort.

Hope this helps clarify a possible approach (I bet you will find some other issues when building the algorithm in more detail, let's hope they are not prohibitive).
I recognise that the algorithm implies a number of data structures that could use significant memory, but look-up tables that might be the price to pay for lower per-route computation.

Merry

prissi · April 04, 2012, 02:59:03 PM

Each way already counts the number of vehicles passed this months, and the total number of the previous month. That is there already.

kierongreen · April 04, 2012, 03:25:15 PM

Traffic passing per month is not quite the same as congestion level (two roads with differing speed limits may have the same number of buses passing per month, with the faster one being free flowing, the slower having a traffic jam).

inkelyad · April 04, 2012, 04:09:24 PM

Quote from: kierongreen on April 04, 2012, 03:25:15 PM
Traffic passing per month is not quite the same as congestion level (two roads with differing speed limits may have the same number of buses passing per month, with the faster one being free flowing, the slower having a traffic jam).

speed limit <-> maximum throughput (in vehicles per month). It is not straightforward, but should be easy.
congestion level = Traffic per month/maximum throughput.

And I more and more want to hit the books. Something like "Network Flows: Theory, Algorithms, and Applications" (well, Russian equivalent of it).

merry · April 04, 2012, 04:59:14 PM

Aha! not true.
One vehicle per month can be extreme congestion on a 100kph road, if it hasn't moved.
Congestion is surely how long the vehicle spends in the tile, vs the expected time. Ratio of 1=free-flowing.

Hmm. I have been thinking about another aspect of this issue. I wonder by how much the exact link distance varies from the manhattan distance between city boundary ports. or between city halls. Given the road-cuilder may try to build shortest route, wil it not often end up quite close to manhattan? (OK, there are sometimes diagonal roads, but usually built by players).
So a good approximatoin to inter-city link distance, where a direct route exists, may indeed be the manhattan distance. It's a starting measurement at the very least, if we want to work out travel decisions quickly (and it might then be computatoinally cheaper to just calcualte the distance every time rahter than have the complexity of lookup tables). I know this goes against the grain for Experimental (James & others do like accuracy, quite rightly most of the time), but I do wonder how much is to be gained by extreme precision in this particular decision?

Merry.

isidoro · April 04, 2012, 07:07:12 PM

I don't think that the important final issue is the measure of an abstract congestion, but real average speed on a tile: a quite congested highway may be better than a solitary dirt road...

Maybe if you account for time spent in that tile of all vehicles in a month and does: n_vehicles/time_spent you could have a good measure of the above, but now I realize that some vehicles simply don't go at top speed allowed by the road because their top speed is lower: for instance, an ****-driven cart.

For practical purposes, manhattan distance would suffice, imho. Specially if you are interested in travel time. It depends much more on congestion.

sdog · April 04, 2012, 08:57:13 PM

i think the reason james uses some route finding, is to check also the road type. the permitted speed varies greatly in different paks.

merry · April 04, 2012, 10:16:47 PM

Isidoro & sdog. I suspect you are both right.
The speed limit and congestion (or mean travel time) being per-tile is a serious point: the routine has to iterate over all tiles on the route to find an overall value, even for the speed limit (and hence ideal journey time). Would there be a need to (somehow) log the speed limit changes vs distance on a particular route to make a calculation valid for a particular speed limit vehicle? If so, could the most complex necessary method be a serial representation of distance and speed limit sections as a serial string of values that can be stepped through, in the style of d10s50d30s90 where d=distance, s=speed (but something simpler might suffice)?

The above is only true out-of-city - all city roads are whatever the pak specifies unless non-standard roads have been built. We might safely ignore the latter for calculations of time/distance from origin to boundary!
City transit data for boundary-boundary route segments might benefit from real routes being found, as those could be stored in a lookup with benefit (fewer combinations to consider).

Let's hope this is all leading to a workable (and codeable) solution.
Merry

News:

Modelling private car traffic without breaking the CPU bank

chicken