The International Simutrans Forum

 

Author Topic: New feature - performance issue  (Read 6076 times)

0 Members and 1 Guest are viewing this topic.

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Administrator
  • *
  • Posts: 20649
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
New feature - performance issue
« on: April 27, 2010, 12:09:44 AM »
I have recently been working on a new feature for Simutrans-Experimental 8.0 that would track which city is connected to which other city, attraction and industry by road to increase the accuracy of the feature that simulates competition with private cars. I have got it working, and tested it somewhat: the code for it is currently on the Github -devel branch.

However, testing with the saved game that SDog uploaded here, I have found that it adversely impacts on performance. Certainly, on my debug build with my Pentium 4 3Ghz with 2Gb of RAM running Windows XP, I found repeated non-responsiveness to commands for seconds on end. Disabling the new feature by bypassing it in the code restored responsiveness. I have spent the evening trying to improve the performance, but with little in the way of results.

I should be interested in people's views on the matter: should it be made optional, and a reversion to a system in which it is assumed that everything is connected to everything else encoded so that people can choose which to use in simuconf.tab? Do those who have the ability to look into the code have any ideas as to how to improve the performance? I remember that there were similar performance problems when the new route finding code for passengers/goods was introduced, and Knightly did a great deal of work to improve performance there. In theory, this system should require a lot less computational power than the routing for passengers/goods, so it is not entirely clear to me at present why it should have such an effect on performance.

Any thoughts on the matter greatly received!

Offline neroden

  • Devotees (Inactive)
  • *
  • Posts: 831
  • Nathanael Nerode
Re: New feature - performance issue
« Reply #1 on: April 27, 2010, 12:32:16 AM »
Any thoughts on the matter greatly received!

We can probably figure out how to make it faster.  *Something* is being executed gazillions of times which should only be done occasionally.

In keeping with "true git development style", I advise removing the feature from the main -devel branch and spinning off a branch devoted solely to developing the feature.

Without looking at the code, my first guess is that the connections information is being updated too often.  (At most it should be updated when a new road is constructed -- possibly only once a month.)  My second guess is that the data is being stored in an way which makes it slow to access (since it has to be accessed very, very often).  My third guess is that there's a bug and the code is just spinning doing nothing somewhere you haven't spotted.

Can you pin the slowdown down to a specific bit of code change?  Which commit was it?  Was it this one:

Code: [Select]
commit 9f9907dc7b9136d6ca14d4772f8b6e7f88d2c236
Author: James E. Petts <jamespetts@yahoo.com>
Date:   Mon Apr 26 11:13:40 2010 +0100

    FIX: Correct mathematics of the calculation of private car average speeds.
    CHANGE: Use recorded car journey time to compare against players' transport's travelling times rather than speed bonus rating for road transport when passengers decide whetehr to take their car or players' transport.
    CHANGE: Average speed of private cars in any given month is now weighted by their chance factor.

Or was it this one?
Code: [Select]
commit fcdf6e25ac765915bcf993f6c5717858adb03a9b
Author: James E. Petts <jamespetts@yahoo.com>
Date:   Mon Apr 26 12:59:29 2010 +0100

    FIX: Industry and attraction road connexions now work properly.


Or was it something else?

EDIT: Oho, it was this one, wasn't it?
Code: [Select]
commit 3f455b2f4278f443d75f0b22e78374a0ccddd2b5
Author: James E. Petts <jamespetts@yahoo.com>
Date:   Mon Apr 26 02:29:35 2010 +0100

    CHANGE: passenger_max_wait is now a uint32 value, and the default is increased to 19440
    ADD: Private car trips can only be made where there is a road connexion between the origin and the destination. The approximate average speed of that connexion is measured and used when determining whether a private car trip is made.
    CHANGE: If "quick_city_growth" is enabled, the renovation percentage is automatically reduced by a factor of 3.
    CHANGE: Refunds are based on 1.5x not 2x of the straight line journey distance.

That's a little hard to revert since it's got all those other changes mixed up in it.  (I was taught the "One conceptual change per commit" rule when working on GCC.)

Apart from that, nothing beats actually profiling the code and finding out which subroutines are having the most time spent in them, and which are getting called the most often.  Do you have a profiler under Windows? 

I will also say that I haven't seen a slowdown yet under Linux, and I'm running HEAD of jp-devel, so the problem may be optimized away by GCC.  Unfortunately, I'm also getting a repeatable crash, which I will report in a different message....
« Last Edit: April 27, 2010, 01:03:05 AM by neroden »

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Administrator
  • *
  • Posts: 20649
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: New feature - performance issue
« Reply #2 on: April 27, 2010, 12:59:45 AM »
Neroden,

speeding it up would be the best solution! But I suspect that it's a bit too late to spin this off as a new branch now, as most of the work on it is done. I have already tried to reduce drastically the number of times that connexions are re-calculated: starting at once a month, then moving it to a staggered system (based on the first letter in a city's name) once a year, then detecting whenever a road, road bridge, road tunnel or road sign is built or deleted *then* recalculating in a new month staggered accross a year after that occurs. I did not test with a big map until after the second commit, so I cannot say whether it worked faster initially: I was just testing basic functionality on a small map with two cities to start with.

What I found was that, even with no re calculation, just the initial calculation of connexions, there were still many hundreds of calls to the routine that calculates new connexions, and this is spread over a great many months (using SDog's saved game in 1934, with quite a low private car percentage, which might explain why it is spread so much, I suppose). However, even after four or five game months, I recorded only 412 calls to the city to city route finder, 64 calls to the intra-city speed approximator (when the origin and destination are in the same city), and 1029 to the hashtable in which the data as to which cities are connected to which are stored. I had previously tested disabling the industry and attraction route finders, and, even with just the city route finder enabled, performance was just as slow.

Incidentally, the pointer hashtables are the same as are used, successfully, for the goods routing system, so I don't think that the data structures are themselves the problem.

I am currently working on a further optimisation that provides a short-cut in cases where a non-connexion is registered: if city X is not connected to city Y, then it follows that every city that is connected to city X is also not connected to city Y, so that datum is propagated accordingly. I shall have to see whether that has a positive impact on speed.

If you have any other ideas, they would be gratefully received!

Offline neroden

  • Devotees (Inactive)
  • *
  • Posts: 831
  • Nathanael Nerode
Re: New feature - performance issue
« Reply #3 on: April 27, 2010, 01:19:57 AM »
I suspect that it's a bit too late to spin this off as a new branch now, as most of the work on it is done.

Oh, this can be done. :)

First, save the commit number (Call it "#1") of the last commit containing this feature's code.
Then, rip it all out of HEAD and commit again.  Save *that* commit number.  (Call it "#2")
Don't do any other, unrelated commits in between the two.
At that point, I can do something like this (and so could you under Linux):
Code: [Select]
git diff #2 #1 > privatecar-connexions.diff
git branch privatecar-connexions-branch HEAD
git checkout privatecar-connexions-branch
patch -p1 < privatecar-connexions.diff
git add .
git commit

Voila, a "privatecar-connexions" branch containing the changes for this feature.

EDIT: Please do note that we had a "response collision" and I edited my message before this one while you were responding to it.

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Administrator
  • *
  • Posts: 20649
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: New feature - performance issue
« Reply #4 on: April 27, 2010, 01:20:43 AM »
Hmm - can this be done in GIT GUI for Windows...?

Offline neroden

  • Devotees (Inactive)
  • *
  • Posts: 831
  • Nathanael Nerode
Re: New feature - performance issue
« Reply #5 on: April 27, 2010, 01:27:10 AM »
Hmm - can this be done in GIT GUI for Windows...?

Well: it relies on using 'patch', which is not part of git and you certainly don't have in a GUI (though you may have a copy of it somewhere else).  And the "git diff" command to generate a diff file, which I don't know where to find it in a GUI.

I'm sure you can keep track of the correct commit numbers even in a GUI, though -- I suppose you could always tell 'em to me and I could generate the branch ;D

I'm really a CLUI person....

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Administrator
  • *
  • Posts: 20649
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: New feature - performance issue
« Reply #6 on: April 27, 2010, 09:11:28 AM »
I've just realised that I didn't respond to your edit in the message one or two posts above: are you using SDog's saved game? The thing is that the number of calls to route_t::find_route increases exponentially with the increase in the number of cities (although recent optimisations have attempted to damp that a little by making justified assumptions about what is and is not connected in some cases). If you are using a saved game with fewer cities and/or passengers, then you may well not notice problems that are very apparent in a larger game.

If you are using SDog's saved game - what are your system specifications? And are you compiling with optimisations turned on? My debug build keeps them off, which does affect performance, but performance was acceptable in a debug build before I added this feature.

Offline Dwachs

  • DevTeam, Coder/patcher
  • Administrator
  • *
  • Posts: 4821
  • Languages: EN, DE, AT
Re: New feature - performance issue
« Reply #7 on: April 27, 2010, 10:24:37 AM »
some random thoughts on this issue:

-- you can specify a max length for calc_route: then if no route exists the search will not go over all connected road tiles

-- maybe you should stop the route calculation as soon as another city is reached? That said, if the route from A to B goes over C, you dont have to calculate A-B,B-C,A-C but only A-C and C-B and add the travel time.
Ie fahrer_t::ist_befahrbar(grund_t *gr) returns false if gr belongs to the wrong city?

-- before route calculation check whether such a passenger will take a car at all for this connection (connection too long, available roads too slow etc), then you can exclude some long connections from the search

good luck :)

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Administrator
  • *
  • Posts: 20649
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: New feature - performance issue
« Reply #8 on: April 27, 2010, 10:52:45 AM »
Dwachs,

thank you for your thoughts. As to the max_depth setting, I'm not sure how I could set this to a value less than the maximum without the possibility of producing false results (cities only connectible by tortuous routes being registered as not connectible at all). I cannot use tolerance, since the results are not calculated individually for each passenger packet, but are stored in a hash-table. The best that I could do is to use the maximum possible journey time tolerance, which, by default, is set very high, and assume the highest possible speed of road. That, I suspect, would not limit things much. I shall give that a go, however.

Stopping when the route search reaches another city would not suffice, since the route through the next city that it reaches would not necessarily be the shortest route: what is calculated is not just whether a town is connected to another, but also the speed of the connexion, so that players have stiffer competition if two towns are connected by a fast, direct motorway than if they are connected by a winding dirt road.

Offline prissi

  • Developer
  • Administrator
  • *
  • Posts: 10440
  • Languages: De,EN,JP
Re: New feature - performance issue
« Reply #9 on: April 27, 2010, 01:07:58 PM »
Dwachs concept is correct, it is called hierachical pathfinder, first search by twons then search connection. Otherwise you can specify a maximum distance your are going into the wrong direction.

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Administrator
  • *
  • Posts: 20649
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: New feature - performance issue
« Reply #10 on: April 27, 2010, 01:25:00 PM »
Prissi,

perhaps I am not understanding something: doesn't found_route use A*? If the searching algorithm only looks for connexions through other towns, it might miss a direct route that does not go via another town; and searching connected towns could equally end up starting by going in the wrong direction, could it not...?

Offline Dwachs

  • DevTeam, Coder/patcher
  • Administrator
  • *
  • Posts: 4821
  • Languages: EN, DE, AT
Re: New feature - performance issue
« Reply #11 on: April 27, 2010, 01:40:04 PM »
I meant something like that: first search all direct connections, then search with these results all indirect connections.

My idea was: usually cities are connected in a point-to-point fashion. Each city itself is directly connected to a couple of cities around it. Connections to cities further away will certainly go through a neighbor city.

A rough real word example: if one want to travel by car from London to Glasgow one has to go via Manchester. Searching three routes: London-Glasgow, London-Manchester, Manchseter-Glasgow is inefficient. It would be more efficient to only search the 'direct' routes London-Manchester and Manchester-Glasgow, and then deduce that the trip from London to Glasgow is possible and goes over Manchester. The travel times then can be added.

(I just looked at this map: http://www.multimap.com/world/GB/England  :)

The direct connections can be searched with a breadth-first search ( http://en.wikipedia.org/wiki/Breadth-first_search ). Road nodes in the search are only expanded if they do not belong to a city. In that way you will find all direct connections from one city to all others. (A^* would be not suitable since there is no specific goal).

After that, you obtain a graph containing all direct city-to-city connections. To get all possible connections, you have to do a breadth-first search in that graph too.



Offline neroden

  • Devotees (Inactive)
  • *
  • Posts: 831
  • Nathanael Nerode
Re: New feature - performance issue
« Reply #12 on: April 27, 2010, 07:28:16 PM »
I've just realised that I didn't respond to your edit in the message one or two posts above: are you using SDog's saved game?
No.  (Edit: I can't find it from your link, which doesn't work.)
Quote
The thing is that the number of calls to route_t::find_route increases exponentially with the increase in the number of cities
Right.  How many cities was SDog using?

The game never automatically creates new cities (!), the old cities just get bigger, so I rarely have many more than 16 cities.  Rather more Attractions and Factories than that, of course.  2^32 is still a pretty big number, but still....

Stopping when the route search reaches another city would not suffice, since the route through the next city that it reaches would not necessarily be the shortest route: what is calculated is not just whether a town is connected to another, but also the speed of the connexion, so that players have stiffer competition if two towns are connected by a fast, direct motorway than if they are connected by a winding dirt road.
In a pinch, it would certainly be acceptable to drop the speed measurement -- assume that all SimCitizens love long, long drives.  It would still be an improvement over not modelling disconnection at all.

The "two-layer" data structure suggested by others -- a network of "direct connections" and a network of "indirect connections" -- would also be my first try for efficient data handling.  The actual pathfinder only generates the "direct connections" for each node, stopping as soon as it hits another city (or out-of-city attraction or factory).  These links form a graph and the other connections are computed from the graph.  It will give a few false positives if the player has carved a city into two disconnected road networks or connected an attraction or factory with two disconnected road networks, but this is certainly an acceptable compromise.

Edit: I notice that you have code for the "assume_everywhere_connected_by_road" option  in the current version.  Is this in fact satisfactory to restore performance?

Edit: I have a weirder thought.  The problem is related to the number of *routes*.  How many lines can be handled normally without a slowdown?  Basically, we're adding lines here (private car lines) -- adding an intercity bus route between every pair of cities should give the same result (if it doesn't, reimplement the entire thing in the form of intercity bus routes run by the public service player).  My question is how many "automatic" routes we can afford to add: I'd guess about 10% of the normal maximum is what players will tolerate.  We can then design based on knowing that.  (But I'm not sure what the normal maximum actually is.)
« Last Edit: April 27, 2010, 08:45:26 PM by neroden »

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Administrator
  • *
  • Posts: 20649
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: New feature - performance issue
« Reply #13 on: April 27, 2010, 10:17:36 PM »
Thank you all for your careful thoughts and consideration of this issue. I shall take the points out of order, as it will assist the clarity of the response. First of all, sixteen cities is a very small number. Simutrans-Experimental combined with Pak128.Britain-Ex is intended to be able to simulate an area roughly the size of England to scale (at 250m/tile), with a realistic number of conurbations for an area of that size. Using a small map means that one only gets to use low-speed, local transport. Take the default size of 256x256 tiles (a default set, incidentally, back in the late 1990s, when computers were far less powerful than now). 256 x 250m = 64,000m or 64km. In other words, each side of the map is less than a third of the distance between London and Bristol. This is not far enough for express trains, let alone aircraft, to be worthwhile.

A good map size for a compelling game is something like 1280 x 512 or even 2048 x 768. To get a sensible density of towns and cities in a good variety of scale sizes in a map of that size, one needs about 100 - 250, with a median size of 2,500 - 6,000 each. A large size will be especially important when network multiplayer games become possible, as more players will need more space between them to build up a decent sized network each.

Sdog's saved game, which is here is a modestly sized map, but considerably bigger than the default: something like 512 x 768, I think. It has something like (at a rough count) 64 cities. 64^2 = 4096.

However, given the way in which the current system works, it is not quite an exponential growth. It might be sensible if I give an overview of how the system works, which, in its present state, is designed to maximise accuracy. Every time that the passenger generation routine is checking whether a private car trip is viable, it first checks to see whether the destination city is stored in the hashtable. If it is stored in the hashtable, it will retrieve the value, which is the average speed per straight line tile. This is then multiplied by the straight line distance between the actual origin and the actual destination of the packet of passengers to produce the travelling time. This travelling time is then compared with the travelling time for the players' transport, and the likelihood of passengers taking their own car as opposed to the players' transport is adjusted in proportion to the relative difference in travelling time between the two. Even when there is no such comparison, the journey time tolerance feature is used to check that the journey time is within the passengers' tolerances; if it is not, they will not travel. This is important, because passengers transported by car contribute to city growth just the same as passengers transported by any other means.

The travelling distance per straight line tile is calculated thus: a route is found between the road outside the townhall in the origin city and the road outside the townhall in the destination city. It is not enough just to measure from the boundary, since this ignores the part of the journey in the city itself, which is likely to be significant in the case of cities close together. The the minimum of the maximum speed of each road tile in the route and the average maximum private car speed (weighted by those cars' "chance" ratings) is summed then averaged for the straight line distance between origin and destination, before being divided by a further factor to represent the fact that vehicles will inevitably travel at an average speed of somewhat less than the maximum speed. The average journey time per straight line tile is then calculated, which can then simply be multiplied by the straight line distance between origin and destination to get a realistic travelling time based on the directness of the route without having to measure the actual length of the route every time. If no route is found, 65535 (the highest number that can be stored in an unsigned 16-bit variable) is stored in the hashtable. If this value is detected, passengers will not register a private car trip at all.

Every time that a route is found between two cities (A and B, say), the journey time per straight line tile is added to both cities' hashtables, so that the route does not need to be calculated backwards. Furthermore, it can be assumed that, if A is not connected to C, then, if B is connected to A, it, too, is not connected to B. So, whenever it is found that a city is not connected to any city, it iterates through its list of cities that are connected to it in the hashtable, and puts the unconnected city as the key 65535 as the value in each of those hashtables. Further, industries already keep track of whether they are inside a city. When checking a route to an industry, the system first checks whether that industry is in a city. If so, the connexion data for that city is returned for the industry (treating the industry, in effect, as just another building in the city). These measures mean that the growth in the number of routes that it is necessary to calculate is less than exponential with the number of cities in the game.

If a private car trip is sought with an origin and destination in the same city, the trip is assumed to be possible: a route is not calculated, but the townhall road is used and assumed to have the same speed limit as the rest of the journey, which is then calculated in the normal way, save that an extra factor is added to approximate the difference between the straight line distance and the route distance.

I have thought at some length about using existing connexion data, as Dwachs suggests, but I can't see a way to make it work with the current system. Suppose we are at city A looking for a route to city C. We know that city A is connected to city B, and that the journey time per straight line tile is 5 tenths of a minute. How would that assist in knowing how many tenths of a minute per straight line tile it takes to get from city A to city C? For all the finding system knows, city B might be further away from city A than city C. We don't know the distance between cities A and B or B and C (or, before checking the route, avoiding which is the point of this exercise in the first place, A and C), and so cannot even calculate a proportion of the journey that should be assigned to the part between city B and city C at the known number of tenths of minute per straight line tile even if we do assume that the A is only connected to C via B, which may well not be the case: not only is it possible for there simply to be two roads (and not necessarily built on map generation: one of the roads may have been built by a player later, or the public service player), but the journey between B and C might well pass through city A, and just happen to have been checked first. If there is a circuitous route between cities A and B, but a straight route between cities A and C, the B to C results will be positively misleading.

If the suggestion is to store something other than the journey time per straight line tile, then the problem with that is this: the method accessing the hashtable is called far more often than the route finding code, so speed there is even more important than in route finding. If the data are stored in as close as possible form to that in which it will need to be used, this will maximise the efficiency of the system at the point when it is used more heavily. There is no point in making it work better for the insertion end if that just makes the even more performance important extraction end worse.

I had also given some thought to the suggestions about making the feature a binary-only feature: in other words, not checking the journey time per tile at all, just checking whether a city is connected to another city, industry or attraction or not. However, this is of very limited use. By the time in the game when the proportion of people with access to a private car is high enough, most places should be connected to each other by road (or, at least, in real life would have been). The real issue in a great many cases will be whether the players' public transport connexions are better or worse than the road connexions, taking into account congestion. This is designed to give rise to particular gameplay considerations. For example, in a network game, suppose that the public service player is being directly played by a human: its goal is, unlike with other transport companies, whose goals are to maximise their profits, to maximise the economic growth of the region, measured by the population growth. The public service player will be faced with decisions such as: should a new, fast road be put in place to connect major towns together, which would allow a greater proportion of the people in those towns to travel than currently do, thus potentially leading to economic growth, or are the benefits of such a road outweighed by the increased congestion (which reduces population growth) or the fact that such a measure might undermine the viability of public transport connexions such that they would either have to be subsidised (meaning less public money available for other economy-boosting activities: installing more power lines, perhaps), or close down entirely (preventing those without access to a car from using them, or increasing congestion further)?

When the "assume_everywhere_connected_by_road" option is enabled, then city-to-city (and to industry and to attraction) time calculations are made in the same way as intra-city time calculations in the above method: it is assumed that every road tile will be of the current inter-city road type, and a factor added to approximate the fact that the road will likely not be straight.

Offline Dwachs

  • DevTeam, Coder/patcher
  • Administrator
  • *
  • Posts: 4821
  • Languages: EN, DE, AT
Re: New feature - performance issue
« Reply #14 on: April 28, 2010, 05:23:55 AM »
Quote
I have thought at some length about using existing connexion data, as Dwachs suggests, but I can't see a way to make it work with the current system. Suppose we are at city A looking for a route to city C. We know that city A is connected to city B, and that the journey time per straight line tile is 5 tenths of a minute. How would that assist in knowing how many tenths of a minute per straight line tile it takes to get from city A to city C? For all the finding system knows, city B might be further away from city A than city C. We don't know the distance between cities A and B or B and C (or, before checking the route, avoiding which is the point of this exercise in the first place, A and C), and so cannot even calculate a proportion of the journey that should be assigned to the part between city B and city C at the known number of tenths of minute per straight line tile even if we do assume that the A is only connected to C via B, which may well not be the case: not only is it possible for there simply to be two roads (and not necessarily built on map generation: one of the roads may have been built by a player later, or the public service player), but the journey between B and C might well pass through city A, and just happen to have been checked first. If there is a circuitous route between cities A and B, but a straight route between cities A and C, the B to C results will be positively misleading.

I think you misunderstood me. I meant, that before doing any passenger stuff, the pathfinding or connection finding has to happen.

To remain in the ABC-example: A-B and B-C are directly connected.
1) All direct connections are found: A-B, B-C. This is not A* as no goal is known. Just expanding the nodes. Nodes(road tiles) are not expanded if they are in a city, but they serve to compute the travelling distance to that city.
The connection A-C is not found here, since any pathfinding attempt stops in the region of city B (thus limiting the maximal number of explored road tiles)
2) Perform pathfinding in the direct connections graph: now A-C is found. The length of this connection (in tiles) is the length of A-B plus the length of B-C. Similar calculation can be done for speed, etc. The journey time per straight line (jtpsl) for A-C is then:

jtpsl_{A-C} = ( jtpsl_{A-B}*len_{A-B} + jtpsl_{B-C}*len_{B-C} ) / (len_{A-B} + len_{B-C} )

The goal of this idea is to reduce the number of pathfinding attempts between different cities.

Or did you mean, that the pathfinding starts if a passenger wants to travel?


Offline neroden

  • Devotees (Inactive)
  • *
  • Posts: 831
  • Nathanael Nerode
Re: New feature - performance issue
« Reply #15 on: April 28, 2010, 06:03:16 AM »
Thank you all for your careful thoughts and consideration of this issue. I shall take the points out of order, as it will assist the clarity of the response. First of all, sixteen cities is a very small number. Simutrans-Experimental combined with Pak128.Britain-Ex is intended to be able to simulate an area roughly the size of England to scale (at 250m/tile), with a realistic number of conurbations for an area of that size.
England is actually *very* dense -- but remember that we aren't simulating every single person, since each "house", containing 1 "person", is approximately 250 meters square.  This means any town smaller than about 700 people does not exist on this map scale; it wouldn't even warrant a single simhouse.

.  I routinely use large maps (1024x1024 minimum) with relatively few cities.  Which is quite entertaining actually -- you get to do *real* long distance runs like we have here in the US, through open countryside.

Quote
A good map size for a compelling game is something like 1280 x 512 or even 2048 x 768. To get a sensible density of towns and cities in a good variety of scale sizes in a map of that size, one needs about 100 - 250,
Good grief.  That's really very dense.
Quote
with a median size of 2,500 - 6,000 each.
And they'd all be overlapping.  Which I guess is basically true for southeast England.  But it's actually a little bit unusual, isn't it?  This seems overdense for, say, Northern Scotland.  :-)

That level of density in an early-time-period game actually makes long-distance travel a lot less useful, to my mind.  It also means that house demolition becomes essential both early and often.


(Tests)  A non-optimized build starts the game nonresponsive with 175 cities.  I can't even hit pause.  After a while it clears up (probably after the city connections table gets built?) and then it's perfectly responsive.  (Edit: apart from The Crash, still unsolved.)

The spacing isn't *quite* as bad as I expected, but I do get stuff like cities contained completely within other cities (which I guess actually happened in London, but still).  This leads to bizarre behavior with neighboring houses being officially in different cities.  At least this behavior will be bizarre until "city merger" code gets added to the game.  And this might be a partial solution to the problem -- there's no reason endless population growth should mean endless *numbers* of cities -- it's all right for London to absorb Westminster.  Unless cities are separated by countryside they might as well be the same city.  Unused townhalls can be converted into historical monuments.

A bigger conurbation from the map I generated (due to cities merging) covers about 625 sq. km. (scale).  This is *far* too large for a typical decent-sized city of 1800.

I'd start with a quarter that many cities in 1800 on a 2048x768 game, and smaller cities.  And I enjoy the challenge of building up enough profit to start constructing New Towns.  :-)

Quote
A large size will be especially important when network multiplayer games become possible, as more players will need more space between them to build up a decent sized network each.
Large size, or large number of cities?  I tend to prefer having some serious countryside.  Your suggested settings will give me a single megalopolis before 1950 (which I guess Kent kind of is, but still).

One of the things I like about experimental is that the more generous 4 tiles = 1 km scale allows for more countryside.  This method of play doesn't really use that.

Quote
Sdog's saved game, which is here is a modestly sized map, but considerably bigger than the default: something like 512 x 768, I think. It has something like (at a rough count) 64 cities. 64^2 = 4096.
Quote
I had also given some thought to the suggestions about making the feature a binary-only feature: in other words, not checking the journey time per tile at all, just checking whether a city is connected to another city, industry or attraction or not. However, this is of very limited use. By the time in the game when the proportion of people with access to a private car is high enough, most places should be connected to each other by road (or, at least, in real life would have been).
Not if you play New Mexico or Colorado, or even Orkney Isles, style.  :-)

In other words, I strongly disagree with your assessment that a good game needs to be Megacity One, or even BritCit. ;)

Quote
When the "assume_everywhere_connected_by_road" option is enabled, then city-to-city (and to industry and to attraction) time calculations are made in the same way as intra-city time calculations in the above method: it is assumed that every road tile will be of the current inter-city road type, and a factor added to approximate the fact that the road will likely not be straight.
Does this speed the game up enough to eliminate the slowdown (due to eliminating the hashtable lookups)?

I think the fundamental question is how many routes can be handled.  If we can't handle 4096 bus routes, we have no business making 64 cities, because I can easily generate that many transit routes in a typical game with that many cities, at least by the late 20th century.  And if we have no business making that many cities, then the problem goes away.

That said, there does seem to be a problem: the number of routes generated by this code seems to be something close to the typical number of routes for a major game-dominating company in the late game.  So 50% of routes for the player, 50% for the computer?  This cuts the number of supportable routes by half, which is much more than the 10% which I guessed would be acceptable.

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Administrator
  • *
  • Posts: 20649
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: New feature - performance issue
« Reply #16 on: April 28, 2010, 09:57:42 AM »
Thank you again for your replies. Dwachs - I think that I did have trouble understanding you, and am still not entirely sure what you mean. However, you are correct that the pathfinding code is not called unless it is needed: when passengers want to travel, then if (and only if) they have access to a private car, the hashtable is checked to see whether it contains the destination. If it does not, only then is the actual pathfinding called. Whenever the need for a recalculation is detected, the hashtables are simply cleared, the recalculation itself not happening until just before it is needed. The idea of calling only on demand is to minimise the number of calls to the expensive route finding algorithm. Calculating everything in advance would undermine that, wouldn't it? And I'm still not sure how that system would handle a situation where some roads bypass some towns.

Neroden - I did indeed want to be able to simulate accurately the density of England, including South-East England for Pak128.Britain. There are some very interesting transport possibilities with such a high density, especially involving suburban rail. Simulating a lower density is much easier, of course, but anything that can simulate a lower density can also simulate a higher density.

I have not yet fully calibrated the ideal starting town size, nor the ideal growth rate (it looks as if it is probably a great deal too high at present). I certainly don't want the entire map becoming filled with one giant city, but there should be plenty of places where cities intersect and are even entirely contained within one another. One feature that I added some time ago (which can be disabled by setting "cities ignore height" on the map generation window) is to weight the otherwise random distribution of towns inversely with the height of the tiles, so that the map generator generates areas of high population density (at lower altitudes) and areas of low population density (at higher altitudes). The idea is to enable the creation of places like South-East England and places like North-East England on the same map without manual intervention. Ideally, there should be a mix of sprawling urban areas with little countryside between different towns, medium dense areas with about equal amounts of town and countryside by the mid/late 20th century, and rural areas with few towns and large swathes of untouched countryside (perhaps filled by enormous farms if anyone ever gets around to finding a way of having farms with vast numbers of fields that do not insulate themselves from the possibility of ever being connected to transport). Incidentally, I never said that a good game needs to be like South-East England, but simply that, if Simutrans-Experimental is to do what it is intended to do, then it needs to be able to simulate the density of South-East England if the player wants it to do so.

Your city merger idea is interesting, but sounds as if it would take a lot of work (the idea of converting a town hall into an attraction is cute, however). Is there actually a gameplay problem with one city being contained within another at present?

As to "assume_everywhere_is_connected_by_road", this does indeed entirely eliminate all slowdowns. It is interesting that you reported that, with 175 cities, the non-responsiveness ended after a while. This would indeed be because the tables become filled with data and thus cheap hashtable lookups replace expensive route finding. Note, however, that hashtables will progressively be cleared in the course of a year every time that you (or any other player, including the public service player) build or demolish a road, road bridge or road tunnel, or add or remove a road sign. It is particularly interesting that you report that the responsiveness is fully restored, however, since that has not been my experience with SDog's saved game, "EX-11". May I ask - what are your system specifications? And did you have the timeline on when you set your 175-city mega-map? The proportion of the population with private cars will affect the number of private car trips, which will, in turn, affect the speed with which the hashtables are filled. The default if no timeline is set is 25% private car ownership.

Edit: One other thought: supposing that the data were saved with a saved game: that would mean far less recalculation when a game was loaded, although it would take a fair bit of disc space (and also network capacity when transmitting a network game): I could store the co-ordinate (koord3d) of a city's townhall against the journey time per straight line tile. Any thoughts on whether that would be worthwhile?
« Last Edit: April 28, 2010, 11:31:54 AM by jamespetts »

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Administrator
  • *
  • Posts: 20649
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Further thoughts
« Reply #17 on: April 28, 2010, 09:46:37 PM »
I ran a test this evening, creating a new map with 175 cities with the timeline on starting in 1980 (private car ownership: 59%). The game was unplayably unresponsive when it started. I left it running whilst I went to cook then eat dinner. When I came back, it was still unplayably unresponsive. (Edit: I have since tried the same with a fully optimised release build. It is still unplayable.) This feature seems impractical as currently implemented.

I was thinking this afternoon about what Dwachs had suggested, of re-using route fragments (which is not possible as currently implemented, as the routes are repeatedly overwritten). I also remembered playing Sim City 4, a game in which private car journeys are generated individually for each origin/destination pair of buildings in the entire city, and the exact route taken by those private cars precisely tracked and visible with little arrows when clicking on a building or road tile with a particular tool. Sim City 4 was released in 2003, and is able to do that far more precise set of calculations, in which far more information is preserved, with considerably less computational overhead, as witnessed that the game does run with acceptable performance on older computers. When I first started playing it, I had a Pentium III 600Mhz CPU and 256Mb of RAM (although it was noticeably smoother on my present system).

There must, therefore, be a more efficient way of producing a pathing engine than that which I currently use. We can't look at the source code of Sim City 4 (we can look at the source code of the original Sim City, although it has a less precise route finding algorithm called "trip generation" - the more precise version was introduced only in Sim City 4), but we can make educated guesses as to how it might work and how to implement something similar in Simutrans-Experimental.

One possibility is this: instead of discarding the routes when generated, keep them. Then, during every search, check every tile for a route passing through that tile that goes to the desired destination. If there is such a route, simply append that onto the end of the existing route and search no further. One could even search backwards along a route, and check every tile to see whether a route has at its origin the desired destination, although the route would have to flag whether any one way tiles were involved to prevent difficulties in that respect.

Using this method has a number of positive side-effects. Firstly, it would be fairly easy to track the exact number of private cars passing through any given tile in any given month. Each road type could be assigned a capacity, either based on its speed limit or set individually in the .dat file (automatically doubled if the road is one-way). Intersections could have a reduced capacity. If the capacity is exceeded, the road tile becomes congested: the more that it is exceeded, the more congested that it becomes. A tile's congestion could then be factored in to the cost of the route choice when the pathfinder is run again. Similarly, it could also increase the journey time (one possibility being for the system to re-step the route to re-calculate journey times much more often than it recalculates the routes themselves). This congestion could be displayed on a map. City car graphics could be set to tend to go more in the direction of road tiles that are heavily used than lightly used to give visual feedback about the heaviness of traffic on particular routes. I suspect that this is a rather similar system to that used by Sim City 4.

Secondly, it might be possible to have a system in which, just as in Sim City 4, routes are found from every origin building to every destination building, rather than just to and from cities, given the computational time savings made possible by the path finding system. This would allow for a far more refined system of tracking transport use in cities than the original system permits. It would also allow for a far more meaningful figure to be displayed on the "congestion" graph (being the proportion of road tiles in a city that are congested according to the definition above, or something along those lines).

Conceivably, it might even be possible to use this system to allow for combined car/public transport journeys in some cases, although that would probably take a great deal of extra work.

However, before I start to work on any of this, I should appreciate thoughts. Firstly, can anyone see any obvious flaws in this design, or things of which I had not thought? Secondly, although it is easy to re-use end fragments of a route (i.e., those that end at the desired destination), but it is not clear to me whether it would be possible to use middle fragments of a route (those that do not end in the desired destination, but whose middle portion of the route is the same as the middle portion of the best route from the current origin to the current destination). The real issue in that case is: how to know which route's middle portion is indeed the same as the best route, and how to know when to break out from the existing route and find one's own route to the destination? Perhaps one could search in both directions at the same time; but then that risks simply doubling the number of searches in cases where no suitable middle route is found. One could check to see whether that route intersects with any other route that takes one to the correct destination, but such a combined route might well not be the best route. Perhaps one could use the latter strategy, but limit it to the last few tiles (8? 10? 12? 16?) of the route, so that any sub-optimality in routing is largely suppressed. Perhaps one could calculate which road tiles are at the boundaries of cities, and, if one's destination is in a particular city, use any route that intersects with one of those boundary tiles, and then proceed from that tile as normal; but, especially where cities are close together, that risks entering by a sub-optimal route and greatly increasing the journey distance. Perhaps one could iterate through all the city entry road tiles for the destination city (which are likely to be few) and check to see whether there are any intersecting routes to the destination from any of those tiles, and, if there are, which of them is quickest - but if there is just one entry tile with no route to the destination, that would break the accuracy of the search, since entry by that tile might be the best route.

Thirdly, does anyone have any idea as to whether this is actually likely to be faster than the present system? If all building to building pairs were potentially to have routes to and from them, would that really be better than all possible city pairs having such routes, even if there was a more efficient system for calculating the routes?

Fourthly, some thoughts on how to handle the techincal issues would be appreciated. I imagine that a class derived from route_t would be used. How best to append the desired route to the end of an existing route so that they appear to operate as a seamless pair? And how best to handle the refreshing of routes? Sim City 4 simply refreshes all of them in one go once a month. If congestion were to be measured as part of the system of deciding which route were used, refreshing could not be limited simply to when road tiles are built or removed, as now. How best to link the route tiles to the routes: store a set of pointers in every tile, or have a big hashtable in karte_t? And how best to register each use of each route? This is complicated by the fact that the routes would incorporate other routes by reference; how to distinguish between the users of a fragment of a route and the whole route when counting? And how to make all of these operations run at a sensible speed?

Any mental input on any of these issues would be very greatly appreciated!
« Last Edit: April 28, 2010, 10:37:20 PM by jamespetts »

Offline neroden

  • Devotees (Inactive)
  • *
  • Posts: 831
  • Nathanael Nerode
Re: New feature - performance issue
« Reply #18 on: April 29, 2010, 12:47:39 AM »
Your city merger idea is interesting, but sounds as if it would take a lot of work (the idea of converting a town hall into an attraction is cute, however). Is there actually a gameplay problem with one city being contained within another at present?
Very few things depend upon cities (as opposed to the buildings within them) in the current game, actually: basically city growth is it for standard, and the private car algorithm is the only addition for experimental.  City growth could have some odd behavior, however, with a fully enclosed city: it may be unable to grow because it is "insufficiently dense" because all the otherwise empty space in it is taken up by buildings from the OTHER (larger) city.  I'm not sure, perhaps density is measured without regard to which city buildings are officially part of, in which case this wouldn't be a problem.  I haven't generated a city-within-city and run the game long enough to test behavior, nor have I tracked down the code.  With the private car algorithm, it could certainly give weird behavior: upgrade the route between the two townhalls and essentially-local traffic will suddenly decide that private cars are much faster than before.  (So you'd have to avoid doing that.)

Comment on SC4: I do remember it getting sluggish about route updates after major construction.  I believe it had one key advantage: IIRC every residential building had only *two* destinations, ever, and these didn't change very often.  So once a route was found from a residence to a workplace, that destination stayed the same for many game years; the resident never went to any other workplace, and never changed his route there, unless the route got cut, the residence or workplace got destroyed, or unless (IIRC) the route "drove past" a workplace with job openings.  The route could then be stored with the residential building.  I'm not sure how much of an advantage that was, but I suspect it was a substantial savings.  (I think there was also a full-scale search for new destinations and routes periodically, but it was only a fraction of the map each month.  Building a subway might not cause people to switch to the subway until many months had passed, because of this.)

If each passenger building had precisely three destinations (work, recreation, and "visiting people"), and the destinations were "durable" -- rather than the current situation where, if I'm not mistaken, people want to go somewhere different on each trip -- the job of the route-finding code would be reduced substantially.  Each time a car set out from a building it would follow the same route it did last time -- until it got "stuck", that is, due to road reconstruction.  I think that might change gameplay and realism in substantial, undesirable ways, however (though I haven't thought about it very hard... it might actually improve gameplay or realism).

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Administrator
  • *
  • Posts: 20649
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: New feature - performance issue
« Reply #19 on: April 29, 2010, 11:45:59 AM »
Neroden,

thank you for your feedback. It seems that a full system as I had originally described might be a bit much for a computer to handle with reasonable performance. Perhaps, however, the idea of using route fragments can survive in a simplified form: as now, we could just calculate city to city routes, and use the existing non-routefinding method to guess the speed of intra-city routes. The inter-city routes, however, could use route fragments in the same manner as I described in the original outlining of the application of this concept; the inter-city routes would be stored, and, whenever a pathfinding algorithm steps on a tile with a route to the same destination, it would stop looking and just append the remainder of the route to the end of the route found so far. Any thoughts on that method? In particular, is there any good use to which the information as to teh amount of traffic using any particular route could be put...?

As to city mergers; traffic under the new model is an interesting one. It might be possible to detect whether one city is contained within another for the purposes of traffic (check whether a town hall's city road is contained within the confines of another city, although that might require iterating through the whole city list for each town hall road), but the mergers idea is an interesting one all the same. The other thing with cities in Experimental is electrification - it might be tricky and unintuitive to have to have a separate substation in a city entirely contained in another city. Are you volunteering to code it...? ;-)

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Administrator
  • *
  • Posts: 20649
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: New feature - performance issue
« Reply #20 on: May 01, 2010, 10:48:51 PM »
I have split the topic on city mergers into this thread.

As to the private car routing algorithms - does anyone have any thoughts on whether the simplified version of route fragments suggested above would be (1) any faster; and (2) fast enough, and, if so, any thoughts on implementation (particularly on route linking - use a pointer from one route to the next (and then lookup all connected routes when the route finder steps on an intersection), or use a completely new route class involving a doubly linked list (convergence would be easy, but  how to handle divergence?), or copy the route tiles from one route into another (too time consuming? Or is copying a load of pointers actually quite fast compared to lots of pointer lookups during future route finding exercises?))?

Edit: Incidentally, in a test that I have just run, I have confirmed that it is the many calls to "find_route" that is consuming all of the time and making performance poor. With that method call removed but everything else in tact (and dummy iterations to replicate iterating over a route's tiles), the game runs perfectly smoothly on a large map: on the same map with it re-enabled, it is unusably non-responsive. Optimisation will therefore have to focus on reducing the number of calls to "find_route". What are people's thoughts on the question of whether using route fragments will help?
« Last Edit: May 01, 2010, 10:52:17 PM by jamespetts »

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Administrator
  • *
  • Posts: 20649
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: New feature - performance issue
« Reply #21 on: May 02, 2010, 12:27:11 PM »
Some further thoughts on this. Firstly, I have realised that the route fragment idea is fatally flawed because it will not necessarily find the shortest route: the path finder might end up going backwards to a tile that has a route from a far more distant city to the destination city and thus get a tortuous route instead of the fastest one.

The problem is that there are a very large number of calls to the expensive A* find_route method. The efficiency of the current finding algorithm (that is, the one that calls find_route, not find_route itself) is O(n^2), where n is the number of cities. That will generate theoretically 10,000 (actually, 5,000, as a bidirectional connexion is assumed) calls to find_route per refresh cycle in a map with 100 cities, assuming that all cities are, in fact, connected to each other (it runs faster if some cities are not connected to each other, as non-connectivity is propagated). On our test maps with 175 cities, it would generate 15,313 calls per refresh cycle assuming that all cities are connected.

So, perhaps a breadth-first search is the way to go. We can't iterate through a graph of connected cities, though, because the fastest route might not be through another city. To take Dwachs' example of using British cities: that method would entirely ignore the best way of getting between London and Glasgow, using the M6, which does not go through any city. So, a breadth-first search would have to search from every city to every other city, and stop only when it has found connexions to everywhere. That would give an efficiency of O(n) calls to the route finder, albeit a more expensive breadth-first algorithm would be used instead of the existing relatively efficient A* algorithm.

Another thought that I had, which might be somewhat of a pipe dream (no pun intended) was to look into using MMX or SSE optimisations, since a route finder works by performing large numbers of identical operations on arrays. I have not looked into it in detail nor know whether it is suited to this type of algorithm, although any thoughts would be appreciated. What I do know is that it would require different implementations for different platforms using pre-compiler directives, and would take a lot more work to code than a normal algorithm. My first thought is to see whether a standard algorithm is fast enough before looking into this.

I had also looked into the Floyd-Warshall Algorithm that Knightly had earlier used for the goods routing system with great success, which is even more efficient than breadth-first search, but I cannot immediately see how to implement that in a case where, as here, we do not want to find the shortest path between all road tiles and all other road tiles, just between all townhall_road tiles and all other townhall_road tiles.

Incidentally, we cannot, as Dwachs has suggested, simply check to see whether a tile is in a city, as that will tend to underestimate the journey distance, which is likely to involve some passage through a city to get to the actual destination in it.

Any thoughts on the above matters would be much appreciated!

----

Edit: Now I feel rather silly - I looked through the code with a view to making my own breadth-first search method based on the existing A* method in "find_route", only to discover that "find_route" did not use A* at all, but Dijkstra's algorithm: it is calc_route that uses A*. When I substituted a call to "find_route" with a call to "calc_route", the speed was perfectly acceptable (although, on large maps, still slightly sluggish for the first few seconds - fine after that, though). So all this thread was rather in vain! Apologies for burdening people's brains unnecessarily...
« Last Edit: May 02, 2010, 01:17:19 PM by jamespetts »