City populations

Started by ras52, February 10, 2012, 12:43:22 PM

ras52

The merge of Prissi's new city generation code from Standard into Experimental has broken the code to make city sizes obey Zipf's Law.  The number of big cities is now ignored, and almost all cities have a size of around 3 times the requested median.  The breakage is caused because the city generation code no longer creates a city with its intended size, but creates it with no population and later gradually increases the population to simulate organic growth.  The attached patch fixes this so that the organic growth code works correctly with the Zipf Law code, and we respect the requested median and number of big cities.

Edit: I may as well upload two other trivial patches for the population code that I've been using for some time as I think both deserve being included in Experimental.  The reproducible-randoms patch changes the only use of the C library rand() function (which occurs in the Zipf Law code) to use the Mersenne twister implementation in simtools.cc.  This has the advantage of being the same across platforms and means that a quick hack to disable the setsimrand call allows you to repeatedly use the same sequence of 'random' numbers, e.g. while diagnosing bugs.   The second patch allows you to set the number of big cities to zero if you so wish.  There's no technical reason for this restriction, and we allow users to disable other options that will result in a sub-optimal game experience, e.g. disabling rivers.
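As a rough illustration of the reproducibility point (a Python sketch, not the simtools.cc code; Python's `random.Random` happens to be a Mersenne Twister too, and `draw_values` is a hypothetical helper): with a fixed seed, the same sequence of 'random' numbers comes back on every run.

```python
import random

def draw_values(seed, count):
    """Draw `count` pseudo-random integers from a generator seeded with `seed`."""
    rng = random.Random(seed)  # Mersenne Twister (MT19937) under the hood
    return [rng.randrange(1000) for _ in range(count)]

# Same seed -> same sequence, which is what makes a bug repeatable.
first = draw_values(12345, 5)
second = draw_values(12345, 5)
```

Skipping the reseeding step, as described above for setsimrand, has the same effect: every run starts from the same generator state.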

Edit 2: The current population code doesn't actually seem to be producing city sizes that obey Zipf's law.  Inkelyad's original patch produced numbers that fitted Zipf's law but in a slightly aphysical way.  (The algorithm picked sizes that precisely fitted Zipf's law, with the i-th value proportional to 1/i, and then overlaid Gaussian noise.)   The current code linearly decreases the size, with special code to make a few big cities.  Special cases for big cities shouldn't really be needed if we model Zipf's law correctly as they're already predicted by the model.
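For reference, the "exact Zipf plus noise" approach can be sketched roughly as follows. This is a Python illustration, not Inkelyad's code: the function name, the parameters, and the assumption that the noise scales with the ideal size are all mine.

```python
import random

def zipf_sizes(largest, count, noise_fraction, rng):
    """Sizes proportional to 1/rank, each perturbed by Gaussian noise.

    `noise_fraction` is a hypothetical knob: the standard deviation of the
    noise as a fraction of the ideal size.
    """
    sizes = []
    for rank in range(1, count + 1):
        ideal = largest / rank                      # exact Zipf value
        noisy = rng.gauss(ideal, noise_fraction * ideal)
        sizes.append(max(1, round(noisy)))          # never below one inhabitant
    return sizes

exact = zipf_sizes(6000, 4, 0.0, random.Random(1))  # no noise: pure 1/rank
noisy = zipf_sizes(6000, 4, 0.1, random.Random(1))
```

With the noise switched off the sizes sit exactly on the 1/rank curve, which is the slightly aphysical regularity mentioned above.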

I've had a go at reworking the city size code in an attempt to make it more realistic.  Instead of calculating values that fit Zipf's law, we instead pick random values from a Pareto distribution which has that effect but with much more realistic randomness.  The link inkelyad posted suggests city sizes do fit a Pareto distribution, and the result is that we get occasional very big cities just as in real life without the need for the big cities option.  The Pareto distribution has a minimum cut-off because otherwise you'd get more and more smaller and smaller cities.  The median is twice the cut-off which makes it easy to fit.  (Slightly unusually, the distribution doesn't have a finite mean.)  The attached patch (population-pareto-distribution.patch) implements this, although it is not yet ready for release.
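A minimal sketch of the sampling idea, in Python rather than the patch's C++ and with hypothetical names. It assumes a Pareto shape parameter of 1, which is what "median is twice the cut-off" and "no finite mean" imply: the distribution function is F(x) = 1 - cutoff/x, so F(x) = 1/2 gives x = 2 * cutoff, and inverting F gives the sampler.

```python
import random

def pareto_city_size(median, rng):
    """Draw one city size from a Pareto distribution with shape 1."""
    cutoff = median / 2.0        # minimum size; the median is 2 * cutoff
    u = rng.random()             # uniform on [0, 1)
    return cutoff / (1.0 - u)    # inverse-transform sample, always >= cutoff

rng = random.Random(2012)
sizes = sorted(pareto_city_size(500, rng) for _ in range(10001))
sample_median = sizes[len(sizes) // 2]
```

The sample median should land close to the requested 500, while the largest values can be several orders of magnitude bigger, which is where the occasional very big city comes from.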

I think the results are quite elegant.  The three attached screen shots have 100 cities on a 512x512 map with a median population of 500, which results in a healthy number of small cities.  The current code with pak128.Britain can't make cities with a population under 271, so a median of 500 gives a cut-off of 250 which is about right.  The big city in one of the images has a population of about 450,000 (though there are several nominally separate villages embedded in it, much as in real life).  What do others think?  Better or worse than the current code?
Richard Smith

jamespetts

Richard,

this is very interesting - thank you very much for your work on this! As noted in other threads, I am staying with my parents for the week-end, so unable to do anything about any of these until next week, but your work is very much appreciated.

Just one question - you mention that the pareto patch produces large cities other than those specified in "number of big cities" - is this not possibly problematic behaviour, as it defies user expectations...? Or have I misunderstood something?
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

ras52

Quote from: jamespetts on February 10, 2012, 11:55:41 PM
Just one question - you mention that the pareto patch produces large cities other than those specified in "number of big cities" - is this not possibly problematic behaviour, as it defies user expectations...? Or have I misunderstood something?

No, you've not misunderstood, and it's one reason why I said that patch isn't yet ready for release.  At present, my Pareto patch completely ignores the "number of big cities" setting.  A related issue is that very occasionally, it will try to generate a seriously big city.  I've successfully had a few cities with around half a million inhabitants (they take up about 300x300), but the code will extremely rarely try to generate much larger cities, such as cities with a billion inhabitants.  Obviously these will not work.  On smaller maps such as the demo, even a couple of tens of thousands is a problem.  I think the solution is to apply a maximum population cut-off, and maybe use the "number of big cities" setting to allow that number of cities exceeding the maximum cut-off by up to a factor of (say) ten.  Alternatively, we could just drop the "number of big cities" setting.  Probably something to play about with.
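One way the maximum cut-off could work, sketched in Python with hypothetical names and numbers (this is not what the patch currently does): redraw any sample above the limit, and give the first few "big" cities a limit ten times higher. Note that this only permits those cities to exceed the normal maximum; it does not force them to.

```python
import random

def capped_pareto(cutoff, maximum, rng):
    """Draw a Pareto (shape 1) size, redrawing until it is <= maximum."""
    if maximum <= cutoff:
        raise ValueError("maximum must exceed the cut-off")
    while True:
        size = cutoff / (1.0 - rng.random())  # always >= cutoff
        if size <= maximum:
            return size

def city_sizes(count, big_cities, cutoff, maximum, rng):
    """The first `big_cities` entries may reach 10 * maximum; the rest are capped."""
    sizes = []
    for index in range(count):
        limit = maximum * 10 if index < big_cities else maximum
        sizes.append(capped_pareto(cutoff, limit, rng))
    return sizes

# Illustrative values only: 100 cities, 3 of them allowed to be "big".
sizes = city_sizes(count=100, big_cities=3, cutoff=250, maximum=50000,
                   rng=random.Random(1))
```

Redrawing rather than clamping avoids piling up cities at exactly the maximum size.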
Richard Smith

jamespetts

Yes, I see - a not entirely easy thing to consider. Do we think that "number of big cities" is a useful thing to be able to control? It does go rather nicely with city clusters (having the number of big cities similar in number to the number of clusters can work nicely), and it allows some fine tuning of the game experience; but is it worthwhile? I should be interested in views on this.

sdog

the number of big cities could be used to re-calculate sets when the number is not met. They are generated by the same density function, but we break the distribution by cherry picking special cases. the city sizes still roughly follow the power law.
The cut-off can be given as a parameter for players to alter, you could name it large city threshold.
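A rough Python sketch of this re-draw idea, with hypothetical names and values and assuming Pareto sizes with shape 1: keep generating whole sets until the number of cities above the large city threshold matches the requested number.

```python
import random

def draw_set_with_big_cities(count, wanted_big, cutoff, threshold, rng,
                             max_attempts=10000):
    """Redraw the whole set until exactly `wanted_big` sizes exceed `threshold`.

    Returns None if no suitable set turns up within `max_attempts` tries.
    """
    for _ in range(max_attempts):
        sizes = [cutoff / (1.0 - rng.random()) for _ in range(count)]
        if sum(size > threshold for size in sizes) == wanted_big:
            return sizes
    return None

# Illustrative values: 20 cities, exactly one above 5,000 inhabitants.
sizes = draw_set_with_big_cities(20, 1, 250, 5000, random.Random(7))
```

Each accepted set still comes from the same density function; the selection just discards sets with the wrong number of large cities, so requests far from the expected count need many more attempts.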

inkelyad

My old code was generating big cities first. And then the first cities were used as cluster centers. Your Pareto distribution patch breaks the 'clusters should be around big cities' model.

ras52

Thanks for your thoughts.  It should be easy to sort the population list so that clusters form around big cities again.  However I suspect there will be a danger of the surrounding villages ending up entirely within the central big city.  Let's see.  I should be able to revive the "number of big cities" parameter in the way sdog suggests.
Richard Smith

inkelyad

Quote from: ras52 on February 11, 2012, 12:52:49 PM
However I suspect there will be a danger of the surrounding villages ending up entirely within the central big city.
Well, yes. The city placement function stadt_t::random_place is not tuned for reeeally big cities. But cities should push each other away. See city_isolation_factor.

sdog

Quote from: ras52 on February 11, 2012, 12:52:49 PM
However I suspect there will be a danger of the surrounding villages ending up entirely within the central big city.  Let's see.  I should be able to revive the "number of big cities" parameter in the way sdog suggests.

don't bother, happens all the time in real life.

jamespetts

Somebody (Neroden, I think) was at one time working on a "city merger" patch, which would be useful in these situations. I have noticed in the server game a rather difficult bug caused by one city being contained within another: all the passengers generated by the smaller city are recorded as being transported by the larger city, but generated by the smaller city, which makes the larger city grow rather too rapidly.

ras52

I've now got cities sorting so that clusters form around the biggest cities (though I'm not convinced it's actually working).  I've added two settings, max_small_city_size and max_city_size, for the lower and upper limits of what constitutes a big city, and the "number of big cities" option is respected.  I wonder whether max_small_city_size and max_city_size should be somehow combined with one or both of city_threshold_size and capital_threshold_size.  Certainly they're similar.

I'm not sure I'm a fan of the "number of big cities" option now that we're using a Pareto distribution, and would be happy to remove it.   But if I'm in a minority there, I can simply set max_small_city_size to a very high value (e.g. 500,000) and only generate "small" cities to get what I prefer.

The results are all in this github branch.
Richard Smith

sdog

the numbers of settings for the player should be reduced, if you would like to suggest it for standard.* however, it's not a bad idea to gather experience in simutrans with a larger number of options available to the player.

*since prissi was convinced and even coded the new wonderful city generation, i have hope for this to happen, when all his concerns are met. Especially with inkelyad's city clustering and city like water it is producing much more playable maps.

Allowing the player to set the number of large cities artificially is rather important for gameplay on medium to small maps. A single city does not provide enough inter-city options. One would only play a local train game in such a case. A number of players will not mind trading a very realistic distribution for a playable map. It should of course still be easy to get a good distribution.

inkelyad

Quote from: ras52 on February 12, 2012, 01:08:18 AM
I've now got cities sorting so that clusters form around the biggest cities (though I'm not convinced it's actually working). 
You can add DEBUG_WEIGHTMAPS to the compiler options. stadt_t::random_place will then write *.pgm files in the simutrans working directory. Look at them with a decent image viewer.

Carl

Quote from: sdog on February 12, 2012, 01:19:58 AM
Allowing the player to set the number of large cities artificially is rather important for gameplay on medium to small maps. A single city does not provide enough inter-city options. One would only play a local train game in such a case. A number of players will not mind trading a very realistic distribution for a playable map.

I second this.

jamespetts

Richard - I have just noticed this commit in Standard. May I ask - how will the new system cope with this change?

jamespetts

I have tried this (now integrated with the -devel branch) with a 256x256 map, and the largest city had a population of over 180,000, filling half the map. This looks as though it might need some work...

Milko

Hello

It's an old debate. I tried to create some maps and I do not think that I ran into this problem. Has this problem been solved? (ras52 is unfortunately no longer connected ...)

Giuseppe

Carl

I'm not sure -- I sometimes still get very, very large cities when testing the devel branch.

Milko


It's unfortunate that ras52 is no longer connected; he was able to help development.

Giuseppe

ras52

Not sure whether resurrecting such an old topic is the done thing, but I thought I would note that I've returned to this work and have made some changes on my city-population branch. 

A number of problems have been reported with the current code: first that the requested median seems not to be respected, and second that occasionally cities are being built that are well over max_city_size.  The problem was not in the map-generation code in simworld.cc — I'm pretty sure that the code really does respect the median (on average), except insofar as to ensure the required number of big cities are created; and likewise, I'm pretty sure it never requests cities above max_city_size.  The problem is in the city growth code and, although the major part of the fix is a single line change in simcity.h, it has the potential to be quite disruptive, so I feel I should try to explain exactly what has changed.

The problem is that the stadt_t class has two separate notions of the city population: there's the value stored in the 'bev' member, and the value returned by get_einwohner() ("get population").  I don't know how this arose, but it's the case in both std and exp, and has been like this since at least Jan 2009.  When the city generation code calls change_size(), it causes the city to grow until 'bev' is the requested size; but the value shown in the city info page, etc., is what get_einwohner() says.  The relation between the two is not trivial, but it seems to me that 'bev' has most of the right properties for the city population.  The value of get_einwohner() is increased by new industrial and commercial buildings and by the city halls, which the bev value is not.  Unemployed people (or perhaps more accurately: people not employed or supported by an employee) add to the get_einwohner() count, which seems wrong as they'll already be counted, either as inhabitants of residential property, or in the homeless figure.  The main problem with 'bev' is that a residential building that's only marked level=1 in the pak contributes nothing.

My patch changes the implementation of get_einwohner() to simply return 'bev'.  This means that this discussion documenting the 6N contribution per-tile of a level=N building will no longer apply.  The overall result of this is to reduce city populations so that the requested median and max_city_size settings will be respected.  Previously a residential building contributed 10(N-1) to 'bev'.  I have changed this to 10N so that level=1 residential buildings still provide accommodation.  Commercial and industrial buildings contribute 20N (previously: 20(N-1)) to the number of people supported by employment.
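To make the arithmetic concrete, here is a small Python sketch of the old and new per-building contributions. The function names and the `legacy` flag are hypothetical and do not appear in simcity.cc; only the 10N / 10(N-1) and 20N / 20(N-1) rules come from the description above.

```python
def residential_contribution(level, legacy=False):
    """People housed by a residential building of the given level."""
    return 10 * (level - 1) if legacy else 10 * level

def employment_contribution(level, legacy=False):
    """People supported by employment in a commercial or industrial building."""
    return 20 * (level - 1) if legacy else 20 * level

# Under the old rule a level=1 house contributes nothing; under the new rule it houses 10.
old_level_1 = residential_contribution(1, legacy=True)
new_level_1 = residential_contribution(1)
```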

I noted here that city populations are too low for their areas, and this seemed consistent with others' experiences.  The move from meters_per_tile=250 to 125 has improved this, but not completely fixed it.  I've therefore implemented two new simuconf.tab settings — population_factor and employment_factor — that can override the multiplier in the 10N and 20N contributions to accommodation and employment.  I reckon the former should be around 25; I've no strong opinion on the latter, but I've been using 35 for the last couple of days and that seems reasonable.  By reducing employment_factor to less than double population_factor, it results in more industrial and commercial buildings which I like, though I'm not basing that preference on any historical data.

I don't think very much cares about the actual value of the city populations, only their relative sizes, so I hope this won't break anything.  The number of potential passengers that a locality produces is governed by the building level (or passengers) setting, and the passenger_factor, so population_factor is irrelevant there.  And so far as I can see, both from experimenting and by reading the code, the congestion and city-cars code doesn't care either.  Edit: Both congestion and power demand depend on the population: I will fix these to remove the dependence on population_factor, but they will be affected by the change in implementation of get_einwohner().  This raises a serious issue: how do we cope with old save games?  Because of the changes to congestion and power demand, it seems like we're going to have to introduce a new setting, uses_new_population, to allow get_einwohner() to return its old value with old games.  This clearly needs fixing before my patch can be used.  The main missing feature is that manually building and destroying buildings doesn't affect the population in the expected way, but then, the old code didn't get that quite right either.  I shall try to fix that.
Richard Smith

jamespetts

Richard,

welcome back! Thank you for looking into this in detail. Firstly, have you merged in the latest Experimental code before making your modifications? I have not had a chance to look at your code yet, but there have been a number of changes to Experimental since your city populations branch was implemented.

Secondly, did you note that there was discussion elsewhere (I cannot immediately remember in which thread, I am afraid) concerning the populations issue and a workaround for it involving modifying the max_city_size in simuconf.tab? That seemed to produce reasonably sized cities, at least, although that doesn't mean that more refinement isn't useful.

Thirdly, and most significantly, if we are making any changes to the way in which city population is computed, we need to consider a more accurately simulated relationship between city population figures and the numbers of passengers generated over a given interval of game time by such cities. I am keen one day to have such a realistic relationship, as that will be important to simulation realism, and think that it is best until such a realistic relationship can be achieved to stick with the relationship used in Standard, as it is at least something with which Simutrans players are familiar.

Although, as I have indicated, I have not had a chance to look over your code, what you describe seems to go some way towards this goal, but further thought will need to be given to this process. Firstly, we need to be able to work out a system of producing a passengers generated per game hour per unit of population figure in the game; secondly, we need to implement that in simuconf.tab as a variable parameter; and thirdly we need to do some research to work out what realistic values might be. This is complicated by the following things: (1) this figure is not constant in reality, as there are peak times and night times: for reasons which I shan't traverse now, it is not practical to implement non-constant demand in Simutrans, so we have to work out a workable average figure; (2) this figure may also have substantial historical fluctuations, and we need to work out from which era(s) to take the data; (3) connected to (2), we need to consider whether to make this figure change with the passing years; and (4), connected to (3) and (2), the number of passengers actually generated in Simutrans-Experimental is a maximum theoretical figure, as many passengers do not travel, as the journeys to their intended destinations are impossible or exceed their journey time tolerance: this is intended to be realistic, but it means that the number of passengers per unit of population per hour in real towns and cities will not be the same figure as the number of potential passengers generated by Simutrans, because not all of those potential passengers will be able to travel, so we have to work out what proportion of potential passengers travel in a well connected city in Simutrans, and use a modification factor to correlate this with data from an equally well connected city in reality - we also need to consider whether this factor is sufficient that we do not need a separate system for varying the proportion of people who travel over time (because transport links will become better as time progresses, and more potential passengers will actually travel).

ras52

Thanks for your detailed comments, James.  To answer your first two questions:

Yes, I've merged everything from your devel into my city-population branch so it's up to date.  (Is devel the right branch to be basing it on?  I don't necessarily expect this work to go in before next release, which I'm guessing won't be all that long away now.) 

And, yes, I think I saw some discussion about reducing max_city_size to resolve the problem of very big cities.  That workaround is probably okay for the forthcoming release, but it's not really a satisfactory long term solution: max_city_size should be maximum permitted city size, not some arbitrary parameter that's about a third of maximum permitted city size.  Similarly, Carl Baker reported that the median is not being respected.  That's the same bug.  Even though it doesn't affect game play, it's confusing to users which isn't good.  But I don't see how this can be fixed while preserving the implementation of get_einwohner() from Standard, as it's precisely that which is causing these problems.  Fortunately, in Standard the error is cosmetic as the only things that use get_einwohner() are the city info pages.  It's only in Experimental that we have features like city electricity use and congestion that depend on it.  So if Standard were minded to fix it, they could do so without danger; and if we fix it directly on Experimental, the difference with Standard really shouldn't confuse users familiar with Standard because Standard's uses of it are purely cosmetic and the affected features in Experimental are those that don't exist in Standard.

In my previous post I talked about incompatibilities with old saved games.  But do we actually care about these?  Old saved games will still work: they just won't be quite so well optimized.  An increase in congestion might be enough to turn a previously profitable line into an unprofitable one.  Is it worth going out of our way to prevent this?

I agree that the relationship between population and passengers needs more work in order to be true to life, and I had been intending to post something on that issue myself.  I'd come to that conclusion because at present in the early years, passenger production is insanely profitable.  For example, in 1750, running a stage coach from end to end of a small village occupying less than a square kilometre generates an operating profit so high that it pays back the capital investment in vehicles and stops in only 14 months.  In real life, there's no way that such a route should be profitable (at least not in isolation, rather than part of a wider network).  In a bigger city, you can flood the map with stage coaches and make vast, vast profits.  Part of that is a balancing issue (I'm pretty sure that stage coaches, specifically, need a higher operating cost, though that's a topic for elsewhere), but the problem applies to any of the other passenger vehicles too, and I think the underlying cause is that the number of passengers wanting transporting is unrealistically high.  I had concluded that passenger_factor should probably be time-dependent, much as car_ownership and electricity_consumption currently are.

If we're going to address, whether sooner or later, the relationship between passenger generation and population, we need to be clear as to what exactly the level parameter means for a residential building.  I can't find any discussion of this in the forums, but based on how it's typically used in pak128.Britain, a residential building recorded in the pak as level=N seems to house N average-sized households (or families if you prefer).  I suspect it's best to think of occupancy in terms of households as that's more likely to be constant over the life of a building.  For example, a three-bedroom terrace house might have housed a family of eight in Edwardian times, but only three in modern times.  But that's still one household (or family), and as the level is constant, it probably makes sense to map the level to the number of average-sized households accommodated.  If we want to be really accurate, we should model the change of household size with time, but that's perhaps a step too far.  However, because of scale disparity, each residential tile represents several buildings.  This post suggests that in pak128.Britain, buildings are drawn so that the length of a tile is about 30m, but meters_per_tile=125, which suggests we should consider each tile to represent a cluster of (125/30)^2 ≈ 17 similar buildings.  So perhaps instead of the population_factor I've implemented on this branch, we should be calculating it from more fundamental quantities that can be established more readily.  E.g.
population = level * (meters_per_tile / meters_per_tile_image)^2 * mean_household_size
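As a worked example of that formula (a Python sketch; `tile_population` is a hypothetical name, and the household sizes of eight and three are simply the Edwardian and modern figures used above):

```python
def tile_population(level, meters_per_tile, meters_per_tile_image,
                    mean_household_size):
    """Population of one residential tile under the proposed formula."""
    # Each tile stands for a cluster of similar buildings, e.g. (125/30)^2 ~ 17.
    buildings_per_tile = (meters_per_tile / meters_per_tile_image) ** 2
    return level * buildings_per_tile * mean_household_size

modern = tile_population(1, 125, 30, 3)      # about 52 people per level=1 tile
edwardian = tile_population(1, 125, 30, 8)   # about 139 people per level=1 tile
```

So a level=1 tile would hold roughly 52 people with a mean household of three and roughly 139 with a mean household of eight, both noticeably more than the 10 per level used on the branch at present.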

On a similar basis, and I think this is what you're suggesting, we might better model the number of passengers by considering in more detail why the journeys occur.  I suspect that people in 1750 made similar numbers of journeys as nowadays, and for similar reasons: to work, to the shops, for leisure activities.  The difference was that in 1750, most of these journeys were on foot.  Maybe instead of trying to model the increase in passengers over time (which will almost certainly involve making up a lot of numbers), we should try to model these journeys on foot, in much the way that we model competition from private cars?  Let me think a bit more about that...
Richard Smith