Passenger generation - coding discussion

jamespetts · July 16, 2013, 01:44:48 AM

I have started work on an overhaul of the passenger generation system on my passenger-generation Github branch. A summary of the work done and to do is in my latest commit message, as follows:

ADD: First pass at full implementation of new passenger generation code. Still needs work and calibration. Features:
        - Passengers/mail packets are generated globally, not city by city
        - Passengers generated only in residential houses
        - Passengers head for commercial/industrial buildings (that is, both "industries" and industrial city buildings) (TODO: Allow some passengers to head for other residential areas)
        - Mail is generated in all building types
        - Replace local/midrange/longdistance passengers with commuters and visitors, with no distance restrictions, but different journey time tolerances
        - The current "target factories" system is replaced by and subsumed into this system.
        - Resulting code simplification in many areas (TODO: Remove deprecated code)
    TODO: Add various simuconf.tab customisations.
    TODO: Allow onward as well as return journeys
    TODO: Add feature to limit number of times commuting trips can be made to buildings in any given month based on the number of jobs that they provide (conceptually tricky - needs further thought, but important).
    TODO: Store the private car ownership statistics in the world, not every city
    TODO: Allow more than 15 alternative destinations, and allow number of alternative destinations to vary dynamically with map size

Firstly and generally, if anyone would like to look over the code so far, though it be a work in progress, I should be grateful. (I should add that I have realised that I have omitted one further item from the to-do list, which is adding depots other than those built at sea and station extension buildings to the "commuter targets" list such that transport infrastructure itself, realistically, is a source of jobs.)

Secondly, and more specifically, I should be grateful for any assistance about how to implement a seemingly straightforward but actually very difficult feature, albeit one of economic significance. What I should like to be able to do is have each building which is a destination for commuters, whether that be a city building, attraction or industry, have a limited number of jobs, such that commuters cannot successfully travel to that building if more than a certain amount, determined by that building's level, of commuters have travelled to that building within the last month (or similar). The statistics of passengers arriving as commuters both to industries and city buildings are already recorded and reset monthly.

The simplest implementation, and one which at first sight would seem easy, is simply to reject as a destination any building which has exceeded its monthly allocation of commuters when trying to select a destination. However, two problems arise with this. Firstly, this would mean that passengers would be able to go anywhere at the beginning of a month, but be very restricted towards the end of the month, producing uneven flows. One might say that uneven flows for commuters are realistic (think of rush hours), but our current routing system will not cope well with this, as it does not recognise different timetables at different times of day, and simply averages travel and waiting times generally.

The second problem, which is more subtle but more difficult to solve, is connected to the first. The system that I am in the process of implementing replaces the notion that passengers have certain distance ranges, with a larger journey time tolerance at longer distance ranges, with one which ignores physical distance and works on the basis of a random but potentially large number of alternative destinations and a journey time tolerance. Passengers will have between, say, 1 and 25 ranked destinations (the number between 1 and 25 being random, the range being set in simuconf.tab), each destination being a weighted random choice of a possible target building anywhere on the map, and will check, for each of the destinations in turn, whether there is a route of any sort (walking, public transport, private car) within the randomly assigned journey time tolerance: if there is, the passenger will travel to that destination, and will not check any further destinations.

If we have a rule that commuters cannot travel to destinations where all the jobs for that month are filled, then at the end of each month, there will be a limited number of destinations to which commuters can actually travel. There are three potential ways of handling this. One is to reject, at the stage of actually assigning random destinations to the set of (say) 25 alternatives, any building whose jobs are full. That would, however, mean a retry loop in the method for selecting buildings at random, which, by the end of the month, would get extremely slow, as every single commuter trip would have to retry very many times to get a building that has jobs available. For that reason, I do not think this method feasible.
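As a rough illustration of the ranked-alternatives procedure described in this post (a sketch only, not the code on the branch), the selection might look something like the following. The names `pick_destination`, `route_time` and `tolerance` are hypothetical, and `route_time` stands in for whatever route search the game actually performs, returning `None` when no route exists.

```python
import random

def pick_destination(targets, weights, route_time, tolerance, rng,
                     max_alternatives=25):
    """Return the first ranked alternative reachable within tolerance, or None."""
    # The number of alternatives is itself random (1..max_alternatives).
    n = rng.randint(1, max_alternatives)
    # Each alternative is an independent weighted random choice over all targets.
    alternatives = rng.choices(targets, weights=weights, k=n)
    for building in alternatives:
        t = route_time(building)  # hypothetical callback; None means no route
        if t is not None and t <= tolerance:
            return building       # stop at the first viable destination
    return None                   # no alternative was reachable in time
```

Note that nothing here looks at available jobs; the options discussed in this thread differ mainly in where such a check would be added.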

Another possibility is simply to adjust the weights of buildings in the weighted vector of commuter targets as they use up jobs, finally removing them from the list altogether once the number of jobs has reached zero. This is a seemingly attractive option, but I strongly suspect that constantly adjusting the weights in a weighted vector containing all the commuter targets on the whole map would again be excessively slow on a larger map. One alternative version of this, I suppose, is simply removing buildings from the commuter target list (having a shadow list from which the buildings are not removed and overwriting the main list with the shadow list at the end of every month). This method would, however, result in commuting passengers from all over the map gravitating to the few remaining buildings with jobs towards the end of the month, creating some possibly erratic commuting patterns.
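The shadow-list variant could be sketched along these lines. This is a minimal illustration with hypothetical names (`CommuterTargets`, `take_job`, `new_month`), using plain dictionaries rather than the game's weighted vector.

```python
class CommuterTargets:
    """Active target list plus a shadow copy restored every month."""

    def __init__(self, jobs_by_building):
        self.shadow = dict(jobs_by_building)  # never modified during the month
        self.active = dict(jobs_by_building)  # jobs remaining this month

    def take_job(self, building):
        """Use up one job; return False if the building is already full."""
        remaining = self.active.get(building, 0)
        if remaining == 0:
            return False
        if remaining == 1:
            del self.active[building]         # full: drop from the active list
        else:
            self.active[building] = remaining - 1
        return True

    def new_month(self):
        # Overwrite the main list with the shadow list.
        self.active = dict(self.shadow)
```

Random selection would then draw only from `active`, which is what produces the gravitation towards the few remaining buildings late in the month.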

The final possibility is to reject "full" buildings at a later stage, after the commuting passengers have assembled their list of potential destinations, and when each destination is checked for a viable route. If available jobs are checked at this stage, it will mean that, instead of gravitating to buildings with remaining jobs at the end of the month, commuting passengers will simply travel less and less towards the end of the month, as the chances of them succeeding in finding a route to a building with an available job fall. This might be said to be consistent with a sort of rush hour effect at the beginning of each month, but see above on the routing system not being designed to cope with uneven flows very well.
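A toy simulation of this late-rejection approach is sketched below. It is illustrative only: the function name `simulate_month` is hypothetical, buildings are chosen uniformly rather than by weight, and every route is assumed to be viable so that only the job check matters.

```python
import random

def simulate_month(jobs, trips, alternatives, rng):
    """Return one boolean per trip: did the commuter find an available job?"""
    remaining = list(jobs)  # jobs left per building this month
    outcomes = []
    for _ in range(trips):
        # Destinations are drawn without regard to remaining jobs ...
        candidates = [rng.randrange(len(remaining)) for _ in range(alternatives)]
        # ... and fullness is only tested as each candidate is checked in turn.
        chosen = next((b for b in candidates if remaining[b] > 0), None)
        if chosen is not None:
            remaining[chosen] -= 1
        outcomes.append(chosen is not None)
    return outcomes
```

Comparing the share of `True` values in the first and last parts of the returned list gives a feel for how sharply successful trips tail off as jobs are used up.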

One of the purposes of recording the number of passengers who have successfully travelled is for a future feature in which growth will be determined by these statistics. The third method of implementation would affect these statistics, as commuting passengers would be increasingly unable to travel towards the end of the month. This could be calibrated for, I suppose, if one averages it all out, but I do wonder whether this could be calibrated properly.

Any thoughts on any of these issues would be most welcome.

Carl · July 16, 2013, 07:24:30 AM

Some very interesting ideas here! I don't have any solutions to the problems off the top of my head (although I will give them some thought), but I do have one thought. If a good portion of the passenger generation is going to be related to commercial buildings, we will have to make sure that enough commercial buildings are being generated upon map creation and evolution. I confess to being somewhat ignorant as to how this goes, since it's been a while since I played a map where the cities weren't hand-made. But one would have to make sure that maps typically generate enough commercial buildings to provide destinations for a decent proportion of the passengers.

In fact -- it is only because of the 'monthly allocation' idea that the above question comes up at all. In the absence of a per-building allocation, one would not have to worry about whether there were enough commercial buildings to receive commuter traffic. Perhaps a fool-proof implementation of this would make the per-building allocation optional, in case it turns out not to play well with certain paksets/setups?

(Also, you say that commercial buildings will no longer generate traffic. I assume they will still generate return journeys for those who travel there?)

Michael Hauber · July 16, 2013, 09:27:26 AM

I'm not sure what role the 25 alternate destinations play, and how often they are updated. But what about giving each passenger one job only, with this job being updated only rarely? This passenger will then travel to this job every time it makes a journey.

Alternatively, instead of resetting the job statistics monthly, reset them daily. If there is a computationally reasonable way, store the last 30 days of job arrivals for each destination, and then dump the first day and add a new day at the end of each day. Or do a performance fudge by calculating as if the last 30 days of job arrivals have been the same each day: at the end of each day, multiply the arrival stat for the last month by 29/30 to get the stat at the start of the new day.
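The 29/30 fudge amounts to a decaying running total. A minimal sketch, with the hypothetical name `roll_over` and "day" standing for whatever interval is chosen:

```python
def roll_over(stat, window=30):
    """Approximate dropping the oldest day from a rolling `window`-day total.

    Assumes the oldest day held an average share (1/window) of the total,
    so no per-day history needs to be stored.
    """
    return stat * (window - 1) / window


# Usage: add arrivals to `stat` during the day, then roll over at day end.
stat = 300.0            # e.g. 10 arrivals a day for the last 30 days
stat = roll_over(stat)  # 290.0 carried into the new day
```

With a steady number of arrivals per day, the end-of-day total settles at 30 times the daily figure, which matches what a true 30-day window would report.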

To get around problems with most jobs full and having to do many searches to find the vacant ones, consider allowing limited over employment. One option may be to allow an extra 5% overemployment for each destination that has failed due to full jobs. So the first attempt to find a job needs <100% of jobs used up to succeed. But if this is full then the next attempt needs <105% etc. Also allow a few residents to not find jobs, and some extra jobs that will not be filled, which is what happens in real life.
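The escalating threshold could be tested with integer arithmetic only, as in this sketch (the name `accepts` and its parameters are hypothetical):

```python
def accepts(filled, capacity, failed_so_far, step_percent=5):
    """True if a commuter may take a job at this destination.

    `failed_so_far` is the number of earlier destinations rejected as full;
    each one raises the allowed employment by `step_percent` of capacity.
    The comparison is cross-multiplied so that no division is needed.
    """
    return filled * 100 < capacity * (100 + step_percent * failed_so_far)
```

For a building with 100 jobs, the first attempt succeeds only below 100 filled, the second below 105, the third below 110, and so on.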

jamespetts · July 16, 2013, 12:37:50 PM

Thank you for your responses. As to ensuring sufficient generation of commercial buildings, two points to note. Firstly, it is not just commercial, but also industrial buildings (including factories, industrial city buildings, and, when I have implemented it, certain items of transport infrastructure such as depots and extension buildings) that can be commuter destinations. Secondly, the long-term plan is also to overhaul the city growth algorithms to match this feature. As to making the allocation of jobs optional, that is an interesting idea, but I think that I shall need to test first whether this is a problem in the first place. Finally, yes, return journeys will indeed be generated back from commercial and industrial buildings. Indeed, I plan also to have onward journeys to enable journeys of more than two legs (e.g. residential > commercial > commercial > residential).

As to having specific jobs for specific residential buildings (there is no persistent "passenger" in the game to which a job could be assigned), I had considered this model, which is what is used in Sim City 4. However, such a system would add complexity without adding any significant function: there would be an awful lot of data to have to store and recalculate periodically (and how often would one recalculate it in any event?).

As to "daily" statistics, Simutrans-Experimental does not really have a concept of a day: it has hours and months. It is already configured to reset most statistics at the beginning of each month (which, in the world of Simutrans-Experimental, lasts, depending on the bits per month and meters per tile settings, usually somewhere between 3 and 12 hours).

As for allowing a certain percentage of over-employment - I am not quite sure that that would work well (and it would also introduce extra calculations involving division very many times over into performance critical code, which is not ideal), not least because it seems somewhat arbitrary: how would it be calibrated? The final suggestion, of calibrating things such that a number of jobs are not fully filled and a number of residents do not always make successful commuting trips towards the end of the month is probably a more satisfactory answer.

jayem · July 16, 2013, 09:42:11 PM

[conciser summary]
Do the 'worker location' details given for factories offer a way to get some of the benefits of "specific jobs for specific residential buildings" on a larger scale while using the existing and suggested methods to add the finer detail? Or will it get the worst of both worlds?

[original post]

Quote
Secondly, and more specifically, I should be grateful for any assistance about how to implement a seemingly straightforward but actually very difficult feature, albeit one of economic significance. What I should like to be able to do is have each building which is a destination for commuters, whether that be a city building, attraction or industry, have a limited number of jobs, such that commuters cannot successfully travel to that building if more than a certain amount, determined by that building's level, of commuters have travelled to that building within the last month (or similar). The statistics of passengers arriving as commuters both to industries and city buildings are already recorded and reset monthly.

I was thinking about this yesterday*, and was wondering about the section labelled "workers from exville-1454" and how scalable factories behave. And how the routing occurs.
If that acts in concert with the town and has a bearing on travel then some aspects of the calculation might be done for you.
In particular you know there must be a valid house (barring rearrangements) somewhere in a much smaller area. So at that point the search at the end of the month is much less.
Of course if you go for perfect matching every month then you'll still have to find that last house (or fail), but at least you'll be looking where you might find it. If you have unemployment, then even at the end of the month there should be a fair chance of finding a valid house, as there will be spare people who don't need to travel**. [and it will be spread out about evenly]

With the smaller factories you could group them into a factory aspect of the city (here, of course, you'd have to find both a valid industrial building and a valid residential one). You would then have a higher chance of picking a valid pair from the city of work and the city of residence than purely at random, but it would lead to e.g. London consisting of Chelsea etc., which would have other implications. (Again, if you have 'vacancies' and set the capacity of a factory higher than its employment, then the distribution should avoid abuses, especially at the visible scale, while having a fairly high success rate.)

*for what it's worth the hypothetical model was
food arriving to a city->population (growth) (with various fudges to allow reasonable single player and not chaotic behaviour)
extra population&unemployment rate high:
single journey to other city with better unemployment rate
grow that city (district) even more
extra population&unemployment rate low (and other stuff):
find empty spot in town to build/promote as house
try finding a working place (with satisfactory use, again need fudging) within distance add to labour from town
if over capacity breed factory, split workers
if no factory add to dole queue
if route has a failing journey (in X attempts) remove from work add to dole
[it's not properly thought out yet, it seems it would give suburbs and industrial revolution cascade effects, but I wasn't quite sure about its complexity (and all the other things)
I don't know if any thoughts mesh, and not fully sure how it works at the minute]

** crude example
If you have one passing square out of 100 you have a (99/100)^n chance of failing to pick it in n attempts (about an 80% chance of giving up after 20 attempts)
If you have 11 passing squares out of 110 you have a (99/110)^n chance of failing to pick one in n attempts (about a 12% chance of giving up after 20 attempts)
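The two figures in this crude example can be checked with a few lines, assuming each attempt is an independent uniform pick (the function name is hypothetical):

```python
def chance_of_giving_up(passing, total, attempts):
    """Probability that `attempts` independent uniform picks all miss."""
    return ((total - passing) / total) ** attempts


one_in_100 = chance_of_giving_up(1, 100, 20)      # (99/100)^20
eleven_in_110 = chance_of_giving_up(11, 110, 20)  # (99/110)^20 = 0.9^20
```

These come to roughly 0.82 and 0.12 respectively, in line with the approximate 80% and 12% quoted.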

MCollett · July 16, 2013, 11:01:08 PM

Quote from: jamespetts on July 16, 2013, 01:44:48 AM
Secondly, and more specifically, I should be grateful for any assistance about how to implement a seemingly straightforward but actually very difficult feature, albeit one of economic significance. What I should like to be able to do is have each building which is a destination for commuters, whether that be a city building, attraction or industry, have a limited number of jobs, such that commuters cannot successfully travel to that building if more than a certain amount, determined by that building's level, of commuters have travelled to that building within the last month (or similar). The statistics of passengers arriving as commuters both to industries and city buildings are already recorded and reset monthly.

I suggest not trying to do this as a hard limit each month. Instead, if a given factory has 'too many' workers arriving one month, make each generated trip the following month have a chance of failing, so that the expected number is right.

For example, suppose a new factory starts up wanting 120 workers. In its first month, it is overwhelmed with job applicants - 175 people turn up for interview; but of course, only 120 get jobs. So the following month, each prospective commuting trip to the factory has (on top of all other filters or tests) only a 120/175 chance of actually happening. In that second month, the actual number of workers turns out to be 118, so for the third month the acceptance chance is increased from its previous value by a factor 120/118. And so on. (Obviously, the acceptance factor would never go above 1, however low the number of workers.)
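The monthly update in this example might be sketched as follows. The name `next_acceptance` is hypothetical, and the handling of a month with no arrivals at all is an assumption not covered by the description above.

```python
def next_acceptance(current, jobs, arrivals):
    """Acceptance chance for next month, given this month's arrivals."""
    if arrivals == 0:
        return 1.0  # assumption: nobody came, so stop filtering trips
    # Scale by jobs/arrivals, but never let the factor exceed 1.
    return min(1.0, current * jobs / arrivals)


month2 = next_acceptance(1.0, 120, 175)     # 120/175, about 0.686
month3 = next_acceptance(month2, 120, 118)  # raised by a factor 120/118
```

Each prospective trip would then be allowed with probability equal to the current factor, on top of any other tests.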

Best wishes,
Matthew

Michael Hauber · July 16, 2013, 11:09:13 PM

The main function of allocating fixed jobs would be to enable more sophistication in the job finding calculations, but doing the job finding calculations less often and re-using the last result.

Here is an alternate version of 'over employment', splitting my idea in two parts. First allow a fixed over employment rate, e.g. up to 105%. This is really just another way of allowing employment at a destination to be below the maximum, with the maximum set at '105' instead of '100'. However, having employment rates at different buildings ranging from, say, 95-105% might feel nicer for the player than having employment rates ranging from 90-100%.

The other part of the idea is to perform some calculation so that lower employment destinations are preferred. A different option would be trying to find two potential targets and choosing whichever target has the lowest employment rate. Of course finding two targets sounds like it might make a critical calculation twice as slow.....
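The two-target idea could look something like this sketch, which picks two buildings uniformly at random (an assumption; the game would presumably use weighted choice) and compares employment rates by cross-multiplication to avoid division. The name `pick_less_employed` is hypothetical.

```python
import random

def pick_less_employed(buildings, rng):
    """buildings: list of (filled, capacity) pairs. Returns the chosen index."""
    a = rng.randrange(len(buildings))
    b = rng.randrange(len(buildings))
    filled_a, cap_a = buildings[a]
    filled_b, cap_b = buildings[b]
    # filled_a/cap_a <= filled_b/cap_b, rewritten without division.
    return a if filled_a * cap_b <= filled_b * cap_a else b
```

The extra cost per selection is one more random pick and two multiplications, so the slowdown depends mostly on how expensive the random pick itself is.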

isidoro · July 16, 2013, 11:19:20 PM

Two possible solutions to one of James' questions come to my mind:

- To get rid of some month effects, distribute the building capacity proportional to the day of the month. E.g. if a building has a capacity of 30 workers, allow up to 1 trip to it the first day of the month, up to 2 trips the second day... If d is the day of the month, 1<=d<=30, let capacity for day d be month_capacity*d/30
- Why not just simulate the time a worker stays at a factory/shop? Each time a traveler wants to go to a factory, provided there's room for him, the occupancy is incremented. And, at the same time, a factory/shop with people there is chosen at random and a trip from it is generated and its occupancy is decremented...
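The first suggestion, releasing capacity in proportion to the day of the month, reduces to a single integer expression. In this sketch the name `allowed_so_far` is hypothetical, and "day" can be read as any equal subdivision of the month.

```python
def allowed_so_far(month_capacity, day, days_in_month=30):
    """Cumulative trips permitted up to and including `day` (1-based)."""
    # Integer division rounds down, so capacity is never released early.
    return month_capacity * day // days_in_month
```

A trip to a building would be accepted only while its arrivals so far this month are below `allowed_so_far(capacity, day)`.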
