Simutrans Forum Archive
A complete record of the old Simutrans Forum.

Calibrating the passenger factor

Started by jamespetts, November 28, 2012, 03:15:05 AM

sdog

#35
Your solution might be introducing a concurrent process reducing passenger numbers based on time, similar to your car ownership (in fact it could be done with exactly that system). At those times cost was a prohibitive factor in stagecoach use. Using an "exclusive pedestrian" percentage of people who do not use public transport for their travels, you can reduce the number of users considerably.

A first guess for this would be to assume that only the middle class would use stagecoaches. Education figures will effectively give this away, so anyone leaving school before the age of 14 could be considered too poor to pay for transport. This would result in 2% transport users in 1870, based on the Robbins Report (mentioned in the somewhat linked mail discussion). Try changing your car-user table to 98% before 1900 and see what happens :-)



http://myweb.tiscali.co.uk/webbsredditch/Chapter%201/Travel%20in%2018thC.html
puts the cost of a 50 km trip in 1794 at 5 shilling outside, 9 sh inside.

A labourer's annual income of 10 to 20 pounds would be roughly 16 to 32 sh a month. Here one also has to consider that in Europe farm hands typically got the majority, if not all, of their compensation as food and lodging. (Or was this different in England from the rest of Europe?)

kierongreen


int biased_random(int max) {
  int random1 = simrand(max);
  int random2 = simrand(max);
  return (random1*random2)/max;
}


Or just bite the bullet and implement a normal distribution.

jamespetts

Sdog,

I have considered in the past using costing as a factor for transport, but rejected it on the ground that it would introduce undesirable complexities or, if the complexities were to be glossed over or simplified away, perverse results/incentives.

The above figures show that the journey time tolerance feature really does need to be modified in order to produce a better set of numbers, I think.

(Incidentally, the private car system could not be adapted as you imagine, as it involves checking whether the passenger has a private car, then checking whether the journey can be completed with a private car, then checking whether it can be completed with public transport, then comparing the merits of the two modes).

Kieron,

Hmm, I did implement more or less that mechanism:


/* Generates a random number on [0,max-1] interval with a normal distribution*/
#ifdef DEBUG_SIMRAND_CALLS
uint32 simrand_normal(const uint32 max, const char* caller)
#else
uint32 simrand_normal(const uint32 max, const char*)
#endif
{
   const uint32 half_max = max / 2;
#ifdef DEBUG_SIMRAND_CALLS
   return (simrand(half_max, caller) + simrand(half_max, "simrand_normal"));
#else
   return (simrand(half_max, "simrand_normal") + simrand(half_max, "simrand_normal"));
#endif
}


but I think what is needed is not a normal distribution at all, but a declining probability the higher that the numbers get: after all, the midpoint of 30 and 4320 minutes, the long distance range, is 2,175 minutes, or 36.25 hours! Having this as the most common journey time tolerance for long distance journeys would not work. What I really need is having, say, three or two and a half hours as the most common journey time tolerance for long distance passengers, with a significant minority getting tolerances between 30 minutes and 3/2.5 hours, and a dwindling minority getting journey time tolerances above that in the stratospheric ranges.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

kierongreen

Note the multiplication rather than addition in the code I gave this morning, this should skew the distribution :)

jamespetts

Interesting! To what extent will it be skewed?

kierongreen

#40
The mean value will be about 1/4 of the maximum value, with the smallest values being the most likely. Though actually you are better off with (((rand(max)+1)*(rand(max)+1))/max)-1 to avoid a large number of 0s being generated. If you multiply 3 random numbers from 1 to max then divide by max*max the mean will be at about 1/8 of the maximum, and so on.
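
The shape of the multiplied distribution can be checked without any random numbers by tabulating every possible pair of draws. The following is only a sketch: product_histogram and histogram_mean are hypothetical helpers, not Simutrans functions, and they assume two independent uniform integers in [0, max) combined as (a*b)/max with integer division.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not Simutrans code): counts how often each result of
// (a * b) / max occurs over all pairs a, b in [0, max). This is the exact
// distribution two independent uniform draws would follow.
std::vector<std::uint64_t> product_histogram(std::uint32_t max)
{
    assert(max > 0);
    std::vector<std::uint64_t> counts(max, 0);
    for (std::uint64_t a = 0; a < max; ++a) {
        for (std::uint64_t b = 0; b < max; ++b) {
            counts[(a * b) / max] += 1;   // the result is always below max
        }
    }
    return counts;
}

// Mean of the tabulated distribution.
double histogram_mean(const std::vector<std::uint64_t>& counts)
{
    double total = 0.0;
    double weighted = 0.0;
    for (std::size_t value = 0; value < counts.size(); ++value) {
        total += static_cast<double>(counts[value]);
        weighted += static_cast<double>(value) * static_cast<double>(counts[value]);
    }
    return weighted / total;
}
```

For max = 100 the mean comes out a little under 25, while 0 is the single most frequent result, so the distribution declines from the low end rather than peaking in the middle.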

MCollett

Quote from: jamespetts on December 10, 2012, 10:22:11 AM
I think what is needed is not a normal distribution at all, but a declining probability the higher that the numbers get
An exponential distribution is a natural one for waiting times.  A fraction (1/e)^n would wait n times the average waiting time (or equivalently, a fraction (1/2)^n would wait n times the 'half-life').

Best wishes,
Matthew

jamespetts

Quote from: MCollett on December 10, 2012, 07:53:01 PM
An exponential distribution is a natural one for waiting times.  A fraction (1/e)^n would wait n times the average waiting time (or equivalently, a fraction (1/2)^n would wait n times the 'half-life').

Best wishes,
Matthew

We are dealing with journey time tolerances here, not waiting times - unless that is what you meant?

sdog

Is there a fundamental difference between waiting times and journey time tolerances? (Not in their effect or implementation, but in the distribution of their length)

jamespetts

Yes - waiting time tolerances are not randomised and set at passenger generation time, but fixed and checked every 256 steps for all waiting passengers/mail/goods. The formula for calculating waiting times is as follows:


// Checks to see whether the freight has been waiting too long.
// If so, discard it.
if(tmp.get_besch()->get_speed_bonus() > 0)
{
    // Only consider for discarding if the goods care about their timings.
    // Goods/passengers' maximum waiting times are proportionate to the length of the journey.
    const uint16 base_max_minutes = (welt->get_settings().get_passenger_max_wait() / tmp.get_besch()->get_speed_bonus()) * 10;  // Minutes are recorded in tenths
    halthandle_t h = haltestelle_t::get_halt(welt, tmp.get_zielpos(), besitzer_p);
    uint16 journey_time = 65535;
    path_explorer_t::get_catg_path_between(tmp.get_besch()->get_catg_index(), tmp.get_origin(), tmp.get_ziel(), journey_time, h);
    const uint16 thrice_journey = journey_time * 3;
    const uint16 min_minutes = base_max_minutes / 12;
    const uint16 max_minutes = base_max_minutes < thrice_journey ? base_max_minutes : max(thrice_journey, min_minutes);
    uint16 waiting_minutes = convoi_t::get_waiting_minutes(welt->get_zeit_ms() - tmp.arrival_time);
#ifdef DEBUG_SIMRAND_CALLS
    if (talk && i == 2198)
        dbg->message("haltestelle_t::step", "%u) check %u of %u minutes: %u %s to \"%s\"",
            i, waiting_minutes, max_minutes, tmp.menge, tmp.get_besch()->get_name(), tmp.get_ziel()->get_name());
#endif
    if(waiting_minutes > max_minutes)
    {
#ifdef DEBUG_SIMRAND_CALLS
        if (talk)
            dbg->message("haltestelle_t::step", "%u) discard after %u of %u minutes: %u %s to \"%s\"",
                i, waiting_minutes, max_minutes, tmp.menge, tmp.get_besch()->get_name(), tmp.get_ziel()->get_name());
#endif

        // Waiting too long: discard
        if(tmp.is_passenger())
        {
            // Passengers - use unhappy graph.
            add_pax_unhappy(tmp.menge);
        }
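
The limit calculation in the excerpt can be isolated as a small pure function, which makes the three cases easier to see. This is a sketch only: max_waiting_time is a hypothetical name, the parameters stand in for the settings and path-explorer values, and journey_time is assumed to be in the same tenths-of-a-minute unit as the other figures. It uses 32-bit arithmetic, whereas the excerpt stores journey_time * 3 in a uint16, which would wrap for journey times above 21,845.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical standalone version of the limit calculation in the excerpt.
// Times are in tenths of minutes. 32-bit arithmetic is an assumption of this
// sketch, chosen so that journey_time * 3 cannot wrap around.
std::uint32_t max_waiting_time(std::uint32_t passenger_max_wait,
                               std::uint32_t speed_bonus,
                               std::uint32_t journey_time)
{
    assert(speed_bonus > 0);   // goods with no speed bonus are never checked
    const std::uint32_t base_max = (passenger_max_wait / speed_bonus) * 10;
    const std::uint32_t thrice_journey = journey_time * 3;
    const std::uint32_t min_wait = base_max / 12;
    // Never more than base_max; otherwise three times the journey time,
    // but not less than one twelfth of base_max.
    return base_max < thrice_journey ? base_max
                                     : std::max(thrice_journey, min_wait);
}
```

With passenger_max_wait = 2400 and speed_bonus = 20 (so a base of 1200), a journey of 100 gives a limit of 300, a journey of 10 is lifted to the floor of 100, and a journey of 500 is capped at 1200.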

MCollett

Quote from: jamespetts on December 10, 2012, 08:31:07 PM
We are dealing with journey time tolerances here, not waiting times - unless that is what you meant?

Journey time tolerances in the game are an example of 'waiting times' in a stochastic sense, which was what I meant.  (The actual waiting times in the game are of course mostly deterministic.)  An exponential distribution results naturally if there is a constant probability per unit time that any given individual will stop waiting.
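
As a discrete illustration of that last sentence: if each step carries the same probability p of giving up, the fraction still waiting after n steps is (1-p)^n, which with p = 1/2 is the (1/2)^n 'half-life' form. The function name below is hypothetical and the sketch is not taken from the game code.

```cpp
#include <cassert>
#include <cmath>

// Fraction of individuals still waiting after n steps when every step carries
// the same probability p of giving up (a discrete analogue of the exponential).
double fraction_remaining(double p, unsigned n)
{
    assert(p >= 0.0 && p <= 1.0);
    return std::pow(1.0 - p, static_cast<double>(n));
}
```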

Best wishes,
Matthew

jamespetts

Quote from: MCollett on December 10, 2012, 10:53:56 PM
Journey time tolerances in the game are an example of 'waiting times' in a stochastic sense, which was what I meant.  (The actual waiting times in the game are of course mostly deterministic.)  An exponential distribution results naturally if there is a constant probability per unit time that any given individual will stop waiting.

Best wishes,
Matthew

Ahh, I see what you mean. This doesn't work for journey time tolerances, however, where the journey time and tolerance must be calculated fully in advance.

Edit: Incidentally, was this what you meant with your original post, or was that different? I am afraid that I am no mathematician...

MCollett

Quote from: jamespetts on December 10, 2012, 11:41:56 PM
This doesn't work for journey time tolerances, however, where the journey time and tolerance must be calculated fully in advance.

But the tolerance can be calculated in advance, by drawing from the exponential distribution, as I said in my first post in the thread.  An equal probability of giving up per unit time is a justification or motivation for using the distribution, not necessarily the way to do the calculation.

Best wishes,
Matthew

jamespetts

Ahh, I wasn't aware of that. I'm afraid that your understanding of mathematics is rather in advance of mine - would you mind explaining that in a little more detail such that a mathematically challenged mind such as mine might understand it? Thank you very much for your input - it is most appreciated.

MCollett

Quote from: jamespetts on December 11, 2012, 12:17:22 AM
would you mind explaining that in a little more detail such that a mathematically challenged mind such as mine might understand it?
The general algorithm is presumably of the form:-

  1. Generate a possible trip.
  2. Based on the distance to be travelled and any other relevant parameters (the game year, the social status of the passenger, ...) determine the typical journey time tolerance T for that trip.
  3. Draw the actual journey time tolerance t from a random distribution parameterised by T.
  4. If t is larger than the expected journey time, make the trip, otherwise don't bother.

One straightforward way to implement an exponential distribution with mean time T for step 3 is:-

  1. Generate a uniform random number x in the interval (0,1].
  2. Calculate t = -T ln x.

This uses floating-point arithmetic.  If you want to use only integers, but are happy with an approximate stepped distribution, then:-

  1. Generate a uniform random integer x in the interval [0, 2^N).
  2. Find the number n of leading 1s in the N-bit binary representation of x; i.e. x consists of n 1s, a 0, and a remainder of N-n-1 other bits. Call the remainder y.
  3. Calculate t = (n + y/2^(N-n-1))T, so that the fractional part runs from 0 up to just under 1.

In this case T is the median rather than the mean.
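
Both recipes can be sketched as follows. The function names are hypothetical, the caller is assumed to supply the uniform value from whatever generator is in use, and the integer version scales the remainder y by its full range of 2^(N-n-1) so that the fractional part covers the whole step.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Floating-point form: u is a uniform value in (0, 1], mean_T the mean tolerance.
double tolerance_from_uniform(double u, double mean_T)
{
    assert(u > 0.0 && u <= 1.0);
    return -mean_T * std::log(u);
}

// Integer stepped form for an N-bit uniform integer x in [0, 2^N).
// x is read as n leading 1 bits, one 0 bit, then N-n-1 remainder bits y.
// Returns (n + y / 2^(N-n-1)) * median_T, truncated to an integer.
std::uint64_t stepped_tolerance(std::uint32_t x, unsigned N, std::uint32_t median_T)
{
    assert(N >= 1 && N <= 32);
    assert(N == 32 || x < (static_cast<std::uint32_t>(1) << N));
    unsigned n = 0;
    while (n < N && ((x >> (N - 1 - n)) & 1u) != 0) {
        ++n;   // count the leading 1 bits
    }
    const std::uint64_t whole = static_cast<std::uint64_t>(n) * median_T;
    if (n >= N - 1) {
        return whole;   // all ones, or no remainder bits left to interpolate
    }
    const unsigned rem_bits = N - n - 1;
    const std::uint32_t y = x & ((static_cast<std::uint32_t>(1) << rem_bits) - 1u);
    return whole + ((static_cast<std::uint64_t>(y) * median_T) >> rem_bits);
}
```

With N = 8 and median_T = 100, every x below 128 (leading bit 0) yields a tolerance below 100, which is what makes T the median: half of all draws fall under it, a quarter fall between T and 2T, and so on.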

Best wishes,
Matthew

jamespetts

Matthew,

thank you for your suggestion - I shall bear that in mind.

In terms of the more general issue of calibration, I have looked further into the code this evening, and discovered why some buildings are reported at level 0, despite being set to a higher level in the .dat files: makeobj subtracts 1 from the level of each building at the time of the compilation of a pakset. It is not entirely clear why this is done, but it is at least a consistent, if somewhat confusing, mechanism.

jamespetts

On the issue of calibration of the number of people in a town to the number of buildings to the density of the town/those buildings, the basic formula in the game for it can be found in simcity.h here:


    /**
     * ermittelt die Einwohnerzahl der Stadt
     * "determines the population of the city"
     * @author Hj. Malthaner
     */
    sint32 get_einwohner() const {return (buildings.get_sum_weight()*6)+((2*bev-arb-won)>>1);}


get_einwohner() ("get population") is the method that returns the city's population as shown in the city information windows.

"buildings" is a weighted vector of buildings in the city, the weights being their levels assigned in the .dat files (with 1 added to the level to prevent there being zeros). Two buildings with a level set in the .dat file of 1 (displayed as 0) would therefore give a sum_weight of 2; two buildings, one with a level of 0/1 and one with a level of 1/2 would give a sum weight of 3, and so forth.

As to "bev", "arb" and "won", they are defined in comments in simcity.h as follows:


// population statistics
    sint32 bev; // total population
    sint32 arb; // amount with jobs
    sint32 won; // amount with homes


The "bev" is particularly cryptic, as it purports to be the measure of population, but is used in the method that returns the population as part of a formula including other things. Only a little light is thrown on it by the following:


    uint32 get_buildings()  const { return buildings.get_count(); }
    sint32 get_unemployed() const { return bev - arb; }
    sint32 get_homeless()   const { return bev - won; }


The basic calculation seems to be: 6 times the sum of the weight of all city buildings plus the "bev" measure of population less "unemployed" and "homeless" (the *2 and >>1 seem to cancel each other out, and are probably used to avoid rounding errors).

The "bev" value is mainly incremented in units of 1 (aside from the special buttons reserved to the public player to increase or decrease this by 100) in the step_bau() method in simcity.cc. For existing towns, the formula is:


// since we use internally a finer value ...
    const int growth_step = (wachstum >> 4);
    wachstum &= 0x0F;

    // Hajo: let city grow in steps of 1
    // @author prissi: No growth without development
    for (int n = 0; n < growth_step; n++) {
        bev++; // Hajo: bevoelkerung wachsen lassen

        for (int i = 0; i < 30 && bev * 2 > won + arb + 100; i++) {
            baue(false);
        }


"wachstum" itself is set in the calc_growth() method, and is positive when passengers, mail, goods or electricity are supplied to the town (and greater in amount the greater proportion of these things that are transported).

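Reading the excerpt, "wachstum" looks like a fixed-point accumulator with four fractional bits: the upper part is the number of whole growth steps and the low four bits are carried over to the next call. A minimal sketch of that split, with hypothetical names and covering only non-negative values:

```cpp
#include <cassert>
#include <cstdint>

struct growth_split {
    std::int32_t whole_steps;   // how many times bev would be incremented now
    std::int32_t carried;       // sixteenths left in the accumulator
};

// Hypothetical helper mirroring "wachstum >> 4" followed by "wachstum &= 0x0F".
growth_split split_growth(std::int32_t wachstum)
{
    assert(wachstum >= 0);      // this sketch only covers non-negative growth
    return { wachstum >> 4, wachstum & 0x0F };
}
```

For example, an accumulated value of 37 would give two growth steps and leave 5 sixteenths behind.
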
For new towns, the formula is this:


bool new_town = (bev == 0);
    if (new_town) {
        bev = (wachstum >> 4);
        bool need_building = true;
        uint32 buildings_count = buildings.get_count();
        uint32 try_nr = 0;
        while (need_building && try_nr < 1000) {
            baue(false); // it update won
            if ( buildings_count != buildings.get_count() ) {
                if(buildings[buildings_count]->get_haustyp() == gebaeude_t::wohnung) {
                    need_building = false;
                }
            }
            try_nr++;
            buildings_count = buildings.get_count();
        }
        bev = 0;
    }


The elusive "arb" and "won" figures, meanwhile, are set in the baue_gebaude method as follows:


if (sum_gewerbe > sum_industrie  &&  sum_gewerbe > sum_wohnung) {
            h = hausbauer_t::get_gewerbe(0, current_month, cl, new_town);
            if (h != NULL) {
                arb += h->get_level() * 20;
            }
        }

        if (h == NULL  &&  sum_industrie > sum_gewerbe  &&  sum_industrie > sum_wohnung) {
            h = hausbauer_t::get_industrie(0, current_month, cl, new_town);
            if (h != NULL) {
                arb += h->get_level() * 20;
            }
        }

        if (h == NULL  &&  sum_wohnung > sum_industrie  &&  sum_wohnung > sum_gewerbe) {
            h = hausbauer_t::get_wohnhaus(0, current_month, cl, new_town);
            if (h != NULL) {
                // will be aligned next to a street
                won += h->get_level() * 10;
            }
        }


The odd thing to notice here is that there is no addition of 1 to the get_level method, so buildings with a level of 0 will not affect these figures. I am not sure whether this is intended, and suspect that it is not: I have posted a bug report about it.

At present, I remain somewhat confused about the relationship between bev, arb, won, the number and level of buildings in a city and the reported population figures, and how this system is intended to work. Any assistance in unpicking this, so that I might work out how population densities are actually calculated (and therefore how to calibrate them), would be much appreciated.

Meanwhile, I have found a useful web page on urban population densities in the largest 10 UK cities.

MCollett

Quote from: jamespetts on December 20, 2012, 01:43:25 AM
On the issue of calibration of the number of people in a town to the number of buildings to the density of the town/those buildings, the basic formula in the game for it can be found in simcity.h here:


    sint32 get_einwohner() const {return (buildings.get_sum_weight()*6)+((2*bev-arb-won)>>1);}

I suspect that bev may be something like the number of households, rather than the number of individuals.  Householders with a home and a job have a family, contributing 6 (or maybe that is 3,  depending on the details of get_sum_weight()) to the total population.  Those that are unemployed and homeless contribute only 1 (themselves) to the total.  (Note that 2*bev-arb-won could be written as get_unemployed()+get_homeless().)

Best wishes,
Matthew


MCollett

Quote from: jamespetts on December 19, 2012, 02:53:18 AM
thank you for your suggestion - I shall bear that in mind.

Here's some real code:

#include <limits>

const unsigned int uibits = std::numeric_limits<unsigned int>::digits;
const unsigned int lgbits = uibits-2;
const unsigned int lgbit = 1<<lgbits;
const unsigned int hibit = lgbit<<1;

unsigned int scaled_lg(unsigned int x, unsigned int T=1) {
    if (x==0) return uibits*T;
    unsigned int lg = 0;
    //Find the first significant digit
    while ((x & hibit) == 0) {
        lg += T;
        x <<= 1;
    }
    //Make space for overflow
    x >>= 1;
    //Find leading bits in mantissa
    for (unsigned int j =1; j<lgbits/2; ++j) {
        if ((x ^ lgbit) == 0 || T == 0) return lg;
        x >>= lgbits/2;
        x *= x;
        unsigned int round = T & 1;
        T >>= 1;
        if (x & hibit) {
            x >>= 1;
            lg -= T+round;
        }
    }
    //Interpolate remaining bits linearly
    while (T != 0) {
        if (x & lgbit) lg -= T;
        T >>= 1;
        x <<= 1;
    }
    return lg;
}

If you call this with a uniform random integer x, it will return an exponentially distributed one with median T.

Best wishes,
Matthew

jamespetts

Matthew,

thank you for both replies. The formula I shall look into when I reach that part of the exercise: I shall concentrate first on getting the right relationship between buildings, population, density and base passenger generation, I think.

I am beginning to suspect that "arb" and "won" refer to the number of workplaces and homes respectively: "arb", I think, is short for the German "Arbeit" meaning "work", and "won" for the German "wohnen" meaning "live". "arb", then, I think, refers to the number of places of employment, and "won" to the number of homes. Quite how this relates to "bev" I am still not entirely sure.

MCollett

Quote from: jamespetts on December 20, 2012, 11:55:37 AM
I am beginning to suspect that "arb" and "won" refer to the number of workplaces and homes respectively: "arb", I think, is short for the German "Arbeit" meaning "work", and "won" for the German "wohnen" meaning "live". "arb", then, I think, refers to the number of places of employment, and "won" to the number of homes.

I thought this was already quite clear from the comments in the code you posted, though arb is the number of jobs rather than of workplaces:
// population statistics
    sint32 bev; // total population
    sint32 arb; // amount with jobs
    sint32 won; // amount with homes


The comment on bev is clearly misleading, but those on arb and won make sense.  If bev is really the number of households, then 'full employment' (when get_unemployed() is zero) corresponds to one job per household, and 'sufficient housing' (when get_homeless() is zero) to one home per household.

Best wishes,
Matthew


jamespetts

I am not sure that that is correct about "bev", as this formula:


    sint32 get_einwohner() const {return (buildings.get_sum_weight()*6)+((2*bev-arb-won)>>1);}


has the effect that "arb" and "won" are accounted for indirectly through buildings (the buildings.get_sum_weight()*6), which is why they are subtracted from "bev", and "bev" is added to the buildings.get_sum_weight()*6 because it (less arb and won) is some sort of remainder value, perhaps representing people living/working beyond the designed capacity of the buildings. Oddly, the consequence of the way that the passenger generation works is that these people will not create any passenger traffic, as passenger traffic is generated only by buildings.

MCollett

Quote from: jamespetts on December 21, 2012, 12:11:59 AM
I am not sure that that is correct about "bev", as this formula:


    sint32 get_einwohner() const {return (buildings.get_sum_weight()*6)+((2*bev-arb-won)>>1);}


has the effect that "arb" and "won" are accounted for indirectly through buildings (the buildings.get_sum_weight()*6), which is why they are subtracted from "bev", and "bev" is added to the buildings.get_sum_weight()*6 because it (less arb and won) is some sort of remainder value, perhaps representing people living/working beyond the designed capacity of the buildings.

Sort of.  Let me restate in a slightly different way what I said a couple of posts back: buildings.get_sum_weight() is something like the number of people (or more precisely, if my interpretation is correct, the number of householders) with a home and a job; it should be equivalent to some_weighting_factor*arb +some_other_weighting_factor*won.   Each of them apparently contributes not 1 but 6 to the total population (i.e. each has an average of 5 dependents). 

If the normalisation were fully consistent, some_weighting_factor+some_other_weighting_factor would be equal to 1, but I doubt that is actually the case.  If it isn't, then the apparent 6 inhabitants per household is actually more or less by whatever factor is required to fix the normalisation.

(2*bev-arb-won)>>1, or equivalently (get_unemployed()+get_homeless())/2, is the number of remaining householders who have no home or job.  Such people only contribute 1 to the total population; evidently they are presumed to have no dependents.

Best wishes,
Matthew

jamespetts

#58
Hmm, I don't think that that's right, because buildings.get_sum_weight() returns the total number of buildings multiplied by the level* of those buildings. Those buildings include commercial and industrial buildings as well as residential buildings.

* As noted above, what the "level" is gets complicated. 1 is subtracted from the value of the "level" set in the .dat file, and added back in in some places but not others. It is added when the weights of buildings are set in the "buildings" vector, so something with a level of 1 in the .dat file would be 1 here, rather than 0. This means that all buildings count for this purpose.

Let me try running a worked example to see how these various things pan out. Suppose that we have an imaginary city (hamlet) with two buildings: one residential building of level 1, and one commercial building of level 2. (I am referring here to the "level" as set in the .dat file, not as it appears - as will be seen, for some purposes, 1 is subtracted from this number).

The residential building will actually contribute nothing to "won", since it is recorded as being level 0, so "won" will be 0. The commercial building will contribute its level * 20 to "arb", so "arb" will be 20 and "won" will be 0. We do not know what "bev" will be, since "bev" determines (indirectly) rather than is determined by the number of buildings. I will assume "bev" to be 20 for now. Running the formula, we would get:

buildings.get_sum_weight() = 3
3 * 6 = 18
bev - arb - won = 0
0 + 18 = 18

If I am right in my suspicions that the failure to add 1 to the weight of buildings when setting arb and won is a bug, then arb would end up being 40 and won would end up being 10, and the correct result would be:

buildings.get_sum_weight() = 3
3 * 6 = 18
bev - arb - won = -30
-30 + 18 = -12

That would give a negative number, which is almost certainly not intended; however "bev" was an assumed value. If we assume that "bev" is 50 instead of 20, we get:

buildings.get_sum_weight() = 3
3 * 6 = 18
bev - arb - won = 0
0 + 18 = 18

In either case, "bev" even unadjusted does not equal buildings.get_sum_weight()*6, although it is closer in the first instance.
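
For comparison, the same three cases can be run through the get_einwohner() expression exactly as quoted. The working above uses bev - arb - won, whereas the quoted expression halves 2*bev - arb - won, so the adjustment comes out differently. This is a sketch under stated assumptions: einwohner is a hypothetical free function, sum_weight stands in for buildings.get_sum_weight(), and a division by 2 replaces the >> 1 so the result does not depend on how negative values are shifted (the two agree whenever the shifted value is even or non-negative, as in these cases).

```cpp
#include <cassert>
#include <cstdint>

// Illustrative restatement of the quoted get_einwohner() expression.
// sum_weight stands in for buildings.get_sum_weight().
std::int32_t einwohner(std::int32_t sum_weight, std::int32_t bev,
                       std::int32_t arb, std::int32_t won)
{
    assert(sum_weight >= 0);
    // (2*bev - arb - won) >> 1 in the original; a division is used here so
    // that the sketch behaves the same on every compiler for negative input.
    const std::int32_t adjustment = (2 * bev - arb - won) / 2;
    return sum_weight * 6 + adjustment;
}
```

On that reading the three cases give 28 (bev 20, arb 20, won 0), 13 (bev 20, arb 40, won 10) and 43 (bev 50, arb 40, won 10), rather than 18, -12 and 18.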

In any event, the general intention is for the level of each building to be multiplied by 6 to get the base population: the ((2*bev-arb-won)>>1) formula appears to be only a minor adjustment. We can probably, for the purposes of the calibration of the passenger factor, ignore that part and concentrate on the basic fact that, in Simutrans, population approximately equals the number of buildings multiplied by the (unadjusted) level of each building multiplied by 6.

The formula for generating passengers/mail is thus:


// prissi: since now backtravels occur, we damp the numbers a little
   const int num_pax =
      (wtyp == warenbauer_t::passagiere) ?
         (gb->get_tile()->get_besch()->get_level()      + 6) >> 2 :
         (gb->get_tile()->get_besch()->get_post_level() + 8) >> 3 ;


This is not the full picture, however, because non-local passengers (that is, passengers going somewhere other than in their own city) will also generate a return trip. This makes the number of passengers generated by each building somewhat non-constant in proportion to other factors, since altering the proportion of passengers to increase local trips will decrease the number of returns, and therefore reduce the overall number of passengers generated. This might need looking into (perhaps a change in the code so that all passengers return).

In any event, the basic formula is that 3/4 of all packets are passengers, and the number of passengers in the packet is determined by the formula (building level (unadjusted) + 6) / 4, rounded down. Both our level 1 and level 2 buildings in the above example would thus produce 1 passenger per packet.

The next important piece in the jigsaw puzzle for passenger generation rate is the number of times per game month that the code for generating passengers is called. As can be seen in the spreadsheet attached to the opening post of this thread, for a passenger factor of 8 and a bits per month setting of 18, each building in a town is stepped once per game "month". For a passenger factor of 8 and a bits per month setting of 21 (yielding 6.4 hours per month), this entails the stepping of each city building 0.9375 times per month.

Remembering that the average person makes 1,100 trips per year, we need to find a formulation that properly encapsulates the correct relationship between this and the passenger generation figures. It is necessary, however, to use a figure of greater than 1,100 as a base, as account must be taken of the fact that not all generated passenger packets in Simutrans will actually make journeys, even in a well-connected game, as there is the journey time tolerance to consider. I shall aim for 1,350 as the base figure.

First, the yearly figure must be translated to an hourly figure. For reasons discussed elsewhere, I use a figure for "active hours" in a day as being 16 (24 - 8 hours' sleep). 365.25 * 16 = 5,844 active hours per year. 1,350 / 5,844 = 0.231. Each unit of population should therefore generate 0.231 passengers per hour.
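
The conversion is simple enough to restate as a function, which makes it easy to try other base figures. The name is hypothetical and the inputs are the ones given in this post.

```cpp
#include <cassert>

// Trips per active hour for one unit of population.
double trips_per_active_hour(double trips_per_year, double active_hours_per_day)
{
    assert(active_hours_per_day > 0.0);
    const double active_hours_per_year = 365.25 * active_hours_per_day;
    return trips_per_year / active_hours_per_year;
}
```

With 1,350 trips per year and 16 active hours per day this gives roughly 0.231; with the unadjusted 1,100 trips it would be roughly 0.188.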

Because of the inconstancy in the treatment of return journeys and the mapping of levels to passenger generation discussed above, the current code does not produce a stable number of potential passenger journeys per unit of population. If we assume for the moment (density will be covered later) that all city buildings are of level 1 and 2/3rds of passengers will generate return trips, then we can make some approximate calculations.

It turns out that a passenger factor of 2 produces 0.234 units of passengers per hour for each level 1 building. Each level 1 building will register a population of 6; however, approximately 2/3rds of journeys are return journeys. Multiplying 6 by 2 we get 12, which we then have to multiply by 2/3rds, which yields 8. If we changed the formula to make all journeys result in returns (and, after all, how often do people really go anywhere without also coming back eventually?), that number would be 4.

So, for the moment, 8 is the ideal passenger factor; 4 would be the ideal passenger factor if all journeys were return journeys.

However, as discussed in the opening post, there is a non-constant relationship between level and the number of generated passengers because of the odd formula used. This will produce inconsistent results. A change of formula is needed here. Suppose for a moment we were to simplify the arrangements above, and, instead of adding six and dividing by four, we just added 1 (to compensate for subtracting one in makeobj, so that a level 1 building was treated as being of true level 1). What would that produce?

For a level 1 building, this would not change anything, as it happens, as 1 + 6 = 7; 7 / 4 = 1.75, but, because only integers are used here, not floating point numbers, this would be rounded down to 1 in the code in any event. What it would do, however, is create a linear relationship between the level of buildings and the number of passengers generated, such that a level 2 building would generate 2x the number of passengers of a level 1 building, a level 3 building 3x as many and so forth. This, I think, would be more satisfactory, although it would mean an increase in the number of base passengers generated for any given passenger factor from previous versions. Nonetheless, this sort of precise relationship is necessary for the purposes of accurate density calculations.
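
The difference between the two mappings is easiest to see side by side. In this sketch both function names are hypothetical, and stored_level means the level as held in the game, i.e. the .dat value minus 1.

```cpp
#include <cassert>

// Current mapping from the excerpt: (level + 6) >> 2, integer arithmetic.
int current_pax(int stored_level)
{
    assert(stored_level >= 0);
    return (stored_level + 6) >> 2;
}

// Suggested mapping: simply add back the 1 that makeobj subtracted.
int proposed_pax(int stored_level)
{
    assert(stored_level >= 0);
    return stored_level + 1;
}
```

For .dat levels 1, 2 and 3 (stored 0, 1 and 2) the current mapping gives 1, 1 and 2 passengers per packet, while the suggested one gives 1, 2 and 3.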

Turning, then, to density, the one point that is immediately apparent is that, because nothing on population density has been changed so far between Standard and Experimental, there is no adjustment in population density based on the distance scale. This is unfortunate and needs to be remedied.

I will start with the assumption that each tile is 125m x 125m (as will be the default in the next version of Pak128.Britain-Ex). That means that each tile is 15,625 square meters or 0.015625 square kilometres. The average population density per square kilometre for large urban areas in the UK is 4,100 (source). 4,100 * 0.015625 = 64.0625; each city tile ought therefore account for a population of about 64, at least in a dense city. This figure holds for even smaller urban areas in the South East of England, such as Slough. Smaller towns (and towns earlier in history) would have a lower population density, but I cannot currently find reliable figures for this.

In a Simutrans city, it seems to be a reasonable assumption that 50% of the land area is covered with city buildings, the other 50% being taken by roads, railways, stations, open spaces, etc. This means that the above figure needs to be doubled for buildings, to 128 head of population per tile, on average, for a dense town. If the current system prevails, that would mean an average building level of 21.33 for each tile in a densely populated town. (For reference, at 250m/tile, 0.0625 km^2 / tile, there would need to be 256.25 head of population per tile, or 512.5 head of population per built tile, giving an average level of 85.41).

These levels are considerably higher than are customary in Simutrans, so some consideration is needed of whether to adopt an alternative formulation in the code. Low density areas, which should have perhaps 25-100 dwellings per square kilometre (source), are best represented by level "1" buildings in Simutrans. Assuming a figure of 50 dwellings per square kilometre and an average population of 3 persons per dwelling in this low density state of affairs, this will produce 150 head of population per square kilometre for level 1 buildings, or 2.34375 head of population per tile for level 1 buildings at 125m/tile. Taking into account the less than 100% utilisation of land for buildings, the code ought to produce 3-4 units of population per level 1 building tile at 125m/tile (suggesting that much higher levels of buildings really are needed, and that there is far too little variation of population density in Simutrans cities: in Pak128.Britain, for example, a medium rise tower block is level 3 whereas a single family home is level 1, yet the tower block can contain far, far more people than 3 family homes). This would double to 8 at 250 meters, 16 at 500 meters and 32 at 1km per tile.

The base formula for population, therefore, ought to be: (buildings.get_sum_weight() * welt->get_settings().get_meters_per_tile()) / 31, subject to the adjustment with bev, arb and won (if, on further consideration, this is still necessary).
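A self-contained sketch of this base formula follows. The two parameters stand in for `buildings.get_sum_weight()` and `welt->get_settings().get_meters_per_tile()`; the function name itself is hypothetical, and the adjustment with bev, arb and won is left out.

```cpp
#include <cstdint>

// Sketch of the proposed base formula. sum_weight stands in for
// buildings.get_sum_weight() and meters_per_tile for
// welt->get_settings().get_meters_per_tile(); both are passed in here
// so that the example is self-contained. Integer division truncates.
constexpr std::uint32_t base_population(std::uint32_t sum_weight,
                                        std::uint32_t meters_per_tile)
{
    return (sum_weight * meters_per_tile) / 31u;
}
```

With a single level 1 building (a sum weight of 1), this gives 4 at 125m/tile, 8 at 250m/tile, 16 at 500m/tile and 32 at 1km/tile, matching the figures in the preceding paragraph.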

This will then require the reconsideration of the passenger numbers calculations above, as they will be out of step with the revised population figures in light of this density calibration. Firstly, the passenger factor will need adjusting to take into account the meters per tile setting: the higher the meters per tile setting, the greater the passenger factor has to be. Starting with the correct value for 125m/tile, a passenger factor of 2 gets 0.234 passengers per hour per level 1 building. At 125m/tile, a level 1 building would produce 125/31 = 4 head of population (rounded down), giving a figure of 8 as the ideal passenger factor. At 250m/tile, however, that ideal passenger factor rises to 16; at 500m/tile to 32 and at 1km/tile to 64. On balance, I think that it is best not to incorporate this change directly into the code, but rather add to the comments in simuconf.tab the explanation of these various passenger factor numbers and let people who adjust the tile density set the figures themselves.
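The scaling of the ideal passenger factor can be sketched as below. The function name is hypothetical, and the rule that a factor of 2 corresponds to one head of population per level 1 building is an assumption read off the figures above rather than anything taken from the code.

```cpp
#include <cstdint>

// Assumption taken from the figures above: a passenger factor of 2 suits one
// head of population per level 1 building, and the factor scales linearly
// with the population that a level 1 building represents.
constexpr std::uint32_t ideal_passenger_factor(std::uint32_t meters_per_tile)
{
    const std::uint32_t population_per_level = meters_per_tile / 31u; // truncating
    return 2u * population_per_level;
}
```

A table of this sort (8, 16, 32 and 64 for 125m, 250m, 500m and 1km per tile) is what would go into the simuconf.tab comments.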

I should further note that these ideal figures will again need reconsideration if the system of mail generation is to change, which will have to be the subject of a different analysis later in time, as I am already late for Christmas shopping.

Additionally, further consideration will have to be given to whether the town growth formula as it currently stands is compatible with the more realistic population model proposed here, and will work with the greater variations in density.




All of these calculations suggest that, as previously suspected, the population density and the base numbers of passengers generated with the passenger factor currently in use in Pak128.Britain-0.8.4 (currently in use on the Bridgewater-Brunel server) are both too low. On the face of it, that is at odds with the apparently excessive numbers of passengers being generated, but this appears to be explained by the problems with the formulation of the journey time tolerance code: in the past, far fewer people travelled because journeys took much longer. Long distance journeys in particular need to be curbed by this method.

One major change needed as a result of this is to pakset design, greatly to increase the relative level of high density buildings to low density buildings, on the basis that a level 1 building represents a single household (or, perhaps, in commercial terms, a single small shop or micro-office), and that these must scale in a linear fashion as the density represented increases.

This also raises the possibility that, in urban areas at least, there might not be enough room for a transport network that can realistically carry as many people as are actually generated because of the foreshortened relationship between actual buildings/tiles of road and the size that they represent. This is probably more of a problem with higher numbers of meters per tile. According to this source, the mean number of 'bus stops per square mile in one US city is 76.158, the maximum being 466. 1 mile approx. equals 1.6km. On a scale of 125m/tile, one can fit 163.84 'bus stops into a square mile (in extreme, assuming nothing but 'bus stops; a sensible figure would be much lower); at 250m/tile, that is reduced to only 40.96; at 500m/tile to 10.24 and at 1km/tile to 2.56. Even with to-scale coverage areas, this will not assist much, as the actual number of 'buses able to depart from a single 'bus stop will be the same.

This, in turn, suggests that, at higher values of meters per tile, it becomes increasingly impossible to simulate realistic patterns of urban density and local transport, making the move in Pak128.Britain-Ex from 250m/tile to 125m/tile of particular significance. This ought not affect long distance transport, however, as fitting in enough infrastructure in that case is far less significant because of the much lower number of passengers that will use it if this is properly calibrated, and the much higher relative amount of space that it is able to occupy.

Edit: One possible way of dealing with this, actually, would be to remove from simulation all of the very short passenger journeys, and define "very short" differently depending on the tile scale. According to this publication (see page 2 for the pie chart), 22% of all trips are under 1 mile, and a further 19% of trips are between 1 and 2 miles.

In Simutrans-Experimental, we could work on the basis that we simply do not simulate trips of under 1 mile (1.6km) at all, and do not simulate trips of under 2 miles (3.2km) where the meters per tile setting is above perhaps either 250m or 500m (further consideration would be needed of that threshold).

We could then adjust the figures by reducing by either 22% or 22+19 = 41% the total number of annual trips per person (from a nominal 1,350 to a nominal 1,053 for lower values of meters per tile or to a nominal 796.5 for higher values of meters per tile), and recalibrating the local/midrange/long distance passengers as follows:

Low values of meters per tile
Local: 62 (79%)
Midrange: 14 (18%)
Long distance: 2 (3%)

High values of meters per tile
Local: 43 (73%)
Midrange: 14 (24%)
Long distance: 2 (3%)
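The recalibration in the two tables above can be reproduced with the following sketch. The names are hypothetical, and the starting split of 84 local, 14 midrange and 2 long distance trips per 100 is an assumption inferred from the tables (62 + 22 = 84), not a figure stated here.

```cpp
#include <cmath>

// Hypothetical container for trips out of a nominal 100 before any
// short journeys are removed. The 84/14/2 split is an assumption
// inferred from the tables above (62 + 22 = 84).
struct trip_split { double local, midrange, long_distance; };

// Remove `removed` short trips from the local category.
inline trip_split remove_short_trips(trip_split s, double removed)
{
    s.local -= removed;
    return s;
}

// Percentage of the remaining trips in one category, rounded to the
// nearest whole number.
inline long share_percent(double part, const trip_split& s)
{
    const double total = s.local + s.midrange + s.long_distance;
    return std::lround(100.0 * part / total);
}
```

Removing 22 trips leaves 62 local trips out of 78 in total, and removing 41 leaves 43 out of 59, which is where the percentages in brackets come from.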

In practice, I also add an overlap between the distance ranges of midrange and long distance, reduce the mid-range percentage and increase the long-distance percentage, so an adjusted set of figures for Pak128.Britain-Ex 0.9.0 might look like this:

Adjusted figures for low values of meters per tile
Local: 79%
Midrange: 16%
Long distance: 5%

I should be interested in views on whether and to what extent this makes sense in the simulation context.




This will form the basis of a test passenger calibration branch of my Github repository to look into these ideas when I have a chance. I shall also produce a test branch of my Pak128.Britain-Ex Github repository to model different building densities.

In the meantime, I should be very interested in any feedback on these discussions, particularly if anyone has spotted any errors in my formulae.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

ӔO

okay, after finally getting around to reading it, I do think the move to 125m/tile would be good.

However, if you move to 125m/tile, then it should be possible to simulate trips under 1 mile with station coverage of 3. With 3, you have an 875m square for each bus stop and 1000m square for two connected bus stops. The distance between each stop, if left seamless, would be approximately 1000m.

If you move up to station coverage 4, then you get a 1125m square or 1250m square and you could still include trips under 1.6km.

These distances are still twice the length of the average bus stop distance of 500m between each stop. They are, however, within the longer distances of 1km between each stop.
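The coverage figures above can be checked with a short sketch. The function names are hypothetical; the assumption is that a one-tile stop covers its own tile plus the coverage radius on each side, and that two joined stop tiles add one tile to the width.

```cpp
#include <cstdint>

// Side length in metres of the square covered by a single one-tile stop:
// the stop tile itself plus `coverage` tiles on each side.
constexpr std::uint32_t coverage_side_m(std::uint32_t coverage,
                                        std::uint32_t meters_per_tile)
{
    return (2u * coverage + 1u) * meters_per_tile;
}

// The same for two adjacent stop tiles joined into one stop, which is
// one tile wider along the axis on which the tiles are joined.
constexpr std::uint32_t joined_coverage_side_m(std::uint32_t coverage,
                                               std::uint32_t meters_per_tile)
{
    return (2u * coverage + 2u) * meters_per_tile;
}
```

At 125m/tile this gives 875m and 1000m for coverage 3, and 1125m and 1250m for coverage 4.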
My Sketchup open project sources
various projects rolled up: http://dl.dropbox.com/u/17111233/Roll_up.rar

Colour safe chart:

jamespetts

Yes, I think that you're right that we can't leave out the smaller distance journeys. Do you think that we should halve the number of journeys under 1 mile to compensate for having less infrastructure density because of the scaling, or leave things with fully real figures?

ӔO

seeing as the bus stops might get overwhelmed, I would compensate, or not simulate under 900m at all.

jamespetts

Quote from: ӔO on December 24, 2012, 06:35:07 PM
seeing as the bus stops might get overwhelmed, I would compensate, or not simulate under 900m at all.

The statistics that I have don't distinguish between trips of 900m and trips of under 1.6km (1 mile). I'd have to re-do all the calculations, removing half the passengers from the under 1 mile category and thus adjusting both the total number of passengers and the percentages to the various distances accordingly.

ӔO

In that case, I would just do half for under 1.6km. Whichever is simpler.

Seeing as the results are not fully known, until they are played out in the game, I would use the simpler solution to test the results. If they work, great, if they don't, hopefully not much time was used.

greenling

Hello Jamespetts, I can soon post a photo of an old timetable.
Opening hours 20:00 - 23:00
(In Night from friday on saturday and saturday on sunday it possibly that i be keep longer in Forum.)
I am The Assistant from Pakfilearcheologist!
Working on a big Problem!

jamespetts

Greenling,

that's kind of you to suggest. I don't think that it's quite what we need for this, but by all means post them if you would like: they might come in handy at some point.

AEO,

I think that I might try, with the 125m/tile, using the fully realistic numbers without adaptation and seeing how that pans out.

asaphxiix

Quote from: jamespetts on December 10, 2012, 10:22:11 AM
Sdog,

I have considered in the past using costing as a factor for transport, but rejected it on the ground that it would introduce undesirable complexities or, if the complexities were to be glossed over or simplified away, perverse results/incentives.

The above figures show that the journey time tolerance feature really does need to be modified in order to produce a better set of numbers, I think.

(Incidentally, the private car system could not be adapted as you imagine, as it involves checking whether the passenger has a private car, then checking whether the journey can be completed with a private car, then checking whether it can be completed with public transport, then comparing the merits of the two modes).

Kieron,

Hmm, I did implement more or less that mechanism:


/* Generates a random number on the [0,max-1] interval by summing two uniform draws; the result is triangular rather than truly normal */
#ifdef DEBUG_SIMRAND_CALLS
uint32 simrand_normal(const uint32 max, const char* caller)
#else
uint32 simrand_normal(const uint32 max, const char*)
#endif
{
   const uint32 half_max = max / 2;
#ifdef DEBUG_SIMRAND_CALLS
   return (simrand(half_max, caller) + simrand(half_max, "simrand_normal"));
#else
   return (simrand(half_max, "simrand_normal") + simrand(half_max, "simrand_normal"));
#endif
}


but I think what is needed is not a normal distribution at all, but a declining probability the higher that the numbers get: after all, the midpoint of 30 and 4320 minutes, the long distance range, is 2,175 minutes, or 36.25 hours! Having this as the most common journey time tolerance for long distance journeys would not work. What I really need is having, say, three or two and a half hours as the most common journey time tolerance for long distance passengers, with a significant minority getting tolerances between 30 minutes and 3/2.5 hours, and a dwindling minority getting journey time tolerances above that in the stratospheric ranges.

Perhaps it would do to introduce a median value for time tolerance, similar to that of the speedbonus max distance, in such a way that if we set, say, min/med/max 500/1000/3000, then half the journeys will have tolerance 500-1000, and the other half 1000-3000? etc.
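One way to read this min/med/max idea is as a piecewise uniform draw, sketched below. This is only an illustration: the function name is hypothetical, `std::mt19937` stands in for the game's own `simrand()`, and it assumes lo < med <= hi.

```cpp
#include <cstdint>
#include <random>

// Hypothetical sketch: half of all draws fall in [lo, med) and the other
// half in [med, hi]. std::mt19937 stands in for the game's simrand().
// Requires lo < med <= hi.
inline std::uint32_t tolerance_with_median(std::mt19937& rng,
                                           std::uint32_t lo,
                                           std::uint32_t med,
                                           std::uint32_t hi)
{
    std::bernoulli_distribution lower_half(0.5);
    if (lower_half(rng)) {
        std::uniform_int_distribution<std::uint32_t> below(lo, med - 1u);
        return below(rng);
    }
    std::uniform_int_distribution<std::uint32_t> above(med, hi);
    return above(rng);
}
```

With 500/1000/3000, about half of the generated tolerances land between 500 and 999 and the rest between 1000 and 3000, so the median is fixed directly by the setting rather than by the shape of the distribution.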

jamespetts

#67
Quote from: asaphxiix on January 08, 2013, 09:52:22 AM
Perhaps it would do to introduce a median value for time tolerance, similar to that of the speedbonus max distance, in such a way that if we set, say, min/med/max 500/1000/3000, then half the journeys will have tolerance 500-1000, and the other half 1000-3000? etc.

I am currently minded to try out Kieron's later suggestion of multiplying the numbers to get a sort of exponential skew; however, I need to get the basic calibration of raw passenger generation right before I do further work on the tolerances.
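A rough sketch of how the multiplication approach might be applied to a tolerance range is given below. The function name is hypothetical, `std::mt19937` stands in for `simrand()`, and the code assumes hi > lo; it is not taken from the game.

```cpp
#include <cstdint>
#include <random>

// Sketch of the multiplication approach: the product of two uniform draws
// on [0, range), divided by range, is skewed towards the low end.
// std::mt19937 stands in for simrand(); lo and hi are in minutes.
// Requires hi > lo.
inline std::uint32_t skewed_tolerance(std::mt19937& rng,
                                      std::uint32_t lo,
                                      std::uint32_t hi)
{
    const std::uint64_t range = hi - lo;
    std::uniform_int_distribution<std::uint64_t> draw(0u, range - 1u);
    const std::uint64_t a = draw(rng);
    const std::uint64_t b = draw(rng);
    return lo + static_cast<std::uint32_t>((a * b) / range);
}
```

For the long distance range of 30 to 4320 minutes, most results fall well below the midpoint of the range, although whether the skew is strong enough would need testing in play.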

jamespetts

It has occurred to me that some adjustment is necessary to the base generation figures set out above. Earlier, I wrote,

Quote
Remembering that the average person makes 1,100 trips per year, we need to find a formulation that properly encapsulates the correct relationship between this and the passenger generation figures. It is necessary, however, to use a figure of greater than 1,100 as a base, as account must be taken of the fact that not all generated passenger packets in Simutrans will actually make journeys, even in a well-connected game, as there is the journey time tolerance to consider. I shall aim for 1,350 as the base figure.

First, the yearly figure must be translated to an hourly figure. For reasons discussed elsewhere, I use a figure for "active hours" in a day as being 16 (24 - 8 hours' sleep). 365.25 * 16 = 5,844 active hours per year. 1,350 / 5,844 = 0.231. Each unit of population should therefore generate 0.231 passengers per hour.

However, this makes the error of squashing the number of trips made by passengers in a total of 24 hours into 16. What we should actually do is generate that proportion of all passenger trips that take place during the 16 busiest hours of the day. Data for trips per time of day can be obtained here.

The quietest 8 hours are 2300 - 0700, in which 3.67% of all trips are made. Therefore, the total number of passengers generated needs to be reduced by 3.67% - taking the base as 1,350 results in an adjusted base of 1,300 - or, if the base were taken instead as 1,250 (which on reflection might be rather better than a base of 1,350), this would give 1,205. This would give 1,300 / 5,844 = 0.222 or 1,205 / 5,844 = 0.206 instead of the previously calculated 0.231. The overall difference that this might make may well be small, but the above figures might need adjusting in consequence.

jamespetts

I have been doing some work to implement the most basic parts of this (the relationship between the building's level, its size, the population in cities and the overall number of passengers generated) in advance of the next version. I was not originally going to implement any of this, but one of the things on my to do list for the next pakset release was recalibrating the congestion settings, which I came to realise was not possible to do properly without making some sort of start on this.

According to the above figures, we are aiming for a generated passenger to population ratio of 1.22:1 or thereabouts. I have changed the code so that all passengers generate return trips, and removed the code "damping" the amount of passengers and mail, which also had the effect that the different levels of buildings did not cause a linear increase in the number of passengers as would be expected. I have ensured that the level of buildings as entered in their .dat files is now directly represented in the game without intermediate obfuscation involving the passenger factor as occurred before, and recalibrated the passenger generation to interlink with the meters per tile setting.

I have further added a new system for calculating congestion (the old system can still be used: the congestion density factor setting determines which system is used), based on calculations derived from the TomTom Congestion Index. The desired ratio of 1.22:1 (to the nearest 2 decimal places) can be obtained with a passenger factor of 15.

While I am about it, I have also discovered and fixed a number of bugs with private car generation, including a bug that caused the number of recorded private car trips to be considerably too high.

A problem remains, however, that will need the more detailed review of systems that I cannot achieve in time for the next release, which is this: because the current system of destination finding requires that destinations be within a certain fixed distance range, small towns near large towns become the destination for a disproportionate number of passengers, greatly increasing the congestion in those towns, even if there is nothing much of interest there to which passengers might want to travel. This should be addressed with the planned more fundamental overhaul of town growth and passenger destination finding.