The International Simutrans Forum

 

Author Topic: Calibrating the passenger factor  (Read 29901 times)

0 Members and 1 Guest are viewing this topic.

Offline Carl

  • Devotee
  • *
  • Posts: 1595
    • Website
  • Languages: EN
Re: Calibrating the passenger factor
« Reply #70 on: May 17, 2013, 07:42:22 AM »
Hi James, that sounds very interesting. (And as a side note, I'm happy to hear that the cause of small towns getting disproportionate passengers has been identified.)

I have a very simplistic question to ask. Holding the passenger_factor value fixed, do you expect that the changes you describe above will increase or decrease the typical volume of passengers?

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 18696
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Calibrating the passenger factor
« Reply #71 on: May 17, 2013, 08:22:02 AM »
If you mean for a given passenger factor, will the number of passengers generated increase or decrease, the answer is that they will decrease.
« Last Edit: June 19, 2013, 10:01:52 AM by jamespetts »

Offline Carl

  • Devotee
  • *
  • Posts: 1595
    • Website
  • Languages: EN
Re: Calibrating the passenger factor
« Reply #72 on: May 17, 2013, 10:46:57 AM »
Thanks for the reply, James: that's what I was hoping you'd say. ;)

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 18696
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Calibrating the passenger factor
« Reply #73 on: May 17, 2013, 08:53:01 PM »
Splendid!

Offline neroden

  • Devotees (Inactive)
  • *
  • Posts: 831
  • Nathanael Nerode
Re: Calibrating the passenger factor
« Reply #74 on: June 24, 2013, 06:38:22 AM »
On the other hand, London Underground services tend not to be as busy as one would expect -- except in the very centre of London. This is presumably because it's quite difficult to simulate the fact that Londoners are much more likely to use public transport.
I think the biggest problem is probably city growth patterns.  There is no mechanism to make cities grow more aggressively near places with public transportation, which is the mechanism which made London so dense, which is what made its public transportation so successful...  I would really like to implement something like that, so that when the city looks to add new buildings, it preferentially goes near well-served public transport stops, but I haven't really figured out how to do it yet.  It would require changing the new-building algorithm entirely. "renovate_percentage" should not exist; there should be some sort of weighted vector of locations, with a penalty for already having a building, but a bonus for having better public transport.

Offline neroden

  • Devotees (Inactive)
  • *
  • Posts: 831
  • Nathanael Nerode
Re: Calibrating the passenger factor
« Reply #75 on: June 24, 2013, 06:53:20 AM »
A problem remains, however, that will need the more detailed review of systems that I cannot achieve in time for the next release, which is this: because the current system of destination finding requires that destinations be within a certain fixed distance range, small towns near large towns become the destination for a disproportionate number of passengers, greatly increasing the congestion in those towns, even if there is nothing much of interest there to which passengers might want to travel. This should be addressed with the planned more fundamental overhaul of town growth and passenger destination finding.
In practice, it's totally realistic that small towns near large towns get big transportation and population booms -- as "commuter belt" towns.  One problem is the failure of the towns to densify properly, as mentioned above.

A second problem is the failure of the passenger destination code to consider industrial/commercial/residential in any sensible manner.  Passengers start at any building and go to any building. I think we should take a page from SimCity: most passengers should start at residential buildings, go to commercial, industrial, or tourist buildings (a fixed percentage for each) and then come back.

Factories should be counted in industrial buildings for this purpose and should have a fixed employment level (unlike what is currently implemented in standard, which is simply bizarre: currently factories employ more people if they are near larger cities, which is weird and makes no sense).

A certain percentage of passengers should, of course, visit other residential buildings.

Finally, while most passengers should go "there and back", a small percentage should randomly go to a new destination before going home.

I think this system would probably be fairly straightfoward to implement (for the next version).  Certainly easier than fixing the city growth problem.  The "second trip" part would be the most work as it would require keeping track of the ultimate builidng of origin.  Actually, the building of origin is worth keeping track of anyway, because we can't send passengers home properly at the moment.

Here are my first wild guesses for appropriate percentages:
Short-range passengers: 40% industrial, 40% commercial, 10% tourist, 10% residential, 5% "second trip"
Medium-range passengers: 20% industrial, 40% commercial, 20% tourist, 20% residential, 10% "second trip"
Long-range passengers: 10% industrial, 35% commercial, 35% tourist, 20% residential, 25% "second trip"

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 18696
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Calibrating the passenger factor
« Reply #76 on: June 24, 2013, 10:03:58 AM »
I have actually been planning to implement more or less exactly this (although I had not considered percentages, which should probably be set in the .dat file) for some time: see the earlier discussion in, I think, this thread. Indeed, I had devised, in the abstract at least, a way to deal with the issue of town growth being dependent on public transport, taking another leaf from Sim City's book and using success rates, which you will see implemented (currently for information only) in the latest development branches on Github (and, indeed, in all the recent release candidates). The higher the success rate of transport from a particular building, the more likely that it will be to upgrade, and the higher the success rates of surrounding buildings, the more likely that a new building will be built on an undeveloped tile. The idea is also to separate commuter and non-commuter traffic (replacing the current three distance based classifications), and have different sorts of traffic important for different sorts of growth: industry will need workers and does not care much for visitors, so will care only about commuter traffic; people move to where jobs are primarily, so residential will care mainly about commuter traffic, but will have some preference for better non-commuter success rates; commercial premises need visitors (whether they be customers or business visitors), so the number of non-commuter passengers will be much more important for commercial buildings. (Also, end-consumer "industries" should be classified as commercial for these purposes; player buildings should also be classified as industrial, as people have to work in stations and depots, too).

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 18696
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Calibrating the passenger factor
« Reply #77 on: April 02, 2014, 12:06:59 AM »
Apologies for reviving a somewhat elderly topic - my current work in testing the passenger generation code has lead me to look into one of the topics discussed in this thread again, being normal distributions.

Some time ago, Kieron very helpfully suggested a simplified means of calculating a normal distribution:

Code: [Select]
int biased_random(int max) {
  int random1 = simrand(max);
  int random2 = simrand(max);
  return (random1*random2)/max;
}

I implemented a version of that some time ago, but testing to-day showed that, for visiting trips in which passengers might be inclined to travel for up to 24 hours or more, this was not sufficient to make extremely long trips rare enough. I therefore turned to Kieron's enhanced suggestion:

Quote from:
kierongreen
The peak value will be 1/4 the maximum value. Though actually you are better off with (((rand(max)+1)*(rand(max)+1))/max)-1 to avoid a large number of 0s being generated. If you multiply 3 random numbers from 1 to max then divide by max*max the peak will be at 1/9 and so on.

I implemented it thus to allow flexibility:

Code: [Select]
#ifdef DEBUG_SIMRAND_CALLS
uint32 simrand_normal(const uint32 max, uint32 exponent, const char* caller)
#else
uint32 simrand_normal(const uint32 max, uint32 exponent, const char*)
#endif
{
#ifdef DEBUG_SIMRAND_CALLS
   sint64 random_number = simrand(max, caller);
#else
   uint64 random_number = simrand(max, "simrand_normal");
#endif
   assert(exponent <= 3); // Any higher number will produce integer overflows even with unsigned. 64-bit integers
   if(exponent < 2)
   {
      // Exponents of 1 make this identical to the normal random number generator.
      return random_number;
   }

   uint64 adj_max = max == 0 ? 1 : max;

   for(int i = 0; i < exponent - 1; i++)
   {
      random_number *= simrand(max, "simrand_normal");
   }

   for(int n = 0; n < exponent - 2; n ++)
   {
      adj_max *= adj_max;
   }

   return (uint32)(random_number / adj_max);
}

Testing it carefully, it seems to have the same effect as Kieron intended.

Even this I found insufficient, however: testing in 1775, excessive numbers of passengers (proportionately) are happy to use a stage coach service with journey times ranging between 6 and about 26 hours. Checking using a debugger reveals that a significant proportion of passengers are still happy to make journeys of many hours: far too high a proportion.

(I should note that the new passenger generation code splits visiting and commuting trips - the journey time tolerance ranges for each can be set separately in the pakset: for commuting trips, I have set a much narrower range of journey time tolerances, from 20 minutes to 2 hours, than for visiting trips; it is visiting trips that are causing the trouble here).

I have looked at Matthew Collett's earlier suggestion again, which I had not had time to look into before, and then had not considered necessary unless Kieron's simpler (and therefore more computationally economical, as well as easier to code) method proved to be inadequate, and it is only now that I am finding that, for visiting passengers at least, it seems to be.

Matthew's suggestion was as follows:

The general algorithm is presumably of the form:-
  • Generate a possible trip.
  • Based on the distance to be travelled and any other relevant parameters (the game year, the social status of the passenger, ...) determine the typical journey time tolerance T for that trip.
  • Draw the actual journey time tolerance t from a random distribution parameterised by T.
  • If t is larger than the expected journey time, make the trip, otherwise don't bother.
One straightforward way to implement an exponential distribution with mean time T for step 3 is:-
  • Generate a uniform random number x in the interval (0,1].
  • Calculate t = -T ln x .
This uses floating-point arithmetic.  If you want to use only integers, but are happy with an approximate stepped distribution, then:-
  • Generate a uniform random integer x in the interval [0,2N).
  • Find the number n of leading 1s in the binary representation of x; i.e. x consists of n 1s, a 0, and a remainder of N-n-1 other bits. Call the remainder y.
  • Calculate t = (n + y/2N-n)T .
In this case T is the median rather than the mean.

Best wishes,
Matthew

I am afraid, however, that I finding this somewhat hard to follow, as I am not familiar with the advanced mathematical concepts being deployed here. What does "Generate a uniform random integer x in the interval [0,2N)" mean exactly (especially the part of the sentence after "in the interval...")?

Any assistance in deciphering Matthew's helpful suggestion from some time ago, or any alternative means of achieving this objective would be most welcome. I am afraid that I have found the responses that I get to a Google search for "normal distribution" to be a little too techincal for me to understand and also focussed in any event on an unbiased normal distribution, whereas I am after a heavily biased distribution.

Edit: It looks like what I am trying to achieve is a positively skewed normal distribution between the minimum and maximum times - but how to go about writing a C++ function to make this work, let alone one with good enough performance to be called very frequently, is beyond me without some assistance, I think.
« Last Edit: April 02, 2014, 12:21:17 AM by jamespetts »

Offline ӔO

  • Devotees (Inactive)
  • *
  • Posts: 2345
  • Hopefully helpful
  • Languages: en, jp
Re: Calibrating the passenger factor
« Reply #78 on: April 02, 2014, 02:00:46 AM »
someone correct me if I'm wrong, but I think you could get a similar function by transforming the function to the left.
which should be as simple as subtracting a specific number from the result.

Offline scamps

  • *
  • Posts: 33
  • Languages: EN, RU
Re: Calibrating the passenger factor
« Reply #79 on: April 02, 2014, 03:08:09 AM »
what google shows: http://en.wikipedia.org/wiki/Normal_distribution#Generating_values_from_normal_distribution
Some methods don't seem too computionary intensive

Box-Muller with example of coding and comments http://en.literateprograms.org/Box-Muller_transform_%28C%29

Offline MCollett

  • *
  • Posts: 214
  • Languages: en
Re: Calibrating the passenger factor
« Reply #80 on: April 02, 2014, 03:09:15 AM »
What does "Generate a uniform random integer x in the interval [0,2N)" mean exactly
An integer with an equal chance (1 in 2N) of being any of  0, 1, 2, 3, … 2N-2,  2N-1.

Quote
I am afraid that I have found the responses that I get to a Google search for "normal distribution" to be a little too techincal for me to understand and also focussed in any event on an unbiased normal distribution, whereas I am after a heavily biased distribution.

Edit: It looks like what I am trying to achieve is a positively skewed normal distribution
The suggested algorithm (the integer version of which can probably be improved further*) does not generate a normal distribution or anything related to it: it gives an exponential distribution, which I am fairly confident is a better starting point for this application.

Best wishes,
Matthew

Edit: * in fact, was improved in the code I posted to the thread shortly after the message you quote.
« Last Edit: April 02, 2014, 03:22:05 AM by MCollett »

Offline Sarlock

  • Devotee
  • *
  • Posts: 1340
  • Languages: EN
Re: Calibrating the passenger factor
« Reply #81 on: April 02, 2014, 05:11:38 AM »
There should be plenty of computationally cheap methods to approximate the effect you want... I think the key starting point is to set a few data points so that a simple method can be devised.  On a large map this would be called a lot, so keeping it simple is certainly important.

To achieve half of the bell curve graph, you'd need, in theory, a Gaussian function (modified to stretch out the tail).  But that's pretty computationally heavy.  I think it can be easily represented by two exponential functions, each taking care of the commuting portion and the travel portion of the curve.

EDIT: Doing some more reading.  Did you try cubing the function as Kieron suggested?  That will get you a bias of 1/9 (11%) instead of 1/4 (25%), which pushes the curve far to the left and heavily diminishes it as you approach the maximum journey time.

Code: [Select]
int biased_random(int max) {
  int random1 = simrand(max);
  int random2 = simrand(max);
  int random3 = simrand(max);
  return (random1*random2*random3)/(max*max);   (*Note)
}
Note*You could rework the math so that you don't have to have a large integer to hold this value.

EDIT 2: Did some plotting on a spreadsheet - a cubed function will probably get it much closer to what you want for distribution at the high end.  A squared function yields about 16% of its values over 50% while a cubed function is down to 3% over 50%.  4% are over 75% with a square, 0.5% are over 75% with a cube.  This is probably ideal for the long distance journies and perhaps medium.  Local journies are probably better squared so that you get more journies at the higher end.

EDIT 3: How is longdistance_passengers_max_distance = 16384 used in the calculation?  Is it truncated to the maximum possible journey distance for that particular map and scale?  Per the simuconf comments, this is in km, correct?
From the online game, there are a lot of passengers generated who are happy to travel the entire length of the eastern island, some 200km in distance (10-15 hours).  Beyond that, passenger volumes drop - hardly anyone (basically zero) will attempt the 60+ hour journey between the east and west island on a 15 km/h ship.  I have a few long distance connections but it's almost all mail and very few passengers.
« Last Edit: April 02, 2014, 06:57:10 AM by Sarlock »

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 18696
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Calibrating the passenger factor
« Reply #82 on: April 02, 2014, 10:28:31 AM »
Thank you all for your replies - they are much appreciated. Matthew - I did not spot the code when I read it before, but I think that I see it now: apologies for missing it. Is it this:

Code: [Select]
const unsigned int uibits = std::numeric_limits<unsigned int>::digits;
const unsigned int lgbits = uibits-2;
const unsigned int lgbit = 1<<lgbits;
const unsigned int hibit = lgbit<<1;

unsigned int scaled_lg(unsigned int x, unsigned int T=1) {
if (x==0) return uibits*T;
unsigned int lg = 0;
//Find the first significant digit
while ((x & hibit) == 0) {
lg += T;
x <<= 1;
}
//Make space for overflow
x >>= 1;
//Find leading bits in mantissa
for (int j =1; j<lgbits/2; ++j) {
if ((x ^ lgbit) == 0 || T == 0) return lg;
x >>= lgbits/2;
x *= x;
unsigned int round = T & 1;
T >>= 1;
if (x & hibit) {
x >>= 1;
lg -= T+round;
}
}
//Interpolate remaining bits linearly
while (T != 0) {
if (x & lgbit) lg -= T;
T >>= 1;
x <<= 1;
}
return lg;
}

If so, was this intended to be an exponential function or a normal distribution? I ask because it really is a skewed normal distribution that we need, not an exponential function: we want visiting trips to have a range of between 3 minutes and 5,400 minutes (90 hours) tolerance, but for there to be more people willing to travel for 10 minutes than 3 and more willing to travel 30 minutes than 10, and so forth, but much, much, much, much fewer willing to travel 10+ hours than 2 hours.

Sarlock, when you refer to the commuting portion and travel portion of the curve, I think that there is something of a misunderstanding: visiting trips and commuting trips do not occupy different parts of the same curve: they each have their own curves: visiting trips between 3 minutes and 90 hours, and commuting trips between 20 minutes and 2 hours. The commuting distribution can be an unsekwed normal distribution, whereas the visiting distribution needs to be extremely heavily positively skewed.

I have tried with the cubed distribution already, using the code that I posted in the previous thread, and passing "3" as the exponent for visiting trips, with "2" for commuting trips. Even then, about 1/10th of passengers are willing to make journeys of 6 hours or more, which is excessive. I tried using an exponent of 4, but the resulting numbers were so large that they could not be contained in unsigned 64-bit integers, and overflowed producing mostly zero.

As to the long distance passengers and their maximum distance, the whole concept of distance ranges has been completely abolished in this code on which I am working because it created anomalies (such as no passengers willing to take the requisite time to make a (realistically) very slow but fairly short journey in the early days). Instead of local, midrange and long distance passengers each with distance ranges and their own journey time tolerance range, there will now be commuting and visiting passengers with no distance range and each with their own journey time tolerance range, as discussed above.

Incidentally, subtracting a number from the result will not work well (at least, not without adding something to the range), as we need at least some journeys to be at the top of the range, even if they are only 0.001% of all trips.

Offline Sarlock

  • Devotee
  • *
  • Posts: 1340
  • Languages: EN
Re: Calibrating the passenger factor
« Reply #83 on: April 02, 2014, 02:44:39 PM »
Thank you for the clarification.  It may be difficult to mathematically represent your bell curve that peaks at a middle value, say 30 minutes.  You're probably better off setting a few conditions to make the data points conform to the curve you want.

A cubed exponential curve is very steep, so I suspect what is occurring is that the graph has a large maximum value and all small values (even 6 hours) is included in the very left part of the graph and still has a fairly high probability of occurring.  In 1750 at 15km/h, a 6 hour trip is 90km, or 720 tiles.  In the 20th century, at 200km/h, a 6 hour trip is 1200km, or 9600 tiles - which is likely wider than any map currently in play.

Commuting trips between 20 minutes and 2 hours can probably use the exponential curve and very nicely have most of the data points concentrated in the 20-60 mins range.  If you cube it, then it will concentrate even more heavily in the 20-40 range.  With a square, 76% of trips will be below 60 minutes.  57% will be 45 minutes or less.  With a cube, you get 92% below 60 minutes, 82% below 45 minutes.

Representing visiting trips of 3-5400 minutes, peaking at 30 minutes:
What percentage of passengers do we want taking a trip of 15-45 minutes' length?  30%?  How about 45 minutes to 120 minutes (2 hours)?  Another 30%?  How many taking a trip 3-15 minutes?  Volumes would drop off quickly above 120 minutes.  This will determine the shape of the curve.  You may need to have 4 separate calculations for each range, to confine the data to a set that makes sense (and is easy to calculate).  If you force the calculation to perform only, say, 5% of its data points above 2 hours, you can apply a squared exponential function and it should work fine - since we've already determined how many of that type to be calculated.

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 18696
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Calibrating the passenger factor
« Reply #84 on: April 02, 2014, 08:12:13 PM »
You are probably right about the large values - and, when I introduce portals, the values can only get larger: if we are simulating (albeit indirectly) a sailing ship voyage of the equivalent distance as from London to Sydney, that will be measured in months rather than hours, which journey at least some people were clearly prepared to tolerate, and yet the median tolerance should still remain in the region of 1 hour. I suspect that it can be done mathematically (and clipping the ranges in random samples has the disadvantage of producing arbitrary results and steep borders between values).

I wonder, however, whether there is something to be gained by the cumulative use of Kieron's formula: it will be recalled that the barrier to using it in a power of four is that the numbers become too large to fit in a 64-bit integer. Suppose, instead of using a power of four, we were to multiply together two random numbers obtained from the power of two version of the formula (rather than, as in the power of two version of the formula itself, two evenly distributed random numbers), and then divide that resulting number by the maximum - would that, I wonder, have the same effect as a power of four?

Offline ӔO

  • Devotees (Inactive)
  • *
  • Posts: 2345
  • Hopefully helpful
  • Languages: en, jp
Re: Calibrating the passenger factor
« Reply #85 on: April 02, 2014, 10:47:31 PM »
perhaps lognormal distribution is what you are looking for?

http://www.wolframalpha.com/input/?i=lognormal


Offline Sarlock

  • Devotee
  • *
  • Posts: 1340
  • Languages: EN
Re: Calibrating the passenger factor
« Reply #86 on: April 03, 2014, 05:02:17 AM »
A 4th exponential will probably return values that are so heavily skewed to the beginning figures that your distribution at the higher end will be almost non-existent.

I wonder if it's much faster computationally to take a random number and compare it to a data table that has the distribution data points in it... and then apply a small random variance to keep the data from "bunching".  ie:

Pick random number 0-9, then pick random number between:
0: 3-15 minutes
1: 16-20 mins
2: 21-25 mins
3: 26-30 mins  (bell curve peak)
4: 31-35 mins
5: 36-40 mins
6: 41-60 mins
7: 61-120 mins
8: 121-600 mins
9: 601-5400 mins

More data points would be required to add more detail to the curve, this is just for 10 points and isn't very refined (just for an example list).  You would likely want less than 10% of your trips being over 600 minutes - and probably want to add more data points even in the 600-5400 range.  50 or 100 points would probably be plenty.

This would allow you to fine tune the curve to be exactly how you want it to look without having to burden the system with complex math (and much more difficult to tweak if you still aren't happy with the results - with this method you can make the distribution match exactly what you want).

If you want to supply some rough data points for percentage of trips below 30 mins, 30-60, 60-120 and above, I could put together a quick series of data points and see what you think.

Offline ӔO

  • Devotees (Inactive)
  • *
  • Posts: 2345
  • Hopefully helpful
  • Languages: en, jp
Re: Calibrating the passenger factor
« Reply #87 on: April 03, 2014, 07:19:38 AM »
another possibility is using part of a quartic function.
If you have a graphing calculator handy, you could potentially make it calculate a best fit function from several points.

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 18696
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Calibrating the passenger factor
« Reply #88 on: April 03, 2014, 11:16:52 AM »
Hmm - that log normal distribution looks interesting - this appears to be the same graph shape as a skewed normal distribution. The data points idea is an interesting one, but I am concerned that it will produce non-smooth results with weird gameplay, and will be difficult to make work with user defined ranges. As for quadratic functions - I am afraid that the only things that I know about them is that they are called quadratic functions, and they are a very advanced area of mathematics. What would a quadratic function enable the programme to do, and are they computationally expensive (i.e., involve a great deal of recursion, many division operations or things such as square roots)? Remember that this random number generator must be called many times every step.

Offline Sarlock

  • Devotee
  • *
  • Posts: 1340
  • Languages: EN
Re: Calibrating the passenger factor
« Reply #89 on: April 04, 2014, 01:09:21 AM »
With enough data points (30 or 40 will likely suffice) the curve will be smoothed out so much that there will be no noticeable effect on the game.  I'll create something to this effect to demonstrate.  It makes it far easier to tweak the parameters of the curve by just changing a few values.

The data points could be user defined or put in to simuconf - allowing the user/pakset designer to determine the shape of the bell curve.

It's also super fast to calculate.

Offline Sarlock

  • Devotee
  • *
  • Posts: 1340
  • Languages: EN
Re: Calibrating the passenger factor
« Reply #90 on: April 04, 2014, 02:16:06 AM »


Spreadsheet:

Passenger Travel Times - Data Point System Excel File

Pretty nice curve for a quick sample set.  Might want to reduce the peak probabilities and round out the bell curve a bit, if desired... it probably peaks a bit too aggressively at 30 minutes.  I used a system with two data pieces per time period: maximum time and probability.  Total data points required: 20.  Probably could use just 15 and still perform fine.  Most of the diminishing numbers at the higher times (600+ minutes) could probably just be merged in to one set.

For each data sequence, we just choose a random number between the min and max.  You can't really see on the graph (just a bit on the tails of the curve), but each segment is basically a linear section... but even with just 20 segments the lines all smooth out in to a nice bell curve shape.

EDIT: Changed slightly to reduce peak probabilities and round out the peak a bit more.
« Last Edit: April 04, 2014, 02:42:23 AM by Sarlock »

Offline sdog

  • Devotee
  • *
  • Posts: 2039
Re: Calibrating the passenger factor
« Reply #91 on: April 04, 2014, 05:30:15 PM »
James, i'm affraid i've not read the thread so far, just answer based on your social media requests regarding the distribution problem:

The method of counting the leading 1s in the binary, Mathew described:

"Find the number n of leading 1s in the binary representation of x; i.e. x consists of n 1s, a 0, and a remainder of N-n-1 other bits. Call the remainder y."

Is nothing more than an integer calculation to get log_2. You can easily see that when you look at the real representation above. Don't forget the inverse of exponential is logarithm. He is not calculating a skewed normal distribution, but rather a (much more appropriate!) exponential distribution.

You heard from exp. distributions in economy perhaps under the term 'long tail'. I can't look it up, but i think i remember from statistical mechanics that one can both prove and empirically show that passengers waiting is described by such a function. That is not sourced here, if i got some time at hand i shall try to look it up, else i'll play the ball to someone else here.

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 18696
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Calibrating the passenger factor
« Reply #92 on: April 04, 2014, 09:01:11 PM »
Sdog,

thank you for your input and clarification on Matthew's function: that is most thoughtful. However, I am somewhat doubtful that a logarithmic/exponential function is what is needed here rather than a positively skewed normal distribution. You refer to waiting times above, whereas what I am actually trying to model are journey time tolerances, which are not quite the same thing. Marchetti's constant holds that people tend to have a fairly uniform travel time budget of about an hour, which strongly suggests that, for any given journey, more people should be willing to spend, say, half an hour travelling than are prepared to spend only three minutes travelling, but equally, more people should be prepared to spend half an hour travelling than three hours. Is there a particular datum on which you base your suggestion to the contrary?

Edit: Tests that I am in the process of carrying out are showing good results with compounding/recursion of the cubed version of Kieron's algorithm. Taking two results from the cubed version, multiplying them together and dividing by the maximum has produced a more satisfactory result, with 28 passengers being willing to travel in one game month of 6:24h with a minimum journey time of 4:18h travelling and 2:52h waiting, with 1,583 passengers recording "too slow" and 226 "no route". I will try again with a multiple of three and see whether that is better still. The questions are now, assuming this system proves suitable, how to calibrate it and how best to allow it to be customised in simuconf.tab, since there is no easy single exponent for scaling. Perhaps a few numbers with different modes, such as 0 or 1 for an even distribution, 2 or 3 for the squared or cubed skewed normal distribution, 4 for a double recursion of the cubed distribution and 5 for a triple recursion, or something of the sort?

Edit 2: A triple recursion does not seem to work well: dividing by two produces values higher than the maximum, whereas dividing by the square always produces zeros. The best results so far have been obtained by a single recursion.

Edit 3: I have found a way to parameterise it, I think, although I have not had a chance to do any serious performance testing yet or see whether this works well for extremely large numbers. The code is as follows:

Code: [Select]
/**
 * Generates a random number on [0,max-1] interval with a normal distribution
 * See: http://forum.simutrans.com/index.php?topic=10953.0;all for details
 * Takes an exponent, but produces unreasonably low values with any number
 * greater than 2.
 */
#ifdef DEBUG_SIMRAND_CALLS
uint32 simrand_normal(const uint32 max, uint32 exponent, const char* caller)
#else
uint32 simrand_normal(const uint32 max, uint32 exponent, const char*)
#endif
{
#ifdef DEBUG_SIMRAND_CALLS
    sint64 random_number = simrand(max, caller);
#else
    uint64 random_number = simrand(max, "simrand_normal");
#endif
   
    if(exponent < 2)
    {
        // Exponents of 1 make this identical to the normal random number generator.
        return random_number;
    }

    // Any higher number than 3 will produce integer overflows even with unsigned. 64-bit integers
    // Interpret higher numbers as directives for recursion.

    uint32 degrees_of_recursion = 0;
    uint32 recursion_exponent = 0;
    if(exponent > 3)
    {
        degrees_of_recursion = exponent - 4;
        if(degrees_of_recursion == 0)
        {
            recursion_exponent = 2;
        }
        else
        {
            recursion_exponent = 3;
        }

        exponent = 3;
    }

    const uint64 abs_max = max == 0 ? 1 : max;

    for(int i = 0; i < exponent - 1; i++)
    {
        random_number *= simrand(max, "simrand_normal");
    }

    uint64 adj_max = abs_max;

    for(int n = 0; n < exponent - 2; n ++)
    {
        adj_max *= adj_max;
    }

    uint64 result = random_number / adj_max;

    for(uint32 i = 0; i < degrees_of_recursion; i ++)
    {
        // The use of a recursion exponent of 3 or less prevents infinite loops here.
        const uint64 second_result = simrand_normal(max, recursion_exponent, "simrand_normal_recursion");
        result = (result * second_result) / abs_max;
    }

    return (uint32)result;
}

The parameters are:

0 or 1 - even distribution;
2: normal distribution with slight skew (squared);
3: normal distribution with large skew (cubed);
4 normal distribution with recursion skew (squared);
5: normal distribution with recursion skew (cubed);
6 and above: normal distribution with multiple recursion skew (cubed; these values produce so extreme a skew as may be of limited usefulness).

Early experimentation appears to show that it might be possible to have a much higher value for the maximum level of tolerance (perhaps circa 1,728,000, representing four months' worth of tenths of minutes, or, if we were to start recording time in seconds rather than minutes*, 20 days) using the number 6, but the results can be somewhat erratic.
   
* The reason to allow some very high values is that I plan to introduce a "portals" feature, allowing for simplified/abstracted intercontinental travel without an increase in map size. This will entail extremely long journey times of multiple months thus requiring me to use a 32-bit rather than 16-bit integer type to store all journey time information. However, increasing the integer precision from 16-bit to 32-bit also allows timings to be stored in seconds rather than the current tenths of minutes. This changes the meaning of the numbers returned by the random number generator and thus the effect of the skew factor.

Edit 5: I have now implemented this fully and made it customisable in simuconf.tab, as well as adding an unskewed normal mode. The simuconf.tab comments should explain the operation of this system:

Code: [Select]
# The following settings determine the way in which individual packets of passengers decide
# what their actual journey time tolerance is, within the above ranges. The options are:
#
# 0 - Even distribution
# Every point between the minimum and maximum is equally likely to be selected
#
# 1 - Normal distribution (http://en.wikipedia.org/wiki/Normal_distribution)
# Points nearer the middle of the range between minimum and maximum are more likely
# to be selected than points nearer the end of the ranges.
#
# 2 - Positively skewed normal distribution (squared) (http://en.wikipedia.org/wiki/Skewness)
# Points nearer the a point in the range that is not the middle but is nearer to the lower
# end of the range are more likely to be selected. The distance from the middle is the skew.
#
# 3 - Positively skewed normal distribution (cubed)
# As with no. 2, but the degree of skewness (the extent to which the modal point in the range
# is nearer the beginning than the end) is considerably greater.
#
# 4 - Positively skewed normal distribution (squared recursive)
# As with nos. 2 and 3 with an even greater degree of skew.
#
# 5 - Positively skewed normal distribution (cubed recursive)
# As with nos. 2, 3 and 4 with a still greater degree of skew.
#
# 6 and over - Positively skewed normal distribution (cubed multiple recursive)
# As with nos. 2, 3, 4, and 5 with an ever more extreme degree of skew. Use with caution.

random_mode_commuting = 2
random_mode_visiting = 5

Thank you to all who have helped.
« Last Edit: April 05, 2014, 06:41:13 PM by jamespetts »