News:

The Forum Rules and Guidelines
Our forum has Rules and Guidelines. Please, be kind and read them ;).

[way-improvements] empty convoys due to calc_earliest_arrival_time_at()

Started by Philip, September 16, 2014, 10:50:10 PM

Previous topic - Next topic

0 Members and 1 Guest are viewing this topic.

Philip

I'm seeing problems with passenger convoys leaving empty even though there are passengers waiting whom the convoy would take to their destination. This seems to occur when there are many convoys on the same line and their travel time is somewhat unpredictable, as is the case for road vehicles.

Investigating, the problem appears to be the code using calc_earliest_arrival_time_at(), which decides that some other convoy that's still en route to our halt is expected to arrive at the destination earlier than the convoy that is being loaded.

There are two problems with this: our guessed arrival times appear to be incorrect, probably because not enough data has been collected yet (however, a tester reports that it takes very very long for things to fix themselves that way).

Second, we're deciding to wait for a convoy that doesn't have enough capacity for anywhere near the number of waiting passengers. It's debatable whether passengers should be smart enough to board a convoy that's there but expected to arrive later rather than waiting for one that they probably couldn't get on board of anyway.

I fear this is somewhat difficult to fix without disabling the waiting logic entirely. However, I do propose some modifications to the current logic to make this less likely:
- add some tolerance so we won't wait for the next convoy unless it's, say, 20% faster than our current one, measured as time to ETA from current time
- add some extra tolerance based on when the next convoy is actually due to arrive at our stop
- tweak add_check_overflow_16 to more aggressively discard old data. In fact my preference would be to use an exponential decay average and simplify this code quite a lot in the process (even though we need to be careful not to use exp() as that uses floating point operations which might lead to network desynchs)
- sharply increase the tolerance time as more passengers wait

Any other ideas would be very appreciated.

I'm also curious about storing travel times per line rather than per convoy; doesn't that break for lines which mix fast and slow convoys?

In principle, we could implement seat reservations to have the current code work, but performance and memory usage would be affected massively, I think.

To reproduce the issue, it helps to modify a road vehicle's .dat file so it has only one seat, rebuild the PAK, and then run a large number of one-seat vehicles on the same basic line.

Again, I hope there is a simple solution that I'm simply missing.

jamespetts

Thank you for this analysis: this is helpful. In principle, this is a difficult problem to fix: it helps, I think, to try to look at what happens in reality. This waiting logic was intended to address the situation where convoys were arriving that took a different amount of time to get to the destination (for example, because they went two different directions around the same circular route on the same line, half having the reverse direction flag enabled and half not). Obviously, passengers in that case would take the short route around the circle, not the long route. The same applies to convoys on different lines that take detours, or stopping trains versus express trains, for example (and, in the latter case, it is relevant whether the express train is able to overtake the stopping train or not).

The difficulty here seems to come with arbitrary fluctuations in arrival times. In the circumstances, a means of disregarding small differences seems in principle sensible (although I wonder whether the necessary extra division operation for each check would have an adverse impact on performance as distinct from a simple comparison as now). It might be worthwhile having this percentage value adjustable in simuconf.tab, although this would mean changing the saved game format (but people should know that the saved game format is liable to change without a change in the version number at this stage of development in any event).

I am not sure what adding tolerance based on when the next convoy is due to arrive at the stop might entail, however; would you mind elaborating on this somewhat? It would also help if you could explain in terms that a person such as I whose mathematical skills are perhaps not the best what precisely an exponential decay average is and how this might work in this situation.

As for increasing the tolerance as more passengers wait, this would be difficult to calibrate, as a very different relationship would be needed depending on the size of convoys: fifty passengers is a lot to be waiting for a 'bus, but not that much to be waiting for a train, and a tiny amount to be waiting for a long-distance ship (but not a tiny amount to be waiting for a small ferry, so this cannot be done on vehicle type alone).

As for lines mixing fast and slow convoys, this is generally not encouraged where the difference between fast and slow is too great, as a single line should generally be a single route/diagram and thus suitable for generally the same type of vehicle. There might be relatively minor differences, but there has been really very bad planning somewhere if there are very significant differences in vehicle speed on the same line. It is on the basis of this assumption that the code is written.

This new system of predicting ultimate arrival times is very new and has not been extensively tested, so it is very useful to see how it works in practice and work out ways of abrogating such anomalies as it is found to have. Thank you for your work on this.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

Philip

Quote from: jamespetts on September 16, 2014, 11:17:33 PM
Thank you for this analysis: this is helpful. In principle, this is a difficult problem to fix: it helps, I think, to try to look at what happens in reality. This waiting logic was intended to address the situation where convoys were arriving that took a different amount of time to get to the destination (for example, because they went two different directions around the same circular route on the same line, half having the reverse direction flag enabled and half not). Obviously, passengers in that case would take the short route around the circle, not the long route. The same applies to convoys on different lines that take detours, or stopping trains versus express trains, for example (and, in the latter case, it is relevant whether the express train is able to overtake the stopping train or not).

I see that you do not mention convoys on the same line going in the same direction (at the same schedule entry, too). I think it's quite realistic to make passengers board the first vehicle of the right line going in the right direction.

Quote from: jamespetts on September 16, 2014, 11:17:33 PM
The difficulty here seems to come with arbitrary fluctuations in arrival times. In the circumstances, a means of disregarding small differences seems in principle sensible (although I wonder whether the necessary extra division operation for each check would have an adverse impact on performance as distinct from a simple comparison as now). It might be worthwhile having this percentage value adjustable in simuconf.tab, although this would mean changing the saved game format (but people should know that the saved game format is liable to change without a change in the version number at this stage of development in any event).

I very much doubt the extra division (which we can avoid entirely by using bit shifts and nice binary numbers) is a performance issue.

25% appears to be sufficient to avoid the problem, at least in this test game.

Quote from: jamespetts on September 16, 2014, 11:17:33 PM
I am not sure what adding tolerance based on when the next convoy is due to arrive at the stop might entail, however; would you mind elaborating on this somewhat?

We know how long we expect to wait for the second convoy, and my proposal is simply to increase that time by a certain percentage (another 25%, maybe) to reflect the uncertainty of waiting for a convoy that might be running late.

Quote from: jamespetts on September 16, 2014, 11:17:33 PM
It would also help if you could explain in terms that a person such as I whose mathematical skills are perhaps not the best what precisely an exponential decay average is and how this might work in this situation.

You're right, sorry about that. There is an explanation at http://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average , but that is quite heavy on mathematics. The basic idea is very similar to your add_check_overflow_16 routine: we keep an average by weighting previous results as less important the further back we have to go for them (by multiplying the nth value by alpha^n; alpha is a waiting factor that specifies how fast we forget about previous results); that means we can calculate a new average based only on the old average, alpha, and the new observation value (new average = old average * alpha + new observation value * (1 - alpha) ), without having to store the "count" value at all. In effect, add_check_overflow_16 would implement an exponential decay with alpha = 0.5 if you forced it always to reduce count to 1.

A sensible value for alpha would be about 0.5, weighting each new trip's duration as 50% of the average right after it happened (reducing alpha further would probably work better for games in which routes don't change all the time).

After the first trip, our exponential average is equal to the first trip's duration. For the second trip, we get the new average by multiplying that number by 50% and adding 50% of the second trip's duration. After the third trip, we're weighting the trips as 25%, 25%, and 50%, respectively; after the fourth, 12.5%, 12.5%, 25%, 50%, and so on.

What makes exponential averages so attractive is that there is only a single parameter to work out, alpha, and only a single data value stored in memory, our moving average.

Quote from: jamespetts on September 16, 2014, 11:17:33 PM
As for increasing the tolerance as more passengers wait, this would be difficult to calibrate, as a very different relationship would be needed depending on the size of convoys: fifty passengers is a lot to be waiting for a 'bus, but not that much to be waiting for a train, and a tiny amount to be waiting for a long-distance ship (but not a tiny amount to be waiting for a small ferry, so this cannot be done on vehicle type alone).

But we know the actual capacity of the convoys we're waiting for (but not how full they will be when they arrive), so we could calibrate based on that. However, I must admit that it's probably not realistic behaviour for passengers to be so considerate as to board an earlier convoy in preference to one that arrives sooner just because there are many other passengers waiting.

Quote from: jamespetts on September 16, 2014, 11:17:33 PM
As for lines mixing fast and slow convoys, this is generally not encouraged where the difference between fast and slow is too great, as a single line should generally be a single route/diagram and thus suitable for generally the same type of vehicle. There might be relatively minor differences, but there has been really very bad planning somewhere if there are very significant differences in vehicle speed on the same line. It is on the basis of this assumption that the code is written.

Wonderful, that's one thing not to worry about too much, then.

Quote from: jamespetts on September 16, 2014, 11:17:33 PM
This new system of predicting ultimate arrival times is very new and has not been extensively tested, so it is very useful to see how it works in practice and work out ways of abrogating such anomalies as it is found to have. Thank you for your work on this.

I'm still trying to understand the code, but it seems to me there are at least two changes that should be made to make waiting for the next convoy less likely:

First, haltestelle::hole_ab doesn't update the arriving convoy's ETA based on when it actually arrived at the current stop. If we arrived earlier at the current stop than we predicted, we should adjust the ETA in this_arrival_time for the difference, making it more likely this convoy will be chosen.

Second, it seems to me that the code in book_departure_time adjusts ETDs for loading/reversing delays but not ETAs. I would guess it's missing an "eta = etd;" somewhere.

I'll try making those changes and see whether it helps in the case of the test game.

jamespetts

Quote from: Philip on September 17, 2014, 11:10:47 AM
I see that you do not mention convoys on the same line going in the same direction (at the same schedule entry, too). I think it's quite realistic to make passengers board the first vehicle of the right line going in the right direction.

Yes, I suppose that this makes sense. I have tried and cannot think of a situation in which this should not occur. Its only disadvantage would be to reduce what is otherwise the relative simplicity of the current code, but your testing suggests that this may well be necessary.

QuoteI very much doubt the extra division (which we can avoid entirely by using bit shifts and nice binary numbers) is a performance issue.

I suppose that you are probably right; and there is only one way to find out: by testing on a very large map.

Quote25% appears to be sufficient to avoid the problem, at least in this test game.

25% is really quite a lot: if train A leaves the station now and takes an hour and a half to reach the destination, and train B leaves the station in quarter of an hour and takes only an hour to reach the destination, should passengers really always board train A? I imagine that, in real life, passengers in that situation would mostly take train B. If we implement the system of always taking the first convoy of any given line at the same point in its schedule (including whether it is in reverse or not), then is this percentage still necessary? Perhaps the first measure should be tested before this second to see whether it is needed.

QuoteWe know how long we expect to wait for the second convoy, and my proposal is simply to increase that time by a certain percentage (another 25%, maybe) to reflect the uncertainty of waiting for a convoy that might be running late.

Ah, yes, I see what you mean.

Quote
You're right, sorry about that. There is an explanation at http://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average , but that is quite heavy on mathematics. The basic idea is very similar to your add_check_overflow_16 routine: we keep an average by weighting previous results as less important the further back we have to go for them (by multiplying the nth value by alpha^n; alpha is a waiting factor that specifies how fast we forget about previous results); that means we can calculate a new average based only on the old average, alpha, and the new observation value (new average = old average * alpha + new observation value * (1 - alpha) ), without having to store the "count" value at all. In effect, add_check_overflow_16 would implement an exponential decay with alpha = 0.5 if you forced it always to reduce count to 1.

A sensible value for alpha would be about 0.5, weighting each new trip's duration as 50% of the average right after it happened (reducing alpha further would probably work better for games in which routes don't change all the time).

After the first trip, our exponential average is equal to the first trip's duration. For the second trip, we get the new average by multiplying that number by 50% and adding 50% of the second trip's duration. After the third trip, we're weighting the trips as 25%, 25%, and 50%, respectively; after the fourth, 12.5%, 12.5%, 25%, 50%, and so on.

What makes exponential averages so attractive is that there is only a single parameter to work out, alpha, and only a single data value stored in memory, our moving average.

That is a very interesting idea. May I suggest that the first idea (re: convoys on the same line in the same position of their schedule, etc.) be tested first to see whether this alone is enough, and that the other ideas be tested one by one rather than implemented all at once? Doing it that way would also be a better way of catching unwanted side effects.

QuoteBut we know the actual capacity of the convoys we're waiting for (but not how full they will be when they arrive), so we could calibrate based on that. However, I must admit that it's probably not realistic behaviour for passengers to be so considerate as to board an earlier convoy in preference to one that arrives sooner just because there are many other passengers waiting.

Yes, I do not think that this would be realistic.

QuoteI'm still trying to understand the code, but it seems to me there are at least two changes that should be made to make waiting for the next convoy less likely:

First, haltestelle::hole_ab doesn't update the arriving convoy's ETA based on when it actually arrived at the current stop. If we arrived earlier at the current stop than we predicted, we should adjust the ETA in this_arrival_time for the difference, making it more likely this convoy will be chosen.

Ahh, that's a good idea.

QuoteSecond, it seems to me that the code in book_departure_time adjusts ETDs for loading/reversing delays but not ETAs. I would guess it's missing an "eta = etd;" somewhere.

I am not sure whether this is unintended, but it has been a while since I worked on this code, so I am not quite sure. I did test it in basic operation, and it seemed all right; what exactly would you envisage happening if the ETDs were adjusted to take into account the ETAs - does this just mean adjusting the ETDs in more places in the code so that they are more up to date at certain points of code execution?

QuoteI'll try making those changes and see whether it helps in the case of the test game.

Thank you - I should be most interested in your results.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

MCollett

Quote from: jamespetts on September 18, 2014, 10:26:50 PM
25% is really quite a lot: if train A leaves the station now and takes an hour and a half to reach the destination, and train B leaves the station in quarter of an hour and takes only an hour to reach the destination, should passengers really always board train A? I imagine that, in real life, passengers in that situation would mostly take train B.

The key word here is 'mostly'.  In the real world, in any situation like this, some passengers will make one choice and some the other.  But in SimuTrans as it stands, the choice is all-or-nothing.  If the choice is a close one, we then get erratic oscillations:  the preferred service is overloaded, and so runs slower, so next month everyone takes the other one, so now that one is overloaded and slower, so everyone switches back ...  Making the choice probabilistic, with appropriate weighting in favour of the shorter time, would smooth things out and be generally more stable.

Quote from: Philip on September 17, 2014, 11:10:47 AM
it's probably not realistic behaviour for passengers to be so considerate as to board an earlier convoy in preference to one that arrives sooner just because there are many other passengers waiting.

I could imagine doing that if it was a choice between getting a seat and standing all the way. (Yes, I know, Transimulanians only consider time and not comfort when making their route choices.)

Best wishes,
Matthew

jamespetts

The general reason to have the decision made in only one way is to save computational effort: having to make the decision per passenger rather than in the abstract for all passengers would increase the computational load enormously.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

Philip

James,
thank you for your comments! I'm afraid I tested things in the reverse order of what you suggest. I'm having second thoughts about sharing timing information between convoys on the same line, I must admit; I sometimes do mix fast and slow convoys (or mail and passenger convoys) on the same line, and that's one issue. The other issue is that it seems to me to overload the concept of lines if we make the game behave drastically different depending on whether convoys serve a line or individual schedules—ideally, lines should save the players work without their having to worry about things slowing down because too much data is being shared between convoys.

That said, I've made the simple modification to the code that journey times are registered with all convoys on the same line, using the same departure point, and composed of the same vehicles, and that appears to work on a simple test game (as expected); I haven't yet tested it with several lines, which of course is when things get interesting.

My point about ETDs and ETAs was simply that the ETA at stop #2 should be the ETD at stop #1 plus the journey time, not the ETA at stop #1 plus journey time. Or am I missing something?

Quote from: MCollett on September 18, 2014, 11:08:39 PM
The key word here is 'mostly'.  In the real world, in any situation like this, some passengers will make one choice and some the other.  But in SimuTrans as it stands, the choice is all-or-nothing.  If the choice is a close one, we then get erratic oscillations:  the preferred service is overloaded, and so runs slower, so next month everyone takes the other one, so now that one is overloaded and slower, so everyone switches back ...  Making the choice probabilistic, with appropriate weighting in favour of the shorter time, would smooth things out and be generally more stable.

I must say I agree with you, Matthew, and I don't think the computational load would be significant, to be honest (right now, for some reason, hole_ab is called way more often than I think it should be, which might show up during testing). In fact, I'm thinking that this might be the best solution for those situations in which a stop is overcrowded and convoys still leave empty. However, it occurs to me that this is not entirely easy to implement correctly, because passengers should make the decision to wait just once, not once for every newly-arriving slow convoy that they already had decided to skip.

An alternative would be to take into account how long passengers have been waiting already, either directly or as a hash value to calculate tolerance. I don't quite understand under which circumstances ware_t packets are being merged, though, which might affect things significantly.

I've been thinking about the 25% tolerance value, and I do agree it's a little high for unconditional decisions, but I feel it might be just right as a mean value for randomised decisions (so half of all passengers would go to either convoy in the situation you describe), though we could probably reduce that significantly while still fixing the original issue.

Thank you both, and I'd appreciate any further ideas or comments. (I understand you probably won't find time for it, but if you're interested in the testing code, it's at the eta-etd branch at https://github.com/pipcet/simutrans-experimental/compare/clean...eta-etd).