Instability on the Bridgewater-Brunel server

Started by DrSuperGood, September 06, 2018, 03:21:35 PM

Rollmaterial

I simply connected and let the game run. At some point I noticed that the train I was following was running its schedule in reverse, which I fixed. Then, the third time I connected, I stayed in sync for over an hour. I then tested again and desynced after ~10 minutes twice, then managed to stay connected for longer on the following attempt, so I ruled out the reverse-schedule train as having had any influence.

jamespetts

I am not sure that I understand this test: trains will run in reverse automatically if they reach the end of their schedule and are set to run their schedules in reverse. You write that you "fixed" the running in reverse - do you mean that you simply unchecked the run in reverse box on that individual train's schedule (which would make that specific train turn around and go in the other direction)?
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

Rollmaterial

Yes, I unchecked the reverse schedule box in the main convoy window. The line is manually scheduled in both directions; it does not use mirror schedule. As I said, it turned out it was just a coincidence.

jamespetts

I am also seeing this pattern - very long periods of stability and then extended periods of instability where a loss of synchronisation will occur within a very short time of disconnecting.

I wonder whether the instability might be related, therefore, to the position of the trains in the schedule: on this occasion, I started the trains all at once, so they will all be in a very similar part of their schedule at the same time.

It would be very helpful if anyone could check to see whether the instability correlates in any way with the position of the trains on the Northern Frontier Express in any particular part of its schedule.

Rollmaterial

Also, not everyone desyncs at the same time, but some people desync while others remain connected.

jamespetts

That is interesting; however, it would nonetheless be useful to know whether some people lose synchronisation in a way which correlates in any way with where the trains on the Northern Frontier Express are on their schedule.

jamespetts

I have been carrying out further tests concentrating on the schedule of the Northern Frontier Express. I have created a new schedule based on that of the Express and moved all the trains to the new schedule. This new schedule ("TEST line") is the same as the previous schedule except that it uses the reverse route feature and lists each station only once.

My findings so far are still inconclusive, but somewhat interesting. Mostly, this runs without error or loss of synchronisation. However, about 30-45 minutes ago, I lost synchronisation. When I logged back in, I found that one of the trains had incorrectly selected "reverse route" part way through the schedule and was blocking the path of one of the other trains on the line by reversing prematurely without crossing to the correct line. After manually correcting this, the game is again running without error or loss of synchronisation for an extended period of time, the train apparently following the schedule correctly.

I note that Rollmaterial had noticed a spurious instance of selecting "reverse route" earlier. I do wonder whether this might be related in some way to the loss of synchronisation. The problem is that it is extremely difficult to test because the conditions for reproducing the error are so infrequent and erratic.

I should be very grateful if anyone else could log into the server and:

(1) check whether it is stable over a long period; and
(2) track the position of convoy no. 4233 and, if it becomes unstable, note the position of that convoy and report here what that position is.

If this be done frequently enough, over the course of a few days or weeks, we should get an idea of whether there is any pattern that suggests a possible correlation between the two, or whether there is no correlation.

The fact that the problem occurs on one particular railway line does suggest that an issue relating to the schedule itself might well be the cause of the loss of synchronisation, but beyond that there is still very little in the way of experimental data to demonstrate the probable causal mechanism or even the region in the code where the causal mechanism is likely to take place.

What is very odd about this issue is that it only arose after the game had been played, with intensity, for nearly 190 game years, with intensive use of railways for 115 of those years. There is evidently something very, very idiosyncratic about this error that makes it exceptionally hard to track down. The more assistance that I can have in localising the error, the sooner that it will be possible to narrow it down to a specific area of the code and, ultimately, fix it and continue work on fixing lower priority bugs, adding features and improving the game balance.

All assistance will be much appreciated.
Edit: Just after posting this, I observed some more interesting results. I noticed a train travelling at 35km/h in the drive by sight working method and realised that it was on the wrong line. It was just beyond St. Mary Beddington station. This is very near where the train was when the last loss of synchronisation occurred. Then, there was a further loss of synchronisation. The trains were correctly in reverse route mode - I traced the problem back to the junctions at Bickstable Fields station, where, just beyond the station, there is a missing one way sign next to crossovers that allow a train to cross over onto the other line. I have replaced the one way sign and resolved the deadlock potential by disabling the reverse route for the two trains that had gone down the wrong line.

I should be very grateful for any further testing to check whether any further loss of synchronisation occurs in this new setup.
Edit 2: I have just lost synchronisation again, at the very point that a train stopped at St. Mary Beddington station (in reverse route mode on the correct track).

Edit 3: After reconnecting again, a train in the reverse route direction called at St. Mary Beddington without loss of synchronisation; but I have just lost synchronisation again a second or two before a train finished stopping in the platform at St. Mary Beddington in the reverse route direction.

Edit 4: Some extremely interesting results. The game remained stable for an extended period (>30 minutes approximately) while no trains called at St. Mary Beddington. Then, a series of two trains called at that station. Seconds after the second of them departed, there was a loss of synchronisation. The direction was as before in reverse route (that is, departing towards Elmley). I have not yet seen departures from that station in the other direction, so cannot confirm whether errors occur in these circumstances.

There now seems to be an extremely strong correlation between trains calling at St. Mary Beddington and the loss of synchronisation. It seems usually to be the second train departing that station after the client first connects that does this. What would be very useful is if others could test and observe St. Mary Beddington station - what I need is a clear idea of whether there are any losses of synchronisation that do not occur shortly before or after a train calls at St. Mary Beddington station. If people could record the direction in which the train is travelling when departing and how many previous trains departed before the loss of synchronisation, that would be extremely helpful.

DrSuperGood

I assume you have set the server to perform checksum checks every frame or something like that? Otherwise there will be a delay between the action that causes the clients to go out of sync and the server detecting the out of sync with the client disconnecting.
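
To illustrate the delay being asked about, here is a minimal sketch. It assumes that checks run on frames 0, N, 2N and so on, and it ignores network latency; `first_detecting_frame` and `detection_lag` are hypothetical helpers, not Simutrans functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not Simutrans code: given the frame on which client
// and server state first diverge, return the first frame on which a check
// that runs every `interval` frames could notice the difference.
// `interval` must be non-zero.
std::uint32_t first_detecting_frame(std::uint32_t diverged_at, std::uint32_t interval)
{
    // Checks are assumed to run on frames 0, interval, 2 * interval, ...
    const std::uint32_t remainder = diverged_at % interval;
    return remainder == 0 ? diverged_at : diverged_at + (interval - remainder);
}

// Number of frames that pass between the divergence and its detection.
std::uint32_t detection_lag(std::uint32_t diverged_at, std::uint32_t interval)
{
    return first_detecting_frame(diverged_at, interval) - diverged_at;
}
```

Under these assumptions the lag is always less than one check interval, so a longer interval makes it harder to tie a disconnection to the exact action that caused it.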

jamespetts

The following is specified in simuconf.tab:


server_frames_between_checks = 32


One thing that seems likely on the face of it is that the anomaly occurs at the previous station on the route and is an anomaly affecting departure time, but that this is only manifested when the train arrives at the next station, as this is the first point that that anomaly can be translated into a difference in the step in which the random number generator is called.
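
The mechanism suggested above can be sketched as follows. This is only an illustration under stated assumptions: a single shared generator that other game logic advances once per step, and a train that draws one number on the step in which it arrives. `Lcg` and `draw_on_arrival` are hypothetical names, not the game's own generator or API.

```cpp
#include <cassert>
#include <cstdint>

// Deliberately simple linear congruential generator standing in for the
// game's shared random number generator (an assumption for illustration).
struct Lcg {
    std::uint32_t state;
    std::uint32_t next()
    {
        state = state * 1664525u + 1013904223u;
        return state;
    }
};

// Return the number the train receives if it arrives on `arrival_step`,
// given that one background draw is made on every earlier step.
std::uint32_t draw_on_arrival(std::uint32_t seed, int arrival_step)
{
    Lcg rng{seed};
    for (int step = 0; step < arrival_step; ++step) {
        rng.next(); // background draw made by other game logic
    }
    return rng.next(); // the draw made when the train arrives
}
```

With the same seed, `draw_on_arrival(seed, 5)` and `draw_on_arrival(seed, 6)` return different numbers, so in this model a train that departs one step later on the client than on the server only produces a visible difference once it arrives and makes its draw.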

Ves

Hello James, and all!

Sorry for having been absent for such a long time. Real life has been demanding and I haven't found much time for Simutrans, other than reading the forum.

I did set myself to do a lot of logging of the trains at the stations today, however, but nothing seemed abnormal about trains stopping, loading, departing or doing other train stuff.
However, I noted that trains departing in the reverse direction tend initially to display a distance to the next station that is way off!
For instance, 193km from Camberwell New Town Railway Station to Rockhead Gate Railway Station, which should be closer to around 12-14km. About midway, the distance updates itself to a much more reasonable value.

Other observations:
188km is noted from Bickstablewood Fields Railway Station -> St. Mary Beddington Piccadilly Railway Station
166km is noted from St. Mary Beddington Piccadilly Railway Station -> Buckllock Hill Railway Station
I didn't catch the displayed distance when the train was leaving, but midway it displayed 72km from Underwater Bridge Railway Station to Wyndingborne Copse Railway Station
149km is noted from Buckllock Hill Railway Station -> Buckllock Bridge Railway Station
I didn't check the rest of the stops, but I assume that there are more instances of this.

During the whole period (ca. 5 in-game hours) I had no desyncs at all. Occasionally the game would freeze for some minutes, but nothing more.

Hope this is of some assistance

jamespetts

Thank you for this. The observation regarding the distance appears to be a UI bug. May I ask: are you running Linux?

Ves

No, I am on Windows 10.
OK, however, I have never seen that UI bug before...


jamespetts

Thank you  - that is very helpful. It is odd that you are not experiencing any loss of synchronisation. Inconsistent results such as this can make tracing the issue hundreds of times more difficult and lengthy than it would otherwise be.

Can I ask - what behaviour are others seeing at present?

Ves

Regarding the distance displayed, I don't think that is just a UI thing. Looking in the code, the information is fetched and calculated from

cnv->front()->get_route_index()
and
cnv->get_route()->get_count()
It is very basic code that should be fail-proof in the info window, and the distance bar and the written distance are not calculated together, yet they both show the same way-off distance.
Also, so far I have only seen this happen in the reverse direction, when "reverse" is ticked, though not from all stations.
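
A rough sketch of how a remaining distance could be derived from those two values. It assumes a fixed number of metres per tile; `RouteProgress` and `remaining_metres` are hypothetical names, and the real info window code may well differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the values returned by
// cnv->front()->get_route_index() and cnv->get_route()->get_count().
struct RouteProgress {
    std::uint32_t route_index; // tile of the route the front vehicle is on
    std::uint32_t route_count; // total number of tiles in the route
};

// Remaining distance in metres, assuming a fixed number of metres per tile.
std::uint32_t remaining_metres(const RouteProgress& p, std::uint32_t metres_per_tile)
{
    if (p.route_count == 0 || p.route_index >= p.route_count) {
        return 0; // no route, or already at its end
    }
    return (p.route_count - 1 - p.route_index) * metres_per_tile;
}
```

In a calculation of this shape, a route that temporarily holds far more tiles than the real path to the next stop would inflate every display that reads the same two numbers, which would fit the bar and the text being wrong together.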

It might be completely unrelated, but you stated that the "reverse" status was of interest, as well as the station "St. Mary Beddington Piccadilly Railway Station" (which produces this behaviour).

I have been online now for ca. 30 minutes without a desync. However, when I first joined the server game, it was so slow to open that I tabbed away and forgot about it. When I remembered, 30 minutes or so later, that I had attempted to join the server, it had desynced in the background.

edit:
I just came home from a short walk, and the game had desynced while I was away. The game was tabbed away during my walk.

jamespetts

Thank you very much for your testing.

First of all, as to the UI issue - what I suspect is happening is that the distance is incorrectly being calculated on the assumption that the convoy is moving forward in its schedule rather than backwards, so calculating the route as it would be if the convoy had to go all the way through the schedule to get to the next point in the other direction. This would be a UI issue. I have not looked into this in detail, however.
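
To make the suspected mix-up concrete, here is a minimal sketch of how the next stop differs depending on the direction of travel in a mirrored schedule. `SchedulePos` and `advance` are hypothetical names, not taken from the Simutrans source.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical position in a mirrored ("reverse route") schedule.
struct SchedulePos {
    std::size_t index; // current schedule entry, assumed to be < count
    bool reversed;     // true while running the schedule backwards
};

// Move to the next stop, turning round at either end of the schedule.
SchedulePos advance(SchedulePos pos, std::size_t count)
{
    if (count < 2) {
        return pos; // nothing to advance to
    }
    if (!pos.reversed) {
        if (pos.index + 1 < count) {
            ++pos.index;
        } else {
            pos.reversed = true; // last entry reached: turn round
            --pos.index;
        }
    } else {
        if (pos.index > 0) {
            --pos.index;
        } else {
            pos.reversed = false; // first entry reached: turn round
            ++pos.index;
        }
    }
    return pos;
}
```

For a convoy at entry 2 of a five-stop schedule, the next stop is entry 3 when running forwards but entry 1 when reversed; a display that ignored the flag would measure towards the wrong stop.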

As to the loss of synchronisation, one thing that can be inferred from what I observed is that the trains were departing from the station immediately prior to St. Mary Beddington at different times on the client and server. At one point, I did observe a train depart from St. Mary Beddington immediately that it arrived without even waiting for the minimum loading time, but this is not readily reproducible. I have just pushed some slight changes to the code for calculating the waiting time at stations, in particular, making the numerical types more consistent; I suspect that this will not make much difference, but it is just possible that the error lies here, and it would be worthwhile re-testing to-morrow to see whether this has helped.

jamespetts

Further testing shows that the minor changes made yesterday evening have not prevented the loss of synchronisation. The issue still seems to be confined to trains arriving/departing from St. Mary Beddington. There appears to be no clear pattern as to when arriving/departing at this station will cause the loss of synchronisation: the first arrival/departure after logging in can trigger the failure, whereas there can be quite a few in sequence without this occurring. As to whether this occurs only in reverse direction, this is inconclusive so far: the only occasion on which I saw a loss of synchronisation on departure in the forward direction coincided with a train just about to arrive in the reverse direction.

The next phase of the test is to remove St. Mary Beddington from the schedule and see whether this prevents the loss of synchronisation, or simply moves where it occurs (suggesting that the problem occurred at the previous station).

Edit: I have now modified the line on the server to remove the stop at St. Mary Beddington. It will be interesting to see whether this will affect the loss of synchronisation.

Ves

James, did you lose synchronisation just now, or did you log out? I have been on the server now for around an hour without any desyncs at all. I have had 4 trains coming through the St. Mary Beddington area and the surrounding stations in reverse up until now without problems.

jamespetts

Yes, I have had loss of synchronisation a few times, including quite recently. The latest test was to see whether removing the signal from the second to last tile on the platform at Bucklock Bridge and replacing it where it should be (i.e. on the very last tile) made any difference, but apparently not. Having deleted St. Mary Beddington from the schedule, the loss of synchronisation seems now to occur on departure from Bucklock Bridge - two stops along in the reverse direction.

Ves

Ok, for information, I am still on the servergame. When I first came online, I think I kicked you out in the process, because it said you had left just when I logged in an hour or so ago.

Ves

Now I got a desync. I had the window tabbed away, but noted some 5 minutes ago that the game had frozen, as it has occasionally done throughout the entire session (1½-2 hours)

jamespetts

I do notice that it does that - this appears to be attributable to the networking code written by DrSuperGood to prevent loss of synchronisation: from what I understand, the client does this so as not to get ahead of the server.

Ves

Ok, so that is a good thing then!
Now I got a desync again. It was in the tabbed-away state, but just some minutes before tabbing away (and the desync) I watched two trains going in the reverse direction past the Buckllock Bridge station, and there was no other train within hours (according to the station window timetable).

jamespetts

The loss of synchronisation does now appear to occur either just after trains leave or just before trains arrive at this station in the reverse direction. It is very difficult to deduce what is different about this station on this line compared to all the many hundreds others in use in this huge map. It is really very odd.

Ves

But wasn't it St. Mary Beddington Piccadilly Railway Station to begin with? Which station are you referring to as the suspect?

jamespetts

It was St. Mary Beddington Piccadilly until I removed that stop from the schedule. It then became Bucklock Bridge. I had suspected that it was the station immediately before St. Mary Beddington that was causing the problem that was only manifesting at St. Mary Beddington, but this does not seem consistent with it occurring two stations further on after St. Mary Beddington is removed from the schedule.

Perhaps you could try removing the station immediately before St. Mary Beddington on the schedule and test to see whether that prevents the loss of synchronisation, and, if it does, try re-adding St. Mary Beddington and seeing whether that makes a difference?

Ves

I have not had a consistent desync at all, so I'm not sure I would produce any consistent results. Remember that in my earlier session, I was online for almost 2 hours with four trains passing through the area in the reverse direction.
I might try something, but it is getting late and I might not be able to conduct any proper tests tonight.

edit 1:
Ok, now it desynced immediately when a train pulled up to the Bucklock Bridge.
Will check if the next train in line does the same thing....

edit 2:
No, it didn't; it stayed in sync...
However, I have now deleted the Bickstablewood Fields Railway Station from the schedule, so let's see over the coming days if that improves things! I have not had any trains passing through at this point.

edit 3:
Nah... got a desync with no train whatsoever in the vicinity of either Bickstablewood Fields Railway Station or Buckllock Bridge Railway Station...

The only odd thing I can see on the map is the displayed distance in the convoy window. It doesn't really look like the information could be calculated wrongly in the info window, which suggests something wrong in the cnv->front()->get_route_index(), since that is the one that indicates the correct distance to the next destination. The values don't really add up to a trip around the whole track, as wrong reverse counting would produce, as demonstrated with the first example I gave:

193km from Camberwell New Town Railway Station to Rockhead Gate Railway Station

The distance between the two stations is 10.88 km as the crow flies (from the stop info window), and the distance between Camberwell New Town Railway Station and the reversing terminus (Bealdean Rye....) is 17.62 km as the crow flies. So 10.88 + 17.62 + 17.62 + 10.88 is ca. 57km, far from 193km, even allowing for the trains not travelling in a straight line.

Given the pretty inconsistent desync results, that might be worth investigating, if only to make sure that it is NOT because of that.

jamespetts

Thank you very much for that testing. Can I ask - does the anomaly with the distances occur on any other line?

jamespetts

I have carried out some further testing. The distance anomaly appears to be directly connected to the reverse route feature: what seems to happen is that, when in reverse route, a train will calculate its route initially to the opposite platform of the same station, but will shortly afterwards reset it to the correct destination. I am not at this stage sure how it resets it, since the calc_route() method is not called - it may well have calculated it via the correct destination and then later truncated it.

However, what is clear is that this cannot be the cause of the loss of synchronisation on the server game, since the original line (the "Northern Frontier Express") did not use the reverse route feature. Instead, the stations were simply entered manually in both directions. Testing shows that the distance anomaly does not occur on the unmodified saved game.

jamespetts

Additional testing shows that the following two modifications applied both to server and client do not prevent loss of synchronisation:

(1) disabling multi-threading for convoys; and
(2) disabling the system for trains stopping in the centre of platforms for non-terminus stops.

jamespetts

Further testing shows that disabling overcrowding on the debugging railway carriages does not prevent the loss of synchronisation. Incidentally, I have reverted to the original Northern Frontier Express schedule, and so the loss of synchronisation occurs departing from St. Mary Beddington heading to Elmley.

jamespetts

Further testing has shown that simply changing the locomotives on the trains on the Northern Frontier Express to the GWR Castle class prevents the loss of synchronisation. It is not immediately clear why this should be at this stage; however, the prime suspect is air resistance: the A4 class is a streamlined locomotive, and has an air resistance value set manually in the pakset. Earlier in my testing a month or so ago, I also tested with the Southern Merchant Navy class, which has an air smoothed casing and also a custom air resistance value; this caused loss of synchronisation, too.

I have spotted some loss of precision in the fixed point system used to emulate floating point arithmetic where the air resistance value is read; however, this loss of precision seems constant and does not appear to change when using different methods of calculating the air resistance values. The current version running does indeed use a slightly modified way of storing the air resistance values, but this has not solved the loss of synchronisation.
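
As an illustration of why a constant loss of precision need not by itself cause divergence, here is a minimal sketch of integer fixed-point conversion. The 16.16 layout and the idea of a value stored in thousandths are assumptions made only for this example; the game's own number type is different.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 16.16 fixed-point layout (not the game's own type).
constexpr std::int64_t FRACTION_BITS = 16;

// Convert a non-negative value given in thousandths to fixed point.
std::int64_t from_thousandths(std::int64_t thousandths)
{
    return (thousandths << FRACTION_BITS) / 1000; // integer division truncates
}

// Convert a non-negative fixed-point value back to thousandths, truncating again.
std::int64_t to_thousandths(std::int64_t fixed)
{
    return (fixed * 1000) >> FRACTION_BITS;
}
```

Here 0.333 (333 thousandths) comes back as 332 after a round trip, so precision is lost, but because only integer operations are involved the same truncated result is produced on every run.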

Because the game had advanced to 1947 by which time no streamlined locomotives were available, I have re-set the game to its previous running state in 1941, although I have saved where it had reached in 1947.

jamespetts

I have temporarily disabled the custom air resistance value for the LNER A4 class and re-instated the 1947 version of the saved game with the A4 class hauling the trains (these are slightly longer trains than previously, and I note that the loss of synchronisation no longer occurs at St. Mary Beddington). Before temporarily disabling the custom air resistance value, a loss of synchronisation could still be reproduced.

I have managed now to stay connected for quite a while after disabling the custom air resistance for the A4 class, but my computer crashed after about 30 minutes. I should be very grateful if anyone could carry out a longer-term test to check for stability.

jamespetts

Further testing has shown that disabling the custom air resistance for the LNER A4 class makes no difference to the loss of synchronisation: this still occurs on occasions when the train arrives at St. Mary Beddington.

Ves

I just got a desync after half an hour to an hour, while a train was travelling through the St Mary station