Instability on the Bridgewater-Brunel server

Started by DrSuperGood, September 06, 2018, 03:21:35 PM

jamespetts

#70
The result of the third test is significant and interesting. I took the 1937 game that runs without losing synchronisation, then used the public player tool to advance the year to 1940 without changing the game-state other than the date, and re-ran the test. This time, the game would lose synchronisation after a few minutes again. I re-tested with the 1937 game and confirmed that it did not lose synchronisation. This suggests that there is an issue with some item automatically placed in the game which has an introduction or retirement date in around 1939, the most obvious candidates for which are roads.
Edit: Further testing has shown that copying the latest (client) simuconf.tab to the server (save for replicating the server's original network settings) does not prevent the loss of synchronisation from occurring with the >1939 saved game.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

DrSuperGood

#71
Might be worth trying to advance the server beyond 1940 to see if there is a date at which the problem stops. This could help locate specific objects causing the problem. Obviously the year has to be advanced either offline, with the server restarted afterwards, or online, with the client re-joining afterwards, since one can assume that any Windows client touching 1940 will be out of sync instantly and is only booted later when a checksum check is performed.

It might also be worth clean installing Simutrans on the server (making sure not to lose all saves). Although the pakset is hash checked by clients, files like simuconf.tab are not.

Of course one should make sure that the server is really going out of sync with the clients. It could be something to do with the time that starts somewhere in 1940 causing a false positive OoS detection.

Ves

I went ahead and looked at which ways were becoming available in the time frame leading up to 1940. These objects are taken from all dats in this directory: https://github.com/jamespetts/simutrans-pak128.britain/tree/master/ways

Name=hr-asphalt-road-medium
intro_year=1935
intro_month=6

name=BrickViaduct
intro_year=1838
intro_month=7

Name=city_road
intro_year=1932
intro_month=1

name=ConcreteSteelCantileverRoad
intro_year=1937
intro_month=5

Name=concrete_road
intro_year=1936
intro_month=9

Name=runway
intro_year=1938
intro_month=9

name=airport_oneway
intro_year=1938
intro_month=9

Name=taxiway
intro_year=1938
intro_month=9

---- close retire dates ---- (not a complete list, since I didn't think of checking the retire dates until midway through the list...)

Name=macadam_road
retire_year = 1936
retire_month = 7

Name=WoodenTretleElevatedNarrow
retire_year=1938
retire_month=7

name=WoodenTrestleNarrow
retire_year=1938
retire_month=7
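
A listing like the one above could also be produced mechanically rather than by hand. The following is only a minimal sketch: it assumes objects are separated by blank lines as in the excerpt above (real dat files may use other separators), and the function names and the sample text are made up for illustration.

```python
def parse_blocks(text):
    """Split dat-style text into one dict per blank-line-separated block."""
    objects, current = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            if current:
                objects.append(current)
                current = {}
            continue
        if "=" not in line:
            continue  # skip separators and comments
        key, value = line.split("=", 1)
        # Keys appear as both "Name" and "name" in the excerpt, so lower-case them.
        current[key.strip().lower()] = value.strip()
    if current:
        objects.append(current)
    return objects


def introduced_between(objects, start, end):
    """Names whose (intro_year, intro_month) lies within [start, end]."""
    hits = []
    for obj in objects:
        if "intro_year" not in obj:
            continue
        when = (int(obj["intro_year"]), int(obj.get("intro_month", 1)))
        if start <= when <= end:
            hits.append(obj.get("name", "?"))
    return hits


# Sample text copied from three of the entries listed above.
SAMPLE = """\
Name=concrete_road
intro_year=1936
intro_month=9

Name=runway
intro_year=1938
intro_month=9

Name=macadam_road
retire_year = 1936
retire_month = 7
"""

objects = parse_blocks(SAMPLE)
print(introduced_between(objects, (1937, 1), (1939, 12)))
```

The same approach would work for retire dates by comparing `retire_year` and `retire_month` instead.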

jamespetts

#73
Dr. Supergood - that is a very useful suggestion. I have tried advancing the time to 2000, and there is no loss of synchronisation with this. I will try a few intermediate dates to see what the cut-off is.
Edit: The loss of synchronisation still occurs in 1950.
Edit: The error also seems to occur in 1975.

DrSuperGood

Might be worth binary searching the exact start and end year.

It could be coupled to town buildings/attractions, industry or private cars since all of those are subject to introduction or phase out with year.
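
The binary search suggested here can be sketched as follows. This is only an illustration: `desyncs` is a hypothetical stand-in for one manual test round (advance the save to that year, connect a client, wait), and the sketch assumes the game stays in sync for every year before some cut-off and loses sync for every year from the cut-off onwards.

```python
def first_bad_year(good, bad, desyncs):
    """Return (year, probes): the smallest year in (good, bad] that desyncs.

    Assumes desyncs(good) is False, desyncs(bad) is True, and that the
    result switches from False to True exactly once in between.
    """
    probes = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        probes += 1
        if desyncs(mid):
            bad = mid   # the cut-off is at mid or earlier
        else:
            good = mid  # the cut-off is after mid
    return bad, probes


# Stand-in for a real test round; the 1939 threshold is invented for the demo.
year, probes = first_bad_year(1937, 1940, lambda y: y >= 1939)
print(year, probes)
```

Each probe is a full test round, so halving the interval keeps the number of rounds roughly logarithmic in the number of candidate years. Searching for the end year works the same way with the roles of the two bounds swapped.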

jamespetts

Each round of testing takes a considerable amount of time, so it will take a long time to get to the point of checking the exact year. I am planning to try to find it as precisely as possible, however.

jamespetts

Further testing has revealed an error in the earlier testing, but that error itself has revealed interesting data. When I advanced to 2000 initially, I had used a game saved in 1937. However, the initial tests of 1950 and 1975 had used the game saved in 1939, after the problem had arisen. Re-testing in 1952 with the game saved in 1937 shows that the client is able to stay in sync with the server.

This suggests that it is the presence in the game of an object that is automatically built sometime in the 1939-1952 era that causes the problem, rather than the building of the object while the client is connected.

I will have to test further when I have more time to see which year that the problem first goes away.

DrSuperGood

#77
There is a limit to what objects are automatically created or manipulated.

  • Trees.
  • City buildings/attractions.
  • Walking passengers.
  • Private vehicles.
  • Terrain slopes (due to construction of city buildings).
  • City roads.
  • Resurfacing of all existing roads, rails, etc., potentially to a different type due to obsolescence.
  • Industries, and industry linking.
  • Power consumption/generation.
  • Bridges, and hence grounds, due to the construction of city road bridges over obstacles.

prissi

Are there exponents or square roots used in any generation routine? It may be that there are slight deviations only for numbers generated in that era. If there is no desync when running both under Linux, I would suspect something like this ...

DrSuperGood

#79
Quote: Are there exponents or square roots used in any generation routine? It may be that there are slight deviations only for numbers generated in that era. If there is no desync when running both under Linux, I would suspect something like this ...
There is a software implementation for these which should be deterministic between platforms. The software implementation is heavily used by vehicle physics which cannot directly be the cause due to there being dates that the game remains in sync for hours despite ~10,000 vehicles.

Anyway, an idea that occurred to me was to disable multithreading on both server and client for a test. If this stops it going out of sync, then it is caused by something multithreading related.

jamespetts

Further testing shows that year skipping the 1937 saved game to 1952 produces a saved game that stays in sync between client and server.

jamespetts

Further testing shows that the 1937 game fast forwarded to 1940 also remains in sync with the server, suggesting that the earlier results implying the contrary were contaminated by the confusion between different starting points identified earlier.

The consequence of this is that the earlier conclusion must be revised: the loss of synchronisation was not necessarily (and was probably not) caused by automatically emergent objects such as buildings, private cars or pedestrians, as previously thought.

Further investigation of the type originally carried out (i.e. into changes made by players to the network) will be needed.

Junna

I replaced something like one thousand two hundred road vehicles; could that be part of it? It was around the time the desynching started... Many buses also got stuck, because a number of them have spuriously high axle loads (equal to their entire weight, 6-7 tonnes).

jamespetts

I have been conducting a test to try to determine the cause of the problem by liquidating each company one by one and seeing whether the server remains in sync after that liquidation. I have so far liquidated Crandon & Lakes and Player 11 to no avail. I was about to test liquidating the next company last night when my computer crashed, so I am going to restart to-day. This test will help to determine whether or not anything that you describe might be relevant, although it is difficult to see at present what in what you describe could be part of the problem, since both replacements and vehicles getting stuck/having no route have been encountered commonly before without causing loss of synchronisation.

DrSuperGood

How many of the buses are left to replace? I am aware of an out-of-sync problem related to manual schedule changes, but I did not think it applied to automatic changes.

Also how much power generation is going on? I recall a similar issue like this being caused by power nets on the last server.

jamespetts

Preliminary testing seems to show that the loss of synchronisation appears not to occur when the Bay Transport Company is liquidated. However, I have not been able to test this thoroughly, since my computer is currently not stable enough to remain running without hard-crashing when running the server game for more than ~15 minutes at a time (although this is still longer than it took to lose synchronisation before I liquidated Bay Transport).

The server is currently set up with Bay Transport liquidated, but all other companies intact. If anyone can connect and try to remain connected (without interaction) for circa 1 hour in this state, that would be very helpful. I can then try to narrow down the problem once this has been confirmed. Note that you will need to download the latest version from the server as I fixed a crash bug this afternoon.

In relation to the other suggested issues: the electricity related loss of synchronisation was fixed a long time ago. As to schedule changes, I am not aware of this being a current bug. If anyone can reproduce this with the latest version, please post a full bug report in the usual way.

DrSuperGood

#86
Been connected to the server well over an hour. Even survived a save/load cycling of someone joining. No desyncs at all.

EDIT: A thought occurred to me. Now that we know removing Bay Transport solves the OoS, we have to prove that it is Bay Transport causing the OoS and not his interactions with everyone (since practically all companies connected to him in some way). Hence I propose restarting the server with a save that removes all other companies except Bay Transport and seeing if it OoSes still. If it does, then the problem is something in Bay's network and the removal of other companies might make this easier to identify.

jamespetts

#87
Thank you very much for testing: that is most helpful. That is a good idea for a further test, too, but first I want to test to see whether the fix to the bug that caused a crash actually fixed the desync by running the original saved game again: whilst this is very unlikely, because the two coincided, I need to rule this out before testing further. Then, I will proceed with Dr. Supergood's proposed further test.
Edit: The conclusion of the first part of the test is that the crash fix did not fix the loss of synchronisation. I will now proceed with Dr. Supergood's suggested test.

jamespetts

I have now run the second part of Dr. Supergood's proposed test (and this is on the server now - you will need to update the executable again, as I had to fix another crash bug to run this): with only Bay Transport remaining and all the other companies removed, the client still loses synchronisation with the server. This implies that the issue is not at the intersection between Bay Transport and another network, but rather internal to the Bay Transport network.

jamespetts

#89
I have just carried out a further test by withdrawing all of Bay Transport's road vehicles. Connecting to the game thus modified still results in a loss of synchronisation after a few minutes.
Edit: Removing the aircraft also does not remove the loss of synchronisation issue.
Edit: Likewise, removing trams has no effect. All that remains is rail, so it seems likely (but not certain without further testing) that the problem is associated in some way with rail transport.

jamespetts

Further tests show that removing all of Bay Transport's vehicles appears to allow a stable connexion to be maintained. It would be very helpful, however, if anyone else could test to verify this: the server is currently running in this state, so if anyone can stay connected for ~1 hour, this would be very good evidence of the stability.

Even more interestingly (perhaps), I discovered that I had missed some rail and road vehicles when I was testing earlier. Some earlier versions of the testing saved game file (including the ones that I used to test the absence of road vehicles, aircraft and trams) still had one or two road vehicles left, and the first attempt at testing the removal of rail vehicles also still had some road vehicles left. Testing with this version, the loss of synchronisation still seems to occur.

This is most interesting as, if the current saved game can be shown to be long-term stable, I can then remove vehicles one by one and see which one is responsible.

Junna

This is kind of off-topic, but how do you force liquidation of another company on a server game?

prissi

nettool probably.

pak128.britain standard contains duplicate objects, see here: https://forum.simutrans.com/index.php/topic,18506.msg176239.html

Three buildings appear from 1930 onward and are contained twice, once with the cluster parameter and once without. Their building time is from 1930 to 1960, but if newer buildings appear in 1950 then those are built less frequently. Since the loading order of pak files depends on the file system (and thus is different between Windows and Linux), those COM_JH_1930_00_06A etc. may be the source of desync. With fewer companies, growth is more infrequent and such desync would happen less.

It might be useful, if the pak doublette feature from standard finds its way to experimental early, or if you check the debug messages for overlaid objects.
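
The load-order concern described above can be illustrated with a toy model. This is not Simutrans code: the file names are hypothetical, and the rule that a later duplicate replaces an earlier one is an assumption made only for this sketch.

```python
def load_objects(files):
    """Register objects in the order given.

    Assumption for this sketch: a later object with the same name
    overlays (replaces) an earlier one.
    """
    registry = {}
    overlaid = []
    for filename, object_name in files:
        if object_name in registry:
            overlaid.append(object_name)  # the kind of event a debug message could report
        registry[object_name] = filename
    return registry, overlaid


# Hypothetical pak files that both define the same building.
order_a = [
    ("a_with_cluster.pak", "COM_JH_1930_00_06A"),
    ("b_without_cluster.pak", "COM_JH_1930_00_06A"),
]
order_b = list(reversed(order_a))  # same files, as another file system might list them

reg_a, dup_a = load_objects(order_a)
reg_b, dup_b = load_objects(order_b)
print(reg_a == reg_b)  # the two machines end up with different definitions
print(dup_a)

# Sorting the file list before loading makes the result independent of the file system.
reg_sa, _ = load_objects(sorted(order_a))
reg_sb, _ = load_objects(sorted(order_b))
print(reg_sa == reg_sb)
```

In this model, the two machines only disagree when a duplicate exists, which is why checking for overlaid objects is a useful first step.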

jamespetts

Nettool is indeed the way of liquidating single companies - the syntax is nettool [server details] remove-company [company number].

As to the duplicated buildings, thank you for the investigations in this regard. As I posted in the other thread, however, I cannot read the text posted there, so I cannot check whether any of these are duplicated in the Extended version of the pakset. I have just checked for duplication of COM_JH_1930_00_06A, but found only one object with this name.

DrSuperGood

Quote: It might be useful, if the pak doublette feature from standard finds its way to experimental early, or if you check the debug messages for overlaid objects.
While the server listing server was still working, there was no pakset mismatch shown when connecting; hence this is not the problem.

SuperTimo

Quote from: jamespetts on October 22, 2018, 11:43:46 PM
if anyone can stay connected for ~1 hour, this would be very good evidence of the stability.

I've been connected for around 40 mins with no issues at the moment.

jamespetts

Excellent, thank you very much for testing: that is very helpful.

DrSuperGood

I was connected for 80 minutes, no out of sync.

jamespetts

Excellent, thank you very much for testing.

I have now uploaded the other version that I described, which still has some residual road and rail vehicles in it for testing. It will restart with this version running in a few minutes. I intend to unlock Bay Transport so that we can all test to see which thing(s) are causing the trouble by removing them one by one. I should be very interested in anyone's results.


Edit: Now running and unlocked.

Rollmaterial

I have managed to stay in sync for ~40 min without doing anything.

jamespetts

That is interesting, thank you. I will have to re-test, as I did originally get out of sync errors with this saved game.

DrSuperGood

Yes, the save is stable.

Is there one with all companies except Bay removed? This one has most of Bay's vehicles removed.

That said when I first joined I did get an index out of bounds crash. Not been able to reproduce it however.

jamespetts

Thank you very much for testing: that is helpful.

I have now restarted the server game with the version of the saved game with Bay Transport's railway network only (plus one or two 'buses that I omitted in error to remove earlier). The company is unlocked, so there is scope for testing as to which specific line(s) are associated with the loss of synchronisation by way of withdrawing the stock from the lines one by one and testing after each.

jamespetts

I am currently running a test in which I am removing rail lines of Bay Transport one by one and checking whether this affects loss of synchronisation for each line. I am going from the bottom of the list of lines upwards.

I appear to have found a stable state by removing all lines up to and including FRC - Roxingstoke - Templecaster (local). Removing all lines up to but not including that line did not prevent the loss of synchronisation, suggesting that something about this line might well be responsible for the issue, although further testing is needed to confirm this.

It would be helpful if people could connect to the server and test whether this is long term stable.

The next round of testing will be reverting to the version of the saved game in which the loss of synchronisation occurs to test whether removing only the abovementioned line will prevent the loss of synchronisation.
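
The elimination procedure described above can be sketched as follows. This is only an illustration: `stays_in_sync` is a hypothetical stand-in for a manual test round on the server, and the line names other than the one mentioned above are invented placeholders.

```python
def first_stable_removal(lines, stays_in_sync):
    """Remove lines from the bottom of the list upwards.

    Returns (line, removed): the line whose removal first gave a stable
    state, plus everything removed up to and including it. Returns
    (None, removed) if no stable state was reached.
    """
    removed = []
    for line in reversed(lines):
        removed.append(line)
        if stays_in_sync(removed):
            return line, removed
    return None, removed


SUSPECT = "FRC - Roxingstoke - Templecaster (local)"
# Placeholder names; only SUSPECT comes from the test described above.
lines = ["Line A", "Line B", SUSPECT, "Line C", "Line D"]

# Stand-in oracle for the demo: the game is stable once SUSPECT has been removed.
line, removed = first_stable_removal(lines, lambda gone: SUSPECT in gone)
print(line)
print(removed)
```

A result from this search only marks a suspect, because every line below it has been removed as well. Removing that one line alone from the unstable save is still needed as confirmation.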

SuperTimo

I joined the server and suffered loss of synchronisation after about 2 minutes. I will try again and see if that was a one off.

Edit: The same happened again. There are a lot of stuck vehicles and vehicles with no route; could these be having an effect on players staying in sync?