The International Simutrans Forum

 

Topic: Instability on the Bridgewater-Brunel server

jamespetts

  • Simutrans-Extended project coordinator
  • Moderator
« Reply #70 on: October 06, 2018, 09:42:35 PM »
The result of the third test is significant and interesting. I took the 1937 game that runs without losing synchronisation, then used the public player tool to advance the year to 1940 without changing the game-state other than the date, and re-ran the test. This time, the game would lose synchronisation after a few minutes again. I re-tested with the 1937 game and confirmed that it did not lose synchronisation. This suggests that there is an issue with some item automatically placed in the game which has an introduction or retirement date in around 1939, the most obvious candidates for which are roads.
Edit: Further testing has shown that copying the latest (client) simuconf.tab to the server (save for replicating the server's original network settings) does not prevent the loss of synchronisation from occurring with the >1939 saved game.
« Last Edit: October 06, 2018, 11:00:18 PM by jamespetts »

DrSuperGood

  • Dev Team
« Reply #71 on: October 07, 2018, 04:06:21 AM »
Might be worth trying to advance the server beyond 1940 to see if there is a date at which the problem stops. This could help locate the specific objects causing the problem. Obviously the year has to be advanced either offline, with the server then restarted, or with the server re-joined afterwards, since one can assume that any Windows client connected while the date passes 1940 will be out of sync instantly and will only be booted later, when a checksum check is performed.
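A toy model of the checksum behaviour described above (hypothetical, and not Simutrans's actual network code): server and client step the same deterministic state, the client picks up a one-off difference at some step, and the mismatch is only noticed at the next periodic checksum comparison. The `step` function, the state representation and the check interval are all assumptions made for illustration.

```python
from __future__ import annotations

import hashlib


def checksum(state: list[int]) -> str:
    # Hash the whole state, so any divergence, however small, changes the digest.
    return hashlib.sha256(repr(state).encode("utf-8")).hexdigest()


def step(state: list[int], t: int, perturb: bool) -> None:
    # Hypothetical deterministic update; `perturb` models a single
    # platform-specific difference (e.g. one object built differently).
    state.append((state[-1] * 31 + t + (1 if perturb else 0)) % 1_000_003)


def detect_desync(diverge_at: int, check_every: int, max_steps: int) -> int | None:
    """Return the step at which a checksum comparison first fails, or None."""
    server, client = [1], [1]
    for t in range(1, max_steps + 1):
        step(server, t, perturb=False)
        step(client, t, perturb=(t == diverge_at))
        # Only compare at periodic check points, as a server would.
        if t % check_every == 0 and checksum(server) != checksum(client):
            return t
    return None
```

For example, `detect_desync(130, 64, 1000)` returns 192: the client has been out of sync since step 130, but nothing notices until the check at step 192.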

It might also be worth doing a clean install of Simutrans on the server (making sure not to lose any saves). Although the pakset is hash-checked by clients, files like simuconf.tab are not.
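Since simuconf.tab is not hash-checked, one way to compare the server's copy with a client's is to diff the settings while ignoring comments and blank lines. This is a minimal sketch, assuming that `#` starts a comment line (as in the stock simuconf.tab); the function names are made up for illustration. Deliberate differences, such as the server's own network settings, will show up too and have to be judged by hand.

```python
from __future__ import annotations

import difflib
from pathlib import Path


def significant_lines(text: str) -> list[str]:
    # Keep only non-blank, non-comment lines, trimmed of surrounding whitespace.
    lines = []
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            lines.append(line)
    return lines


def config_diff_text(server_text: str, client_text: str) -> list[str]:
    """Return '-server' / '+client' lines for settings that differ."""
    diff = list(difflib.unified_diff(
        significant_lines(server_text),
        significant_lines(client_text),
        fromfile="server", tofile="client", lineterm=""))
    # Skip the two file-header lines; keep only removed/added settings.
    return [d for d in diff[2:] if d[:1] in ("-", "+")]


def config_diff(server_file: Path, client_file: Path) -> list[str]:
    return config_diff_text(
        server_file.read_text(encoding="utf-8", errors="replace"),
        client_file.read_text(encoding="utf-8", errors="replace"))
```

An empty result means the two files agree on every setting once comments and spacing are ignored.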

Of course, one should make sure that the server really is going out of sync with the clients. It could be something to do with the time, starting somewhere in 1940, that causes a false-positive out-of-sync (OoS) detection.
« Last Edit: October 07, 2018, 06:50:56 AM by DrSuperGood »

Ves
« Reply #72 on: October 07, 2018, 12:21:59 PM »
I went ahead and looked at which ways were becoming available in the time frame leading up to 1940. These objects are taken from all the dats in this directory: https://github.com/jamespetts/simutrans-pak128.britain/tree/master/ways

Name=hr-asphalt-road-medium
intro_year=1935
intro_month=6

name=BrickViaduct
intro_year=1838
intro_month=7

Name=city_road
intro_year=1932
intro_month=1

name=ConcreteSteelCantileverRoad
intro_year=1937
intro_month=5

Name=concrete_road
intro_year=1936
intro_month=9

Name=runway
intro_year=1938
intro_month=9

name=airport_oneway
intro_year=1938
intro_month=9

Name=taxiway
intro_year=1938
intro_month=9

---- close retire dates ---- (not a complete list, since I didn't think of checking the retire dates until midway through the list...)

Name=macadam_road
retire_year = 1936
retire_month = 7

Name=WoodenTretleElevatedNarrow
retire_year=1938
retire_month=7

name=WoodenTrestleNarrow
retire_year=1938
retire_month=7
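Gathering such a list by hand is error-prone, as the note about retire dates shows. Here is a minimal sketch of a scanner for the ways directory. It assumes that objects in a .dat file are separated by lines of dashes, that keys are case-insensitive (both `Name=` and `name=` appear above), and that a missing month means January. The function names are hypothetical.

```python
from __future__ import annotations

import re
from pathlib import Path


def parse_dat(text: str) -> list[dict[str, str]]:
    """Split .dat text into objects: dicts of lower-cased key -> value."""
    objects: list[dict[str, str]] = []
    current: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if re.fullmatch(r"-{2,}.*", line):  # assumed object separator line
            if current:
                objects.append(current)
            current = {}
        elif "=" in line:
            key, value = line.split("=", 1)
            current[key.strip().lower()] = value.strip()
    if current:
        objects.append(current)
    return objects


def leading_int(value: str) -> int | None:
    match = re.match(r"\d+", value)
    return int(match.group()) if match else None


def dates_in_window(objects, start, end):
    """List (name, 'intro'|'retire', (year, month)) falling within [start, end]."""
    hits = []
    for obj in objects:
        for kind in ("intro", "retire"):
            year = leading_int(obj.get(f"{kind}_year", ""))
            if year is None:
                continue
            month = leading_int(obj.get(f"{kind}_month", "1")) or 1
            when = (year, month)
            if start <= when <= end:
                hits.append((obj.get("name", "?"), kind, when))
    return hits


def scan_directory(root: str, start, end):
    hits = []
    for path in sorted(Path(root).rglob("*.dat")):
        text = path.read_text(encoding="utf-8", errors="replace")
        hits.extend(dates_in_window(parse_dat(text), start, end))
    return hits
```

Something like `scan_directory("simutrans-pak128.britain/ways", (1936, 1), (1940, 12))` against a local checkout would list introductions and retirements together, so none are missed.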

jamespetts

  • Simutrans-Extended project coordinator
  • Moderator
« Reply #73 on: October 07, 2018, 08:38:15 PM »
Dr. Supergood - that is a very useful suggestion. I have tried advancing the time to 2000, and there is no loss of synchronisation with this. I will try a few intermediate dates to see what the cut-off is.
Edit: The loss of synchronisation still occurs in 1950.
Edit: The error also seems to occur in 1975.
« Last Edit: October 07, 2018, 10:09:05 PM by jamespetts »

DrSuperGood

  • Dev Team
« Reply #74 on: October 08, 2018, 08:03:17 AM »
Might be worth binary searching the exact start and end year.

It could be coupled to town buildings/attractions, industry or private cars since all of those are subject to introduction or phase out with year.
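Because each test session is long, a binary search keeps the number of sessions to roughly log2 of the year range. This is a minimal sketch in which `desyncs(year)` is a hypothetical stand-in for "advance a copy of the same known-good save to this year, connect a client and watch for an hour". It assumes the result is monotone on each side of the bad period and that every probe starts from the same save.

```python
from __future__ import annotations

from typing import Callable


def first_true(lo: int, hi: int, pred: Callable[[int], bool]) -> int:
    """Smallest year in (lo, hi] for which pred is True.

    Requires pred(lo) False, pred(hi) True and pred monotone in between.
    Makes about log2(hi - lo) calls to pred.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if pred(mid):
            hi = mid
        else:
            lo = mid
    return hi


def bad_period(good_before: int, known_bad: int, good_after: int,
               desyncs: Callable[[int], bool]) -> tuple[int, int]:
    """Return (first, last) desyncing year around a known-bad year."""
    first = first_true(good_before, known_bad, desyncs)
    last = first_true(known_bad, good_after, lambda y: not desyncs(y)) - 1
    return first, last
```

With a known-good year of 1937 and a known-bad year of 1940, at most two sessions are needed to pin down the start year.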

jamespetts

  • Simutrans-Extended project coordinator
  • Moderator
« Reply #75 on: October 08, 2018, 10:08:06 AM »
Each round of testing takes a considerable amount of time, so it will take a long time to get to the point of checking the exact year. I am planning to try to find it as precisely as possible, however.

jamespetts

  • Simutrans-Extended project coordinator
  • Moderator
« Reply #76 on: October 12, 2018, 12:21:08 AM »
Further testing has revealed an error in the earlier testing, but that error itself has revealed interesting data. When I advanced to 2000 initially, I had used a game saved in 1937. However, the initial tests of 1950 and 1975 had used the game saved in 1939, after the problem had arisen. Re-testing in 1952 with the game saved in 1937 shows that the client is able to stay in sync with the server.

This suggests that it is the presence in the game of an object that is automatically built sometime in the 1939-1952 era that causes the problem, rather than the building of the object while the client is connected.

I will have to test further when I have more time to see in which year the problem first goes away.

DrSuperGood

  • Dev Team
« Reply #77 on: October 12, 2018, 03:56:21 AM »
There is a limit to what objects are automatically created or manipulated.
  • Trees.
  • City buildings/attractions.
  • Walking passengers.
  • Private vehicles.
  • Terrain slopes (due to construction of city buildings).
  • City roads.
  • Resurfacing of all existing roads, rails, etc., potentially to a different type due to obsolescence.
  • Industries, and industry linking.
  • Power consumption/generation.
  • Bridges, and hence grounds, due to the construction of city road bridges over obstacles.
« Last Edit: October 12, 2018, 05:47:04 AM by DrSuperGood »

prissi

  • Developer
  • Administrator
« Reply #78 on: October 12, 2018, 04:48:43 AM »
Are there exponents or square roots used in any generation routine? It may be that those give slight deviations only for numbers generated in that era. Because if there is no desync when running both under Linux, I would suspect something like this...

DrSuperGood

  • Dev Team
« Reply #79 on: October 12, 2018, 05:46:38 AM »
Quote
Are there exponents or square roots used in any generation routine? It may be that those give slight deviations only for numbers generated in that era. Because if there is no desync when running both under Linux, I would suspect something like this...
There is a software implementation of these which should be deterministic between platforms. The software implementation is heavily used by vehicle physics, which cannot directly be the cause, since there are dates at which the game remains in sync for hours despite ~10,000 vehicles.
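The point is that integer-only arithmetic gives bit-identical results on every platform, whereas hardware floating point may not. This is a minimal sketch of the idea and not Simutrans-Extended's actual routine: an integer square root by Newton's method, plus a square root for a hypothetical 16.16 fixed-point format.

```python
def isqrt(n: int) -> int:
    """floor(sqrt(n)) using integer arithmetic only (Newton's method)."""
    if n < 0:
        raise ValueError("square root of a negative number")
    if n < 2:
        return n
    x = n
    y = (x + 1) // 2
    while y < x:  # iterates decrease until they reach floor(sqrt(n))
        x = y
        y = (x + n // x) // 2
    return x


def fixed_sqrt(value: int, frac_bits: int = 16) -> int:
    """Square root of a fixed-point number with `frac_bits` fractional bits.

    `value` represents value / 2**frac_bits, and
    sqrt(value / 2**f) * 2**f == sqrt(value * 2**f).
    """
    return isqrt(value << frac_bits)
```

Every step is an integer add, shift or floor division, so the result cannot depend on the compiler, the CPU or the FPU rounding mode.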

Anyway, an idea that occurred to me was to disable multithreading on both the server and the client for a test. If this stops it going out of sync, then the cause is something related to multithreading.
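The following is a generic illustration of how multithreading can break synchronisation, not a claim about Simutrans-Extended's code. Floating-point addition is not associative. If partial results from worker threads are combined in whatever order the threads happen to finish, the total can therefore differ between machines. Combining them in a fixed order, or using integer arithmetic, avoids this. The function names are hypothetical.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor, as_completed


def add_in_order(values: list[float]) -> float:
    total = 0.0
    for v in values:
        total += v
    return total


def parallel_sum(values: list[float], workers: int = 4,
                 deterministic: bool = True) -> float:
    chunks = [values[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(add_in_order, chunk) for chunk in chunks]
        if deterministic:
            # Combine partial sums in chunk order: same result on every run.
            partials = [f.result() for f in futures]
        else:
            # Combine in completion order: may vary from run to run.
            partials = [f.result() for f in as_completed(futures)]
    return add_in_order(partials)
```

For instance, `add_in_order([0.1, 0.2, 0.3])` gives 0.6000000000000001, while `add_in_order([0.3, 0.2, 0.1])` gives 0.6.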
« Last Edit: October 12, 2018, 07:21:19 AM by DrSuperGood »

jamespetts

  • Simutrans-Extended project coordinator
  • Moderator
« Reply #80 on: October 20, 2018, 02:41:37 PM »
Further testing shows that year skipping the 1937 saved game to 1952 produces a saved game that stays in sync between client and server.

jamespetts

  • Simutrans-Extended project coordinator
  • Moderator
« Reply #81 on: October 20, 2018, 05:29:12 PM »
Further testing shows that the 1937 game fast forwarded to 1940 also remains in sync with the server, suggesting that the earlier results implying to the contrary were contaminated with the confusion between different starting points identified earlier.

The consequence of this is that the loss of synchronisation was not necessarily (and probably was not) caused by automatically emergent objects such as buildings, private cars or pedestrians, as previously thought.

Further investigation of the type originally carried out (i.e. into changes made by players to the network) will be needed.

Junna
« Reply #82 on: October 21, 2018, 10:25:41 AM »
I replaced something like 1,200 road vehicles; could that be part of it? It was around the time the desyncing started... Many buses also got stuck, because a number of them have spuriously high axle loads (equal to their entire weight, 6-7 tonnes).

jamespetts

  • Simutrans-Extended project coordinator
  • Moderator
« Reply #83 on: October 21, 2018, 10:48:50 AM »
I have been conducting a test to try to determine the cause of the problem by liquidating each company one by one and seeing whether the server remains in sync after each liquidation. I have so far liquidated Crandon & Lakes and Player 11 to no avail. I was about to test liquidating the next company last night when my computer crashed, so I am going to restart to-day. This test will help to determine whether anything that you describe might be relevant. It is difficult to see at present, however, how anything in your description could be part of the problem, since both replacements and vehicles getting stuck or having no route have commonly been encountered before without causing loss of synchronisation.

DrSuperGood

  • Dev Team
« Reply #84 on: October 21, 2018, 02:31:33 PM »
How many of the buses are left to replace? I am aware of an out-of-sync problem related to manual schedule changes, but I did not think it applied to automatic changes.

Also how much power generation is going on? I recall a similar issue like this being caused by power nets on the last server.

jamespetts

  • Simutrans-Extended project coordinator
  • Moderator
« Reply #85 on: October 21, 2018, 02:45:25 PM »
Preliminary testing seems to show that the loss of synchronisation appears not to occur when the Bay Transport Company is liquidated. However, I have not been able to test this thoroughly, since my computer is currently not stable enough to remain running without hard-crashing when running the server game for more than ~15 minutes at a time (although this is still longer than it took to lose synchronisation before I liquidated Bay Transport).

The server is currently set up with Bay Transport liquidated, but all other companies intact. If anyone can connect and try to remain connected (without interaction) for circa 1 hour in this state, that would be very helpful. I can then try to narrow down the problem once this has been confirmed. Note that you will need to download the latest version from the server as I fixed a crash bug this afternoon.

In relation to the other suggested issues: the electricity related loss of synchronisation was fixed a long time ago. As to schedule changes, I am not aware of this being a current bug. If anyone can reproduce this with the latest version, please post a full bug report in the usual way.

DrSuperGood

  • Dev Team
« Reply #86 on: October 21, 2018, 05:15:06 PM »
Been connected to the server well over an hour. Even survived the save/load cycle of someone joining. No desyncs at all.

EDIT: A thought occurred to me. Now that we know removing Bay Transport solves the OoS, we have to prove that it is Bay Transport itself causing the OoS and not his interactions with everyone else (since practically all companies are connected to him in some way). Hence I propose restarting the server with a save that removes all companies except Bay Transport and seeing whether it still goes out of sync. If it does, then the problem is something in Bay's network, and the removal of the other companies might make this easier to identify.
« Last Edit: October 21, 2018, 08:32:08 PM by DrSuperGood »

jamespetts

  • Simutrans-Extended project coordinator
  • Moderator
« Reply #87 on: October 21, 2018, 09:07:34 PM »
Thank you very much for testing: that is most helpful. That is a good idea for a further test, too, but first I want to test to see whether the fix to the bug that caused a crash actually fixed the desync by running the original saved game again: whilst this is very unlikely, because the two coincided, I need to rule this out before testing further. Then, I will proceed with Dr. Supergood's proposed further test.
Edit: The conclusion of the first part of the test is that the crash fix did not fix the loss of synchronisation. I will now proceed with Dr. Supergood's suggested test.
« Last Edit: October 21, 2018, 09:34:11 PM by jamespetts »

jamespetts

  • Simutrans-Extended project coordinator
  • Moderator
« Reply #88 on: October 21, 2018, 10:36:47 PM »
I have now run the second part of Dr. Supergood's proposed test (and this is on the server now - you will need to update the executable again, as I had to fix another crash bug to run this): with just Bay Transport and the other companies removed, the client still loses synchronisation with the server. This implies that the issue is not at the intersection between Bay Transport and another network, but rather internal to the Bay Transport network.

jamespetts

  • Simutrans-Extended project coordinator
  • Moderator
« Reply #89 on: October 22, 2018, 08:50:59 PM »
I have just carried out a further test by withdrawing all of Bay Transport's road vehicles. Connecting to the game thus modified still results in a loss of synchronisation after a few minutes.
Edit: Removing the aircraft also does not remove the loss of synchronisation issue.
Edit: Likewise, removing trams has no effect. All that remains is rail, so it seems likely (but not certain without further testing) that the problem is associated in some way with rail transport.
« Last Edit: October 22, 2018, 09:50:51 PM by jamespetts »

jamespetts

  • Simutrans-Extended project coordinator
  • Moderator
« Reply #90 on: October 22, 2018, 11:43:46 PM »
Further tests show that removing all of Bay Transport's vehicles appears to allow a stable connexion to be maintained. It would be very helpful, however, if anyone else could test to verify this: the server is currently running in this state, so if anyone can stay connected for ~1 hour, this would be very good evidence of the stability.

Even more interestingly (perhaps), I discovered that I had missed some rail and road vehicles when I was testing earlier: some earlier versions of the testing saved game file (including the ones that I used to test the absence of road vehicles, aircraft and trams) still had one or two road vehicles left, and the first attempt at testing the removal of rail vehicles also still had some road vehicles left. Testing with this version, the loss of synchronisation still seems to occur.

This is most interesting as, if the current saved game can be shown to be long-term stable, I can then remove vehicles one by one and see which one is responsible.

Junna
« Reply #91 on: October 23, 2018, 01:26:02 AM »
This is kind of off-topic, but how do you force liquidation of another company on a server game?

prissi

  • Developer
  • Administrator
« Reply #92 on: October 23, 2018, 08:30:57 AM »
nettool probably.

The Standard version of pak128.britain contains duplicate objects; see here: https://forum.simutrans.com/index.php/topic,18506.msg176239.html

Three buildings that appear from 1930 onward are contained twice, once with a cluster parameter and once without. They are available from 1930 to 1960, but once newer buildings appear in 1950, they are built less frequently. Since the loading order of pak files depends on the file system (and thus differs between Windows and Linux), these buildings (COM_JH_1930_00_06A etc.) may be the source of the desync. With fewer companies, town growth is less frequent, so such a desync would happen less often.

It might be useful if the pak doublette (duplicate) feature from Standard found its way into Experimental early, or if you checked the debug messages for overlaid objects.
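prissi's hypothesis can be illustrated with a toy loader. It assumes, as "overlaid objects" suggests, that an object loaded later replaces an earlier one with the same name. The file names are hypothetical; only COM_JH_1930_00_06A comes from the post. Loading files in sorted order rather than file-system order, and reporting duplicates, would make the outcome identical on every platform.

```python
from __future__ import annotations


def load_objects(files: dict[str, list[str]], order: list[str]):
    """Register objects file by file; a later object overlays an earlier one.

    Returns (registry, duplicates): registry maps object name -> file it
    finally came from; duplicates maps name -> every file that defined it.
    """
    registry: dict[str, str] = {}
    duplicates: dict[str, list[str]] = {}
    for filename in order:
        for name in files[filename]:
            if name in registry:
                duplicates.setdefault(name, [registry[name]]).append(filename)
            registry[name] = filename
    return registry, duplicates


def load_deterministically(files: dict[str, list[str]]):
    # Sorting removes the dependence on directory-listing order.
    return load_objects(files, sorted(files))
```

With the same two files listed in opposite orders, the surviving definition of a duplicated building differs. That is exactly the kind of Windows/Linux difference that would only matter when a town actually builds that building.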

jamespetts

  • Simutrans-Extended project coordinator
  • Moderator
« Reply #93 on: October 23, 2018, 09:59:32 AM »
Nettool is indeed the way of liquidating single companies - the syntax is nettool [server details] remove-company [company number].

As to the duplicated buildings, thank you for the investigations in this regard. As I posted in the other thread, however, I cannot read the text posted there, so I cannot check whether any of these are duplicated in the Extended version of the pakset. I have just checked for duplication of COM_JH_1930_00_06A, but found only one object with this name.

DrSuperGood

  • Dev Team
« Reply #94 on: October 23, 2018, 10:11:52 AM »
Quote
It might be useful if the pak doublette (duplicate) feature from Standard found its way into Experimental early, or if you checked the debug messages for overlaid objects.
While the server listing server was still working, no pakset mismatch was shown when connecting, so this is not the problem.

SuperTimo
« Reply #95 on: October 23, 2018, 03:45:52 PM »
Quote
if anyone can stay connected for ~1 hour, this would be very good evidence of the stability.

I've been connected for around 40 mins with no issues at the moment.

jamespetts

  • Simutrans-Extended project coordinator
  • Moderator
« Reply #96 on: October 23, 2018, 04:00:25 PM »
Excellent, thank you very much for testing: that is very helpful.

DrSuperGood

  • Dev Team
« Reply #97 on: October 23, 2018, 04:40:14 PM »
I was connected for 80 minutes, no out of sync.

jamespetts

  • Simutrans-Extended project coordinator
  • Moderator
« Reply #98 on: October 23, 2018, 06:01:24 PM »
Excellent, thank you very much for testing.

I have now uploaded the other version that I described, which still has some residual road and rail vehicles in it for testing. It will restart with this version running in a few minutes. I intend to unlock Bay Transport so that we can all test to see which thing(s) are causing the trouble by removing them one by one. I should be very interested in anyone's results.


Edit: Now running and unlocked.

Rollmaterial
« Reply #99 on: October 24, 2018, 09:43:10 PM »
I have managed to stay in sync for ~40 min without doing anything.

jamespetts

  • Simutrans-Extended project coordinator
  • Moderator
« Reply #100 on: October 24, 2018, 09:46:31 PM »
That is interesting, thank you. I will have to re-test, as I did originally get out of sync errors with this saved game.

DrSuperGood

  • Dev Team
« Reply #101 on: October 25, 2018, 12:52:51 PM »
Yeah, the save is stable.

Is there one with all companies except Bay removed? This one has most of Bay's vehicles removed.

That said, when I first joined I did get an index-out-of-bounds crash. I have not been able to reproduce it, however.

jamespetts

  • Simutrans-Extended project coordinator
  • Moderator
« Reply #102 on: October 25, 2018, 05:56:55 PM »
Thank you very much for testing: that is helpful.

I have now restarted the server game with the version of the saved game with Bay Transport's railway network only (plus one or two 'buses that I had mistakenly failed to remove earlier). The company is unlocked, so there is scope for testing which specific line(s) are associated with the loss of synchronisation by withdrawing the stock from the lines one by one and testing after each.

jamespetts

  • Simutrans-Extended project coordinator
  • Moderator
« Reply #103 on: October 27, 2018, 12:28:58 PM »
I am currently running a test in which I am removing rail lines of Bay Transport one by one and checking whether this affects loss of synchronisation for each line. I am going from the bottom of the list of lines upwards.

I appear to have found a stable state by removing all lines up to and including FRC - Roxingstoke - Templecaster (local). Removing all lines up to but not including that line did not prevent the loss of synchronisation, suggesting that something about this line might well be responsible for the issue, although further testing is needed to confirm this.

It would be helpful if people could connect to the server and test whether this is long term stable.

The next round of testing will be reverting to the version of the saved game in which the loss of synchronisation occurs to test whether removing only the abovementioned line will prevent the loss of synchronisation.
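Removing lines one at a time needs up to one test session per line. Suppose there is a single culprit line whose removal alone restores sync. That is an assumption, since interactions between lines could break it. Under that assumption, halving the candidate set each time needs only about log2(number of lines) sessions. This is a minimal sketch in which `is_stable_without` is a hypothetical stand-in for "withdraw these lines from the original desyncing save, then connect and watch".

```python
from __future__ import annotations

from typing import Callable, Sequence


def find_culprit(lines: Sequence[str],
                 is_stable_without: Callable[[list[str]], bool]) -> str | None:
    """Bisect for the single line whose withdrawal restores sync.

    Every probe starts again from the original desyncing save and withdraws
    only the lines passed to is_stable_without.
    """
    candidates = list(lines)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if is_stable_without(half):
            candidates = half  # culprit is among the withdrawn lines
        else:
            candidates = candidates[len(candidates) // 2 :]  # it is in the rest
    return candidates[0] if candidates else None
```

With around twenty lines, this needs about five sessions rather than up to twenty. The result should still be confirmed by withdrawing only that line, as proposed above.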

SuperTimo
« Reply #104 on: October 27, 2018, 02:54:23 PM »
I joined the server and suffered loss of synchronisation after about 2 minutes. I will try again and see if that was a one off.

Edit: The same happened again. There are a lot of stuck vehicles and vehicles with no route; could these be affecting players' ability to stay in sync?