The International Simutrans Forum

 

Author Topic: Desync issue (devel-new-2) with Linux Server/Windows client  (Read 30495 times)

Offline Felix

  • *
  • Posts: 98
  • Languages: DE, EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #175 on: January 14, 2017, 08:31:14 PM »
The issue is also definitely savegame dependent. With a copy of the bridgewater-brunel savegame that I used in a local game for some time, I do not get the immediate disconnect with a local host.

Offline Felix

  • *
  • Posts: 98
  • Languages: DE, EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #176 on: January 14, 2017, 08:32:37 PM »
As said earlier, I am not sure if the extra package with the experimental-specific simuconf.tab etc. is still needed. I am currently using the configuration files from the devel-new-2 branch.

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 19353
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #177 on: January 14, 2017, 08:36:57 PM »
Quote from: Felix
The issue is also definitely savegame dependent. With a copy of the bridgewater-brunel savegame that I used in a local game for some time, I do not get the immediate disconnect with a local host.

Interesting - how long does it take before you desync?

There may be multiple, separate desync issues, of course.

What do you mean about the package with the experimental-specific simuconf.tab? Do you mean the .zip file distributed with the old release binaries from long ago? This ought not in principle to be an issue, since all of the configuration settings are saved with the saved game and transferred to the client when it first connects to the server, overriding any configuration settings in the client's simuconf.tab. In any event, the simuconf.tab from Github should be the most up to date version.

The Bridgewater-Brunel server has its own modified simuconf.tab to allow for settings specific to that server, such as the administrator's (i.e. my) e-mail address, a description, etc.

Offline Felix

  • *
  • Posts: 98
  • Languages: DE, EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #178 on: January 14, 2017, 08:38:10 PM »
With my local savegame the client desyncs only after like 10 min, but also with a mismatch of the random numbers.

Code:
ERROR: route_t::intern_calc_route(): Problem with heuristic:  from 1289,1610,9 to (1306,1717,9) at 1289,1615, best = 1590, cost = 50, heur = 1620, dist = 109, turns = 1461

For help with this error or to file a bug report please see the Simutrans forum:
http://forum.simutrans.com
Warning: karte_t:::do_network_world_command: sync_step=11776  server=[ss=11776 st=1472 nfc=0 rand=3328960089 halt=1 line=1 cnvy=1025 ssr=3461460419,3328960089,0,0,0,0,0,0 str=3328960089,3328960089,3328960089,3328960089,3328960089,3328960089,3328960089,3328960089,3328960089,3328960089,3328960089,3328960089,3328960089,1688219608,143636616,3328960089 exr=0,0,0,0,0,0,0,0  client=[ss=11776 st=1472 nfc=0 rand=3461460419 halt=1 line=1 cnvy=1025 ssr=3461460419,3461460419,0,0,0,0,0,0 str=3461460419,3461460419,3461460419,3461460419,3461460419,3461460419,3461460419,3461460419,3461460419,3461460419,3461460419,3461460419,3461460419,1688219608,143636616,3461460419 exr=0,0,0,0,0,0,0,0 
Warning: karte_t:::do_network_world_command: disconnecting due to checklist mismatch
Warning: karte_t::network_disconnect(): Lost synchronisation with server. Random flags: 0
Warning: nwc_routesearch_t::reset: all static variables are reset
Message: karte_t::reset_timer(): called, mode=$0
World finished ...
Show banner ...
Message: karte_t::reset_timer(): called, mode=$0
ERROR: route_t::intern_calc_route(): Problem with heuristic:  from 1336,2179,5 to (1337,2182,5) at 1337,2182, best = 70, cost = 70, heur = 700, dist = 0, turns = 630

Offline Felix

  • *
  • Posts: 98
  • Languages: DE, EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #179 on: January 14, 2017, 08:39:19 PM »
And yes, I was talking about that zip file. But if it is not needed, I should have a correct configuration.

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 19353
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #180 on: January 14, 2017, 08:44:09 PM »
It is a mismatch of the random numbers that is the more usual type of desync that is hard to track down, especially if it only happens every 10 minutes (meaning that each small change needs 10 minutes to be tested to see whether it makes a difference).

Either the cause of the desyncs on the Bridgewater-Brunel server is different to that on a local server (which are still seriously problematic), or they are both related to the same thing, but for some reason it causes a desync more quickly on the Bridgewater-Brunel server.

Offline Felix

  • *
  • Posts: 98
  • Languages: DE, EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #181 on: January 14, 2017, 08:46:58 PM »
Where are the numbers in rand[] ("ssr" in the message) calculated? Interestingly, on the server only rand[1] is identical to the seed ("rand" in the message), while on the client both rand[0] and rand[1] are identical to the seed. The server's rand[0] matches the seed value from the client.

Offline TurfIt

  • Dev Team, Coder/patcher
  • Devotee
  • *
  • Posts: 1335
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #182 on: January 14, 2017, 09:07:01 PM »
ssr = sync_step randoms  karte_t::sync_step()
str = step randoms  karte_t::step()
These were just extra check points added to help track down desyncs 2 (3? 4?) years ago. I'd rather have expected them to have been removed once troubleshooting was over...

To make use of them, you'll want "server_frames_between_checks = 1" on the server. And then shuffle around where the current state of the randoms is captured into the checklist. IIRC the previous desyncs were all in the step (str) numbers, so ssr is just showing the state of the random numbers at the beginning and end of the sync_step. For the log posted, it would indicate the server is using a random number somewhere in a sync_stepped object that the client is not. You'd need to break up the sync step to be by object and add more capturing to try and use these to find the possible issue.
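
To make the comparison of the ssr/str checkpoints less error-prone than reading the log by eye, something like the following could be used. It is only a rough sketch in Python, not part of Simutrans: the helper names (parse_checklist, split_sides, first_mismatches) are made up for illustration, and the sample line is a shortened excerpt of the log in Reply #178 (only the first four ssr values are kept, the rest is dropped).

```python
import re

def parse_checklist(block):
    """Turn 'ss=1 st=2 ssr=3,4' into {'ss': [1], 'st': [2], 'ssr': [3, 4]}."""
    fields = {}
    for key, values in re.findall(r"(\w+)=([\d,]+)", block):
        fields[key] = [int(v) for v in values.split(",") if v]
    return fields

def split_sides(line):
    """Parse the text following each 'name=[' marker, keyed by that name."""
    sides = {}
    markers = list(re.finditer(r"(\w+)=\[", line))
    for i, marker in enumerate(markers):
        end = markers[i + 1].start() if i + 1 < len(markers) else len(line)
        sides[marker.group(1)] = parse_checklist(line[marker.end():end])
    return sides

def first_mismatches(a, b):
    """For each field that differs, give the first index at which it differs."""
    result = {}
    for key in a:
        if key in b and a[key] != b[key]:
            shortest = min(len(a[key]), len(b[key]))
            result[key] = next(
                (i for i in range(shortest) if a[key][i] != b[key][i]),
                shortest,
            )
    return result

# Shortened excerpt of the checklist line posted in Reply #178.
line = (
    "sync_step=11776  server=[ss=11776 st=1472 nfc=0 rand=3328960089 "
    "ssr=3461460419,3328960089,0,0 "
    "client=[ss=11776 st=1472 nfc=0 rand=3461460419 "
    "ssr=3461460419,3461460419,0,0"
)

sides = split_sides(line)
diff = first_mismatches(sides["server"], sides["client"])
print(diff)
```

For this excerpt the first differing ssr entry is index 1, i.e. the value captured at the end of the sync_step, while index 0 (the beginning) still agrees. That is consistent with the reading above that the divergence happens inside the sync_step.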

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 19353
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #183 on: January 14, 2017, 09:09:00 PM »
The "rand=..." is the random seed on the client and server respectively. I am not entirely sure what the st and ssr are (I did not write this code), nor quite where the long list of numbers come from. (Thank you TurfIt for answering whilst I was typing this reply - that is most helpful).

Normally, a desync of this sort is caused by divergence between server and client somewhere (it is usually extremely hard to find where), normally caused by some sort of indeterminism (which could be caused by undefined behaviour, incorrect implementation of multi-threading, a reference to an indeterminate variable or a failure to transmit all of the necessary information from the server to the client in the first place).

I usually find that the best way to fix this sort of problem is to try to narrow down the part of the code in which it occurs either by testing to see into which part of the code that it was introduced, or by selectively disabling parts of the code using preprocessor directives and seeing which parts need to be disabled in order for client and server to stay in sync.
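
The first of these approaches (finding the version in which the problem was introduced) amounts to a bisection over the revision history. A minimal sketch in Python follows. It assumes that the history is linear and that, once the desync appears, every later revision also desyncs; the revision names and the desyncs() check are placeholders for building a revision and running a real client/server test.

```python
def first_bad(revisions, is_bad):
    """Return the earliest revision for which is_bad() is true.

    Assumes the last revision is bad and that every revision after
    the first bad one is bad as well.
    """
    lo, hi = 0, len(revisions) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid          # culprit is at mid or earlier
        else:
            lo = mid + 1      # culprit is after mid
    return revisions[lo]

# Placeholder history: 16 revisions, the desync first appears in "rev11".
revisions = [f"rev{i}" for i in range(16)]
tested = []

def desyncs(rev):
    """Stand-in for 'build this revision and see whether the client desyncs'."""
    tested.append(rev)
    return int(rev[3:]) >= 11

culprit = first_bad(revisions, desyncs)
print(culprit, len(tested))
```

With 16 candidate revisions this needs four test runs rather than sixteen, which matters when every run takes around ten minutes before a desync shows up. If the assumption does not hold (code being enabled and disabled repeatedly across commits), the result of a bisection like this is not reliable.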

Offline Felix

  • *
  • Posts: 98
  • Languages: DE, EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #184 on: January 14, 2017, 09:32:37 PM »
I tried something slightly different, by logging all calls of simrand. Sadly, the information is quite difficult to interpret. At first look, it seems like karte_t::generate_passengers_and_mail gets called on the client at some point while it does not get called on the server at the same time. From that point on, the random numbers seem to be out of sync.
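
To illustrate the idea of logging every random draw together with its caller and then looking for the first point at which the two logs differ, here is a small Python sketch. It is not the Simutrans code: the LoggedRandom class, its simrand() signature and the "convoy_step" caller name are invented for the example, and Python's random.Random stands in for the game's generator.

```python
import random

class LoggedRandom:
    """Seeded generator that records which caller drew each number."""

    def __init__(self, seed):
        self._rng = random.Random(seed)
        self.trace = []

    def simrand(self, max_value, caller):
        value = self._rng.randrange(max_value)
        self.trace.append((caller, value))
        return value

def first_divergence(trace_a, trace_b):
    """Return (index, entry_a, entry_b) for the first differing entry, or None."""
    for i, (a, b) in enumerate(zip(trace_a, trace_b)):
        if a != b:
            return i, a, b
    if len(trace_a) != len(trace_b):
        i = min(len(trace_a), len(trace_b))
        a = trace_a[i] if i < len(trace_a) else None
        b = trace_b[i] if i < len(trace_b) else None
        return i, a, b
    return None

# Both sides start from the same seed, as server and client should.
server = LoggedRandom(42)
client = LoggedRandom(42)

for step in range(3):
    server.simrand(100, "convoy_step")
    client.simrand(100, "convoy_step")
    if step == 1:
        # Extra draw that only happens on the client.
        client.simrand(100, "generate_passengers_and_mail")

result = first_divergence(server.trace, client.trace)
print(result)
```

The first differing entry names the caller that consumed a random number on one side only. Every draw after it is shifted by one, so everything later in the log looks out of sync even though only one call was actually different.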

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 19353
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #185 on: January 14, 2017, 10:30:06 PM »
Interesting. Was this a single-threaded or multi-threaded build?

Offline Felix

  • *
  • Posts: 98
  • Languages: DE, EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #186 on: January 14, 2017, 10:34:45 PM »
This was in a multithreaded build. I was not really able to replicate this in a singlethreaded build. The interpretation might also be plainly wrong.

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 19353
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #187 on: January 14, 2017, 10:41:53 PM »
This could be a problem with the multi-threaded passenger generation, in that case. A single-threaded build on the loopback interface has stayed connected for me for some time.

How long did it take to desync in the multi-threaded build?

Edit: Could you try to see whether it desyncs with a multi-thread build with the preprocessor directive FORBID_MULTI_THREAD_PASSENGER_GENERATION_IN_NETWORK_MODE defined?

Offline Felix

  • *
  • Posts: 98
  • Languages: DE, EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #188 on: January 14, 2017, 11:20:19 PM »
With the bridgewater-brunel savegame I also had desyncs with singlethreaded builds, usually leading to an immediate crash of either the server or client.

Still trying with the flag, now.

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 19353
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #189 on: January 14, 2017, 11:23:45 PM »
Thank you - that is helpful. Did you get desyncs with the britain-3.sve file with single threaded mode? I could not get desyncs with that despite running it for about four hours this afternoon/evening in a single threaded build.

(The trick to increasing the efficiency of fixing this bug is to find a saved game that will reliably cause a desync quickly).

Offline Felix

  • *
  • Posts: 98
  • Languages: DE, EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #190 on: January 14, 2017, 11:26:55 PM »
A build with the flag set (-DFORBID_MULTI_THREAD_PASSENGER_GENERATION_IN_NETWORK_MODE) still gives me an immediate desync on connect.

Multithreading was accidentally disabled in that build. So even without multithreading I get an immediate disconnect with the britain-3 savegame.

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 19353
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #191 on: January 14, 2017, 11:28:27 PM »
With britain-3.sve or with the very similar but perhaps subtly different saved game saved from the Bridgewater-Brunel server?

Edit: I should note that I have been connected with that flag enabled since before I wrote the last message and am still connected now - on the loopback server.

Offline Felix

  • *
  • Posts: 98
  • Languages: DE, EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #192 on: January 14, 2017, 11:30:08 PM »
Now trying with only the flag ...

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 19353
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #193 on: January 14, 2017, 11:31:01 PM »
Thank you.

Offline Felix

  • *
  • Posts: 98
  • Languages: DE, EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #194 on: January 14, 2017, 11:37:47 PM »
Same result :-(

If the britain-3 or the server's savegame is involved, I seem to get an immediate disconnect no matter what. With a copy of the server's savegame that I used locally for some hours, I only get a delayed disconnect after like 10 min.

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 19353
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #195 on: January 14, 2017, 11:43:10 PM »
I have to say, I am finding it exceedingly difficult to understand why you are getting a different result to that which I am getting. The only thing that I can think to suggest now is for you to try older versions to see where this problem first arose. If this involves going any further back than late December, this will get very complicated indeed because from about October to December, I was adding multi-threading features, which involved lots of commits adding, disabling, then re-enabling (often many times over) a set of about four or five independent sets of multi-threading code, so it will not be a simple matter of going backwards and finding a version in which a desync does not occur.

I do find it very perplexing that you are getting rapid desyncs in a single threaded build, however, which I have not had (with a multi-threaded build) when testing connecting a Linux machine to the Bridgewater-Brunel server. I really cannot understand this at all.

Offline Felix

  • *
  • Posts: 98
  • Languages: DE, EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #196 on: January 14, 2017, 11:52:40 PM »
I already noticed that it gets complicated before December. One interesting aspect is also that it seems to be savegame related. With other savegames, I do not get the immediate disconnect.

Maybe I am, or we are, also overlooking something. Could my setup differ in any significant aspect?

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 19353
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #197 on: January 15, 2017, 12:03:46 AM »
I cannot think of anything configuration specific that could make a difference; but could you post your config.default file just in case?

Offline Felix

  • *
  • Posts: 98
  • Languages: DE, EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #198 on: January 15, 2017, 01:13:53 AM »
sure (I added the .txt extension to be able to attach it)

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 19353
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #199 on: January 15, 2017, 01:19:13 AM »
I cannot see anything in there that seems to be problematic.

I should say that I was just about to test whether I could reproduce your results under Linux using my NUC which runs Ubuntu, but that device failed (to the extent that I am now organising a warranty return) as I was doing that, so I am afraid that I will not be able to do any Linux testing myself for a few weeks until the replacement item is sent to me and I am able to set it up.

Edit: It is rather a long shot, but do you think that you could try with SDL rather than SDL2 to see whether this makes any difference?

Offline Felix

  • *
  • Posts: 98
  • Languages: DE, EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #200 on: January 15, 2017, 01:31:01 AM »
Might be worth a try. SDL2 also has another issue ;-) When I resize the window, the game crashes.

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 19353
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #201 on: January 15, 2017, 01:40:12 AM »
Thank you - do let me know how you get on.

I should say that the debug build Windows versions with the multi-threading of passenger generation disabled are still connected. I shall set up a release build to try overnight to see whether that makes any difference.

Offline Felix

  • *
  • Posts: 98
  • Languages: DE, EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #202 on: January 15, 2017, 01:54:47 AM »
The result with SDL 1 is pretty similar:

Code:
ERROR: route_t::intern_calc_route():    Problem with heuristic:  from 1021,1369,5 to (1036,1459,8) at 1022,1369, best = 1554, cost = 10, heur = 3340, dist = 96, turns = 3234

For help with this error or to file a bug report please see the Simutrans forum:
http://forum.simutrans.com
Warning: karte_t:::do_network_world_command:    skipping command due to checklist mismatch : sync_step=280 server=[ss=280 st=35 nfc=0 rand=786208633 halt=1 line=1 cnvy=1025 ssr=1005465115,1005465115,0,0,0,0,0,0 str=1005465115,1005465115,1005465115,1005465115,1005465115,1005465115,1005465115,786208633,786208633,786208633,786208633,786208633,786208633,1235700,105473,786208633 exr=0,0,0,0,0,0,0,0  executor=[ss=280 st=35 nfc=0 rand=4269245549 halt=1 line=1 cnvy=1025 ssr=2623138960,2623138960,0,0,0,0,0,0 str=2623138960,2623138960,2623138960,2623138960,2623138960,2623138960,2623138960,3881930485,3881930485,3881930485,3881930485,4269245549,4269245549,1235700,105473,3881930485 exr=0,0,0,0,0,0,0,0 
Warning: karte_t::network_disconnect(): Lost synchronisation with server. Random flags: 0
Warning: nwc_routesearch_t::reset:      all static variables are reset
Message: karte_t::reset_timer():        called, mode=$0
Segmentation fault

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 19353
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #203 on: January 15, 2017, 01:58:58 AM »
That is exceedingly odd. Are you able to check older versions to see when this fault was first introduced? A good start might be the 1st of January: after the implementation of all the multi-threading, but before some of the work that I have done this year.

Offline Felix

  • *
  • Posts: 98
  • Languages: DE, EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #204 on: January 15, 2017, 02:01:38 AM »
It looks like it is a configuration issue. In my setup, the pakset name stored in the savegame actually resolves to a different pakset (the one from the nightly builds page), while the client uses the custom-built one. So it is likely caused by the pakset mismatch. The crashes are still valid issues. Sorry for the wasted time :-(


This really solves the immediate desync :-/

Offline Felix

  • *
  • Posts: 98
  • Languages: DE, EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #205 on: January 15, 2017, 02:44:48 AM »
Sadly, connecting to bridgewater-brunel.me.uk still results in a desync, but the server also claims to be a different version (the commit id seems to not exist, though).

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 19353
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #206 on: January 15, 2017, 02:47:43 AM »
Thank you for testing this. Can you clarify the circumstances, if any, in which you now get (1) a crash; and (2) a desync using the loopback interface?

Also, I have not encountered this issue before with the name used by the saved game for the pakset causing desyncs, and I am not really sure why it would do this. Can you let me know more about how you traced the problem to this issue? Are you sure that it is the name itself causing it? It is hard to see any means by which this could happen.

As to the Bridgewater-Brunel server, I am having problems with getting the correct version to work on that: see here for an explanation, including a description of a very bizarre problem that I am currently unable to resolve, preventing me from having usable version numbers on this server.

Offline Felix

  • *
  • Posts: 98
  • Languages: DE, EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #207 on: January 15, 2017, 02:52:50 AM »
On the loopback interface I did not observe any more desyncs with the latest version.

In my test setup the server was accidentally loading a different version of the pakset, which had the name expected by the savegame. The client was running with a newer version built from the sources. This caused an immediate desync right after the connect. I was setting everything on the command line to simplify testing. In game, both versions of the pakset are unfortunately referenced by the same name, which made me miss the error.
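
Since the pakset name alone does not tell the two versions apart, one way to rule this out before testing is to compare a fingerprint of the pakset folders used by the server and the client. The following Python sketch does that with a SHA-256 hash over all file paths and contents. It is an external check only, not something the game does in this form, and the folder and file names in the demo are invented.

```python
import hashlib
import os
import tempfile

def pakset_fingerprint(pak_dir):
    """Hash relative paths and contents of all files under pak_dir."""
    digest = hashlib.sha256()
    for root, dirs, files in os.walk(pak_dir):
        dirs.sort()  # make the traversal order deterministic
        for name in sorted(files):
            path = os.path.join(root, name)
            rel = os.path.relpath(path, pak_dir).replace(os.sep, "/")
            digest.update(rel.encode("utf-8"))
            digest.update(b"\0")
            with open(path, "rb") as fh:
                digest.update(fh.read())
            digest.update(b"\0")
    return digest.hexdigest()

# Demo: two folders with the same pakset name but different contents.
with tempfile.TemporaryDirectory() as tmp:
    server_pak = os.path.join(tmp, "server", "pak-example")
    client_pak = os.path.join(tmp, "client", "pak-example")
    for pak, payload in ((server_pak, b"old build"), (client_pak, b"new build")):
        os.makedirs(pak)
        with open(os.path.join(pak, "vehicle.pak"), "wb") as fh:
            fh.write(payload)
    same = pakset_fingerprint(server_pak) == pakset_fingerprint(client_pak)

print(same)
```

If the two fingerprints differ, server and client are not running the same pakset, whatever name it reports in game.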

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 19353
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #208 on: January 15, 2017, 12:35:25 PM »
You had two different versions of the same pakset with the same name installed?

In any event, running overnight with release builds on Windows with FORBID_MULTI_THREAD_PASSENGER_GENERATION_IN_NETWORK_MODE defined, using britain-3.sve, I get no desyncs either.

When you say that you get no desyncs with the latest version, is that with or without FORBID_MULTI_THREAD_PASSENGER_GENERATION_IN_NETWORK_MODE defined, and after how long a time of running is that?

Offline Felix

  • *
  • Posts: 98
  • Languages: DE, EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #209 on: January 15, 2017, 01:03:42 PM »
I had two versions of the same pakset in different folders.

The game was running with and without the flag and neither raised an immediate desync. I only ran the game for 15 min in both tests, though.