The International Simutrans Forum

 

Author Topic: Desync issue (devel-new-2) with Linux Server/Windows client  (Read 29736 times)

0 Members and 1 Guest are viewing this topic.

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 18763
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #210 on: January 15, 2017, 01:07:14 PM »
Hmm - I think that we need to find a map that triggers the desync with the multi-theraded passenger generation more quickly, as this must have been why I missed it in December when I was trying to make it work.

Offline Felix

  • *
  • Posts: 98
  • Languages: DE, EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #211 on: January 15, 2017, 01:11:23 PM »
I still need to ensure, that this is not also related to the configuration issue. Will let the game run for a longer time, now, without the flag and on loopback.

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 18763
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #212 on: January 15, 2017, 01:13:11 PM »
Thank you - that is helpful.

Offline Felix

  • *
  • Posts: 98
  • Languages: DE, EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #213 on: January 15, 2017, 01:25:37 PM »
I got some other stuff to do anyway. I will give you an intermediate result in an hour.

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 18763
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #214 on: January 15, 2017, 01:27:49 PM »
Splendid, thank you.

Offline Felix

  • *
  • Posts: 98
  • Languages: DE, EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #215 on: January 15, 2017, 02:31:44 PM »
Takes much longer, now:

Code: [Select]
ERROR: route_t::intern_calc_route(): Problem with heuristic:  from 1323,2161,2 to (1313,2153,2) at 1323,2160, best = 201, cost = 10, heur = 260, dist = 13, turns = 237

For help with this error or to file a bug report please see the Simutrans forum:
http://forum.simutrans.com
Warning: karte_t:::do_network_world_command: sync_step=98240  server=[ss=98240 st=12280 nfc=0 rand=218288632 halt=1 line=1 cnvy=1025 ssr=2422921232,2422921232,0,0,0,0,0,0 str=2422921232,2422921232,2422921232,2422921232,2422921232,218288632,218288632,218288632,218288632,218288632,218288632,218288632,218288632,3688248822,681432194,218288632 exr=0,0,0,0,0,0,0,0  client=[ss=98240 st=12280 nfc=0 rand=2852418295 halt=1 line=1 cnvy=1025 ssr=2422921232,2422921232,0,0,0,0,0,0 str=2422921232,2422921232,2422921232,2422921232,2422921232,2852418295,2852418295,2852418295,2852418295,2852418295,2852418295,2852418295,2852418295,3688248822,681432194,2852418295 exr=0,0,0,0,0,0,0,0 
Warning: karte_t:::do_network_world_command: disconnecting due to checklist mismatch
Warning: karte_t::network_disconnect(): Lost synchronisation with server. Random flags: 0
Warning: nwc_routesearch_t::reset: all static variables are reset
Message: karte_t::reset_timer(): called, mode=$0
World finished ...
Show banner ...
Message: karte_t::reset_timer(): called, mode=$0
ERROR: route_t::intern_calc_route(): Problem with heuristic:  from 1336,2179,5 to (1337,2182,5) at 1337,2182, best = 70, cost = 70, heur = 700, dist = 0, turns = 630

For help with this error or to file a bug report please see the Simutrans forum:
http://forum.simutrans.com

This was the latest version, without the flag. Please note, that I modified the build of the pakset to get rid of the offending routemaster (dot-file).

I will now retry with a build with the flag.

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 18763
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #216 on: January 15, 2017, 02:37:29 PM »
Thank you - that is helpful. I wonder whether testing with an optimised build might reproduce the problem more quickly?

Offline Felix

  • *
  • Posts: 98
  • Languages: DE, EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #217 on: January 15, 2017, 03:02:20 PM »
With the flag, it seems to desync faster, but that might be a coincidence.

Code: [Select]
Message: haltestelle_t::reserve_position(): failed for gr=1526,2901, cnv=123
Warning: karte_t:::do_network_world_command: sync_step=8544  server=[ss=8544 st=1068 nfc=0 rand=2793569096 halt=1 line=1 cnvy=1025 ssr=2130149267,345191757,0,0,0,0,0,0 str=345191757,345191757,345191757,345191757,345191757,345191757,345191757,2793569096,2793569096,2793569096,2793569096,2793569096,2793569096,1224606908,104526904,2793569096 exr=0,0,0,0,0,0,0,0  client=[ss=8544 st=1068 nfc=0 rand=2793569096 halt=1 line=1 cnvy=1025 ssr=2130149267,345191757,0,0,0,0,0,0 str=345191757,345191757,345191757,345191757,345191757,345191757,345191757,2793569096,2793569096,2793569096,2793569096,2793569096,2793569096,1224606908,104526904,2793569096 exr=0,0,0,0,0,0,0,0 
Message: haltestelle_t::reserve_position(): failed for gr=24,1784, cnv=339
Message: haltestelle_t::reserve_position(): failed for gr=1526,2901, cnv=123
Message: haltestelle_t::reserve_position(): failed for gr=24,1784, cnv=339
Message: haltestelle_t::reserve_position(): failed for gr=1526,2901, cnv=123
Message: haltestelle_t::reserve_position(): failed for gr=24,1784, cnv=339
Message: haltestelle_t::reserve_position(): failed for gr=741,2278, cnv=170
Message: haltestelle_t::reserve_position(): failed for gr=1526,2901, cnv=123
Message: network_command_t::rdwr: read packet_id=9, client_id=0
Warning: network_check_activity(): received cmd id=9 nwc_check_t from socket[10]
Warning: NWC_CHECK: time difference to server 0
Message: network_world_command_t::execute: do_command 9 at sync_step 8577 world now at 8572
Message: haltestelle_t::reserve_position(): failed for gr=24,1784, cnv=339
Message: haltestelle_t::reserve_position(): failed for gr=1526,2901, cnv=123
Warning: karte_t:::do_network_world_command: sync_step=8576  server=[ss=8576 st=1072 nfc=0 rand=2202801700 halt=1 line=1 cnvy=1025 ssr=4174812530,4174812530,0,0,0,0,0,0 str=4174812530,4174812530,4174812530,4174812530,4174812530,4174812530,4174812530,2202801700,2202801700,2202801700,2202801700,2202801700,2202801700,1229549888,104948820,2202801700 exr=0,0,0,0,0,0,0,0  client=[ss=8576 st=1072 nfc=0 rand=3894976443 halt=1 line=1 cnvy=1025 ssr=3344621506,3344621506,0,0,0,0,0,0 str=3344621506,3344621506,3344621506,3344621506,3344621506,3344621506,3344621506,3894976443,3894976443,3894976443,3894976443,3894976443,3894976443,1229549888,104948820,3894976443 exr=0,0,0,0,0,0,0,0 
Warning: karte_t:::do_network_world_command: disconnecting due to checklist mismatch
Warning: karte_t::network_disconnect(): Lost synchronisation with server. Random flags: 0
Warning: nwc_routesearch_t::reset: all static variables are reset
Message: karte_t::reset_timer(): called, mode=$0
World finished ...
Show banner ...
Message: karte_t::reset_timer(): called, mode=$0
Message: haltestelle_t::reserve_position(): failed for gr=24,1784, cnv=339
Message: haltestelle_t::reserve_position(): failed for gr=1526,2901, cnv=123

Please note, that I removed the log messages regarding heuristic errors to make the log more readable.

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 18763
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #218 on: January 15, 2017, 03:20:21 PM »
Thank you for testing that. I am in the process of trying to find a map with which it will desync more quickly, but have encountered thread deadlocks when using that map that I am spending a long time now trying to resolve.

Offline Felix

  • *
  • Posts: 98
  • Languages: DE, EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #219 on: January 15, 2017, 03:38:16 PM »
Currently running with "-debug 5" instead of "-debug 3". Already got to 10000 steps without desync.

Good luck with the deadlocks!

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 18763
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #220 on: January 15, 2017, 03:39:49 PM »
Thank you!

Offline Felix

  • *
  • Posts: 98
  • Languages: DE, EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #221 on: January 15, 2017, 04:43:29 PM »
Running for almost an hour, now, and at 130000 steps without desync (with flag, and "-debug 5").

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 18763
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #222 on: January 15, 2017, 04:46:36 PM »
Excellent, thank you for that. The problem does appear to be in the multi-threaded passenger generation. The trouble is finding out where. I have just managed to fix the multi-threading deadlock, however, which might help in being able to use saved games that might test this more thoroughly.

Offline Felix

  • *
  • Posts: 98
  • Languages: DE, EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #223 on: January 15, 2017, 05:21:23 PM »
Ok, I will switch to a build without the flag, but with the setting "-debug 5", now. Lets see what happens.

Offline Felix

  • *
  • Posts: 98
  • Languages: DE, EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #224 on: January 15, 2017, 05:50:42 PM »
The desync seems more related to not using the command line argument "-debug 5" than the compile time flag. The version without the flag is already running for 20 min (36000 steps) without issues, too.

Offline Felix

  • *
  • Posts: 98
  • Languages: DE, EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #225 on: January 15, 2017, 06:31:33 PM »
With the above settings it takes really long to desync (~1hour):

client (simu.log):
Code: [Select]
Warning: NWC_CHECK: time difference to server 0
Message: network_world_command_t::execute: do_command 9 at sync_step 104001 world now at 103996
Message: haltestelle_t::reserve_position(): failed for gr=3,1913, cnv=194
Warning: karte_t:::do_network_world_command: sync_step=104000  server=[ss=104000 st=13000 nfc=0 rand=2001339770 halt=1 line=1 cnvy=1025 ssr=2001339770,2001339770,0,0,0,0,0,0 str=2001339770,2001339770,2001339770,2001339770,2001339770,2001339770,2001339770,2001339770,2001339770,2001339770,2001339770,2001339770,2001339770,2307280173,1296641621,2001339770 exr=0,0,0,0,0,0,0,0  client=[ss=104000 st=13000 nfc=0 rand=2001339770 halt=1 line=1 cnvy=1025 ssr=2001339770,2001339770,0,0,0,0,0,0 str=2001339770,2001339770,2001339770,2001339770,2001339770,2001339770,2001339770,2001339770,2001339770,2001339770,2001339770,2001339770,2001339770,2307280173,1296641621,2001339770 exr=0,0,0,0,0,0,0,0 
Message: haltestelle_t::reserve_position(): failed for gr=3,1913, cnv=194
Message: haltestelle_t::reserve_position(): failed for gr=741,2278, cnv=170
Message: vehicle_t::remove_stale_cargo(): called
Message: vehicle_t::remove_stale_cargo(): called
Message: vehicle_t::remove_stale_cargo(): called
Message: vehicle_t::remove_stale_cargo(): called
Message: vehicle_t::remove_stale_cargo(): called
Message: vehicle_t::remove_stale_cargo(): called
Message: route_t::calc_route(): No route from 1213,2305 to 1249,2324 found
ERROR: rail_vehicle_t::activate_choose_signal(): could not reserved route after find_route!
For help with this error or to file a bug report please see the Simutrans forum:
http://forum.simutrans.com
Message: haltestelle_t::reserve_position(): failed for gr=3,1913, cnv=194
Message: haltestelle_t::reserve_position(): failed for gr=1526,2901, cnv=123
Message: haltestelle_t::reserve_position(): failed for gr=3,1913, cnv=194
Message: network_command_t::rdwr: read packet_id=9, client_id=0
Warning: network_check_activity(): received cmd id=9 nwc_check_t from socket[10]
Warning: NWC_CHECK: time difference to server 0
Message: network_world_command_t::execute: do_command 9 at sync_step 104033 world now at 104028
Message: haltestelle_t::reserve_position(): failed for gr=3,1913, cnv=194
Warning: karte_t:::do_network_world_command: sync_step=104032  server=[ss=104032 st=13004 nfc=0 rand=1664088824 halt=1 line=1 cnvy=1025 ssr=1664088824,1664088824,0,0,0,0,0,0 str=1664088824,1664088824,1664088824,1664088824,1664088824,1664088824,1664088824,1664088824,1664088824,1664088824,1664088824,1664088824,1664088824,2312225405,1297063697,1664088824 exr=0,0,0,0,0,0,0,0  client=[ss=104032 st=13004 nfc=0 rand=1664088824 halt=1 line=1 cnvy=1025 ssr=1664088824,1664088824,0,0,0,0,0,0 str=1664088824,1664088824,1664088824,1664088824,1664088824,1664088824,1664088824,1664088824,1664088824,1664088824,1664088824,1664088824,1664088824,2312225405,1297063697,1664088824 exr=0,0,0,0,0,0,0,0 
Message: haltestelle_t::reserve_position(): failed for gr=3,1913, cnv=194
Message: haltestelle_t::reserve_position(): failed for gr=3,1913, cnv=194
Message: haltestelle_t::reserve_position(): failed for gr=3,1913, cnv=194
Message: network_command_t::rdwr: read packet_id=9, client_id=0
Warning: network_check_activity(): received cmd id=9 nwc_check_t from socket[10]
Warning: NWC_CHECK: time difference to server 33
Message: network_world_command_t::execute: do_command 9 at sync_step 104065 world now at 104059
Message: haltestelle_t::reserve_position(): failed for gr=3,1913, cnv=194
Warning: karte_t:::do_network_world_command: sync_step=104064  server=[ss=104064 st=13008 nfc=0 rand=3200587526 halt=1 line=1 cnvy=1025 ssr=76729760,3560747292,0,0,0,0,0,0 str=3560747292,3560747292,3560747292,3560747292,3560747292,3560747292,3560747292,3560747292,3560747292,3560747292,3560747292,3200587526,3200587526,2317170637,1297485773,3560747292 exr=0,0,0,0,0,0,0,0  client=[ss=104064 st=13008 nfc=0 rand=1355265147 halt=1 line=1 cnvy=1025 ssr=600745941,862477084,0,0,0,0,0,0 str=862477084,862477084,862477084,862477084,862477084,862477084,862477084,862477084,862477084,862477084,862477084,1355265147,1355265147,2317170637,1297485773,862477084 exr=0,0,0,0,0,0,0,0 
Warning: karte_t:::do_network_world_command: disconnecting due to checklist mismatch
Warning: karte_t::network_disconnect(): Lost synchronisation with server. Random flags: 0
Warning: nwc_routesearch_t::reset: all static variables are reset
Message: karte_t::reset_timer(): called, mode=$0
World finished ...

Server (console):
Code: [Select]
or help with this error or to file a bug report please see the Simutrans forum:
http://forum.simutrans.com
ERROR: rail_vehicle_t::activate_choose_signal():        could not reserved route after find_route!
For help with this error or to file a bug report please see the Simutrans forum:
http://forum.simutrans.com
ERROR: rail_vehicle_t::activate_choose_signal():        could not reserved route after find_route!
For help with this error or to file a bug report please see the Simutrans forum:
http://forum.simutrans.com
Message: network_command_t::rdwr:       write packet_id=9, client_id=0
Message: packet_t::send:        sent 169 bytes to socket[5]; id=9, size=169
ERROR: rail_vehicle_t::activate_choose_signal():        could not reserved route after find_route!
For help with this error or to file a bug report please see the Simutrans forum:
http://forum.simutrans.com
ERROR: rail_vehicle_t::activate_choose_signal():        could not reserved route after find_route!
For help with this error or to file a bug report please see the Simutrans forum:
http://forum.simutrans.com
Message: network_command_t::rdwr:       write packet_id=9, client_id=0
Message: packet_t::send:        sent 169 bytes to socket[5]; id=9, size=169
ERROR: rail_vehicle_t::activate_choose_signal():        could not reserved route after find_route!
For help with this error or to file a bug report please see the Simutrans forum:
http://forum.simutrans.com
Message: network_command_t::rdwr:       write packet_id=9, client_id=0
Message: packet_t::send:        sent 169 bytes to socket[5]; id=9, size=169
Message: network_command_t::rdwr:       write packet_id=9, client_id=0
Message: packet_t::send:        sent 169 bytes to socket[5]; id=9, size=169
Warning: network_receive_data:  connection [5] already closed
Message: socket_list_t::remove_client:  remove client socket[5]
Warning: __ChatLog__:   disconnect,2,0.0.0.0,Client#2
Message: network_command_t::rdwr:       write packet_id=8, client_id=0
Warning: nwc_tool_t::rdwr:      rdwr id=8 client=0 plnr=255 pos=koord3d invalid tool_id=8224 defpar=32768,Client#2 has left. init=1 flags=0
Message: network_command_t::rdwr:       write packet_id=8, client_id=0
Warning: nwc_tool_t::rdwr:      rdwr id=8 client=0 plnr=255 pos=koord3d invalid tool_id=8224 defpar=32768,Now 0 clients connected. init=1 flags=0
Warning: nwc_tool_t::clone:     send sync_steps=0  tool=8224 init
Message: network_command_t::rdwr:       write packet_id=8, client_id=0
Warning: nwc_tool_t::rdwr:      rdwr id=8 client=0 plnr=255 pos=koord3d invalid tool_id=8224 defpar=32768,Client#2 has left. init=1 flags=0
Warning: nwc_tool_t::clone:     send sync_steps=13009  tool=8224 init
Message: network_command_t::rdwr:       write packet_id=8, client_id=0
Warning: nwc_tool_t::rdwr:      rdwr id=8 client=0 plnr=255 pos=koord3d invalid tool_id=8224 defpar=32768,Now 0 clients connected. init=1 flags=0
ERROR: rail_vehicle_t::activate_choose_signal():        could not reserved route after find_route!
For help with this error or to file a bug report please see the Simutrans forum:
http://forum.simutrans.com
Message: network_command_t::rdwr:       write packet_id=9, client_id=0
Message: network_command_t::rdwr:       write packet_id=9, client_id=0
Message: network_command_t::rdwr:       write packet_id=9, client_id=0
ERROR: rail_vehicle_t::activate_choose_signal():        could not reserved route after find_route!
For help with this error or to file a bug report please see the Simutrans forum:
http://forum.simutrans.com
Message: network_command_t::rdwr:       write packet_id=9, client_id=0

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 18763
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #226 on: January 15, 2017, 07:29:25 PM »
It looks as though we will need to find a map that reproduces this more quickly. Unfortunately, it seems that I have not fixed the deadlock after all, and I am still working on that.

Edit: I think that I have now fixed the deadlock, and I am having trouble reproducing the desync on even a very dense and well developed map. Can you try again with the latest commit to see whether this makes any difference?

If there is a desync associated with the passenger generation, it must be a very rare one given the difficulties that we are having reproducing it.

Edit 2: The desync seems to occur a little faster with this saved game, but it still takes a long time.

Edit 3: Going back to the original topic, recompiling a fresh game on the Bridgewater-Brunel server and downloading the pakset from that server to make sure that the paksets were identical, connecting to that server results in a desync within <10 seconds of unpausing after connecting. It seems rather unlikely that this could be the same issue as produces the desync on even very complex maps only after about 20 minutes. There seems to be some additional problem, but it is not clear what it is. Felix - are you able to connect to the Bridgewater-Brunel server without desyncing more or less immediately?
« Last Edit: January 15, 2017, 10:30:19 PM by jamespetts »

Offline Felix

  • *
  • Posts: 98
  • Languages: DE, EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #227 on: January 15, 2017, 11:53:37 PM »
Yes, I also get the immediate desync. Nevertheless, this might be related to a version mismatch. I used the latest revision from git. I will to more tests in the evening.

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 18763
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #228 on: January 15, 2017, 11:57:43 PM »
Can you check whether the server listing registers your pakset as matching or non-matching using the in-game network browser? For reasons already explained, I have not been able to get consistent version numbers for the executable on the server yet, but it should be the latest version from Github currently.

Edit: I have pushed a small fix for the multi-threaded passenger generation. I do not know whether this will affect synchronisation, but would you be able to re-test?

Edit 2: Having tested again, it still desyncs after a while with multi-threaded passenger generation enabled when using the loopback interface. Are you able to log the random calls again and see whether the problem is consistently that there is an extra call to generate_passengers_and_mail() on the client compared with the server?
« Last Edit: January 16, 2017, 12:16:10 PM by jamespetts »

Offline Felix

  • *
  • Posts: 98
  • Languages: DE, EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #229 on: January 16, 2017, 07:41:08 PM »
I would not trust the information from that analysis too much. As said, the data was very difficult to interpret.

With multithreaded passenger generation enabled, the client still desync immediately after the connect. I will now try without.

Compiled without multithreaded passenger generation enabled, I get the same desync.
« Last Edit: January 16, 2017, 07:54:16 PM by Felix »

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 18763
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #230 on: January 16, 2017, 09:05:42 PM »
So it seems that there are two different causes of desyncs here. How did you analyse the data such as to give rise to the analysis relating to passenger generation? Since tests demonstrate that disabling multi-threaded passenger generation does in fact result in no desyncs when using the loopback interface, it does suggest that this analysis has some merit.

Offline Felix

  • *
  • Posts: 98
  • Languages: DE, EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #231 on: January 16, 2017, 09:14:01 PM »
I just printed a log message for each call to the simrand function including the caller and later compared the log from the server and the client. Sadly, multithreading makes it difficult to actually map calls. To be useful, one would probably need to improve the approach based on a better understanding of how randomness is handled by the game. I think, the rand array used by the network check is doing something like this, but I do not yet fully understand it.

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 18763
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #232 on: January 16, 2017, 10:03:10 PM »
Yes - and that array was written by the Standard developers; Standard does not use any multi-threading in the actual simulation code (only in the loading/saving and in the graphics), so it will not have been designed to be used for multi-threading purposes in any event.

As to the random number generator, of all of the parts of the simulation that are multi-threaded, the only part where threads other than the main thread access the random number generator is the passenger generation section. In order to do this in a way that is deterministic between client and server, each thread generates its random number seed from its thread number (each thread being assigned a thread number on creation).  The random number generator seed is then stored as thread local variable. The main thread's random number generator seed is saved with the saved game and transmitted over the network so as to remain in sync between the server and client. Every call to simrand() (but not to sim_async_rand(), which is used for things such as graphical effects that need not be in sync) both produces a random number and changes the seed. I am not sure exactly what the logging system logs - I suspect that it may log the state of the seed over time.

Every few steps, the server and client compare their random number seeds. If they do not match, the client is disconnected from the game: it is this that we usually term a "desync" (i.e., the client and server have got out of synchronisation with one another resulting in a forced disconnexion). This can happen in a number of ways, all hard to trace because we can only detect the remote consequence of the problem rather than the problem itself. It might be caused by failing to transmit all the necessary data; it might be caused by non-deterministic multi-threading code; it might be caused by undefined behaviour somewhere in the code (e.g. taking the value of an uninitialised variable); it might be caused by something indeterministic between platforms (the entire physics engine needed re-writing once to use a special float class made using only int data after it was found that native float types are not deterministic accross platforms in order to mitigate this problem); or it might be caused by reference to some variable that will differ between server and client, such as using pointers as random number seeds or as keys in a hashtable (as in the built-in ptrhashtable class, which cannot be used where the order of iteration is critical to be deterministic between client and server) or using a machine's local time (e.g. dr_get_time()) rather than the game's internal time (get_zeit_ms()).

I suspect that the passenger generation desync is caused by some non-deterministic artefact in the multi-threading code, although quite what it is I have yet no idea. If it consistently manifests itself by causing the generate_passengers_and_mail() to be called a different number of times on the server and the client, then I should strongly suspect that the problem is somewhere in step_passengers_and_mail_threaded (which is the code that determines the number of times that each thread calls generate_passengers_and_mail(), each instance being called in a separate thread), but, having spent some time looking at this yesterday, I cannot immediately see what the problem is. I did fix an incidental bug that probably had only a very small effect (the mechanism being self-correcting as to the number of passengers generated from step to step), which would cause too many passengers to be generated on an individual step, but this has not affected the desync issue.

The desync caused when connecting a Windows client to a Linux server I initially thought was caused by a problem of the platform type, but if you on a Linux client are getting the same problem connecting to the Bridgewater-Brunel server, then I am very confused as to what the problem might be. Client and server do not need to have identical simuconf.tab files (etc.) as these configuration settings are saved with the saved game and transmitted from server to client when the client first joins (precisely in order to avoid that sort of problem), and although a substantive difference in pakset (e.g. a vehicle in one pakset being faster or more powerful than the vehicle with the same name in the other) could easily cause a divergence between server and client, a mere difference in version number in and of itself (caused by the issues described on the other thread to which I linked earlier) or difference in pakset name cannot, so far as I can tell, do any such thing.

Edit: There is a method (enabled by defining DEBUG_SIMRAND_CALLS) to log calls to simrand() and print the most recent of them them in the in-game message windows of both server and client as soon as a desync happens, but this appears not to be working properly at present. I will see about trying to restore it to working order to help the process.

Edit 2: I have now fixed DEBUG_SIMRAND_CALLS - it turns out that an important line had been commented out.

Edit 3: Unfortunately, this does not seem to be reliable enough to be used in a complex game as it causes crashes.
« Last Edit: January 16, 2017, 11:07:02 PM by jamespetts »

Offline Felix

  • *
  • Posts: 98
  • Languages: DE, EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #233 on: January 17, 2017, 08:59:16 PM »
Thank you very much for the extensive explanation. I will definitely continue to investigate the issue. Sadly, this will take quite a while, as my spare time is currently rather limited.

On the weekend, I already tried to restore the debug functionality for the random generation, but that code seems to be affected by significant bit rot. Next weekend, I will probably try again.

The problem is likely not platform related. But it is definitely affected by the specific save game. An interesting experiment might be to run bridgewater-brunel with a fresh save game. Might at least give some new insight.

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 18763
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #234 on: January 17, 2017, 09:08:29 PM »
Thank you for that suggestion, and for all your work so far: it is most appreciated. I am currently in the process of trying to restart the Bridgewater-Brunel server with the 1971 version of Rollermaterial's map to see whether this makes any difference.

Edit: Having done this, I still get a desync - I should be interested in whether you still get a desync with this server, and also with server.exp.simutrans.com.

Incidentally, would it interfere with the investigations that you plan to carry out in this regard if I were to start integrating some of the translation patches from Standard?
« Last Edit: January 17, 2017, 09:21:02 PM by jamespetts »

Offline Felix

  • *
  • Posts: 98
  • Languages: DE, EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #235 on: January 17, 2017, 09:48:19 PM »
Don't worry to integrate anything! First, I will need to understand the code, anyway, and merging commits should not be to much of a problem.

I am currently building the latest version and will then try to connect.

Edit 1: Ok, sadly, with the same result. But, at least, now, we know that it is not the map itself.

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 18763
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #236 on: January 17, 2017, 09:52:55 PM »
Thank you - that is helpful. The latest changes from Standard change the names of quite a few files/variables to make the code more accessible to those who do not speak German, although I notice that you speak German in any case.

Offline Felix

  • *
  • Posts: 98
  • Languages: DE, EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #237 on: January 17, 2017, 09:54:36 PM »
Yes, the German in the code will not negatively affect me ;-) Still, it feels strange to have variable and function names in German.

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 18763
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #238 on: January 17, 2017, 10:06:30 PM »
Having undertaken some testing in step_passengers_and_mail_threaded using the debugger to try to find the cause of the infrequent desync reproducible on the loopback interface, I cannot find anything resembling an anomaly so far.

Offline Felix

  • *
  • Posts: 98
  • Languages: DE, EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #239 on: January 17, 2017, 10:14:27 PM »
The infrequent desyncs will be significantly more difficult to track down.

The immediate desync seems to happen at the first check. Something is already wrong right at the start. Another minor observation is, that the reported time difference between server and client continuously increases, while on my local setup it usually also starts with a significant difference, but quickly decreases. Not sure, if this has any significance at all.

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 18763
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #240 on: January 17, 2017, 10:19:20 PM »
The infrequent desyncs will be significantly more difficult to track down.

The immediate desync seems to happen at the first check. Something is already wrong right at the start. Another minor observation is, that the reported time difference between server and client continuously increases, while on my local setup it usually also starts with a significant difference, but quickly decreases. Not sure, if this has any significance at all.

The first check? That is interesting. That suggests some problem with data transmission (i.e. the load/save routines) rather than with the running simulation code. Is the instant desync reporting divergence in the random numbers?

As to the time difference, the code for this is unchanged from Standard; so far as I am aware, the idea is that the client and server try to run in time with one another. The client getting behind the server will cause input lag but will not cause desyncs, while the client getting ahead of the server will cause desyncs (which is why the system will always try to keep the client slightly behind the server according to the simuconf.tab setting server_frames_behind). The commonest cause of the client getting behind the server by a significant amount is the client running on a slower machine than the server. May I ask what your system specifications are?

Offline Felix

  • *
  • Posts: 98
  • Languages: DE, EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #241 on: January 17, 2017, 10:25:46 PM »
The disconnect happens at the first check appearing in the log. It is the usual difference in random number seeds.

I am running the game on an Intel Core i7-4800MQ at 2.70GHz with 32 GB RAM. The operating system is a pretty up to date 64-Bit Gentoo Linux. The machine should not be part of the problem. This is not anymore the one from years ago, when I hat issues to keep up with the multiplayer game ;-)

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 18763
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #242 on: January 17, 2017, 10:28:28 PM »
Hmm - that looks like a more powerful machine than mine, an Intel i7 950 (overclocked to 4.2Ghz) with 12Gb of RAM, so that is probably not the issue. If you are falling behind the server, it is not clear to me why in those circumstances - unless perhaps it is because you are running a non-optimised build and the server is (as it indeed is) running an optimised build? The difference between the two in performance is very large.

Offline Felix

  • *
  • Posts: 98
  • Languages: DE, EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #243 on: January 17, 2017, 10:30:34 PM »
That is something I can quickly test out. But, I would assume, the initial lag is just an artifact cause by some underlying cause.

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 18763
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #244 on: January 17, 2017, 10:32:03 PM »
That is something I can quickly test out. But, I would assume, the initial lag is just an artifact cause by some underlying cause.

That is also a possibility - I should be interested in the results of your tests. In theory, Simutrans is supposed to be able to be compiled with -O3.