The International Simutrans Forum

 

Topic: "Lost synchronisation with server" report thread

Offline freddyhayward

  • Devotee
  • *
  • Posts: 437
  • Languages: EN
"Lost synchronisation with server" report thread
« on: September 24, 2020, 08:48:53 AM »
The purpose of this thread is to allow users to report any losses of synchronisation without needing to locate or create an issue-specific thread. I would be grateful if James could pin this (or a similar thread).

How to generate checklist mismatch messages
1) Run simutrans-extended with the options "-debug 2 -log simu.log" to generate checklist mismatch messages when they occur. (This works at least on Linux; can someone let me know about Windows?)
1a) If you are running a server, run it with server_frames_between_checks = 0 in simuconf.tab to ensure that mismatches are detected as soon as they occur.
2) Your client will pause upon a loss of synchronisation. If you can, please keep it paused and open! There might be further useful information hidden in the game state.
3) After you lose synchronisation, search the log file (simu.log) for "disconnecting due to checklist mismatch" and post the message in this thread. The message should look something like this image (credit to Phystam):
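
The log search in step 3 can be sketched in Python as follows. This is only a minimal illustration, assuming the log was written to simu.log in the current directory as in step 1; the helper name find_mismatch_lines and the number of context lines are my own choices, not part of Simutrans-Extended.

```python
from pathlib import Path

# The text to look for, as quoted in step 3 of this post.
MARKER = "disconnecting due to checklist mismatch"


def find_mismatch_lines(log_text, context=2):
    """Return each matching log line together with `context` lines around it.

    Hypothetical helper for illustration; the match is case-insensitive.
    """
    lines = log_text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if MARKER in line.lower():
            start = max(0, i - context)
            hits.append("\n".join(lines[start:i + context + 1]))
    return hits


if __name__ == "__main__":
    log_path = Path("simu.log")  # assumed file name, taken from "-log simu.log"
    if log_path.exists():
        for block in find_mismatch_lines(log_path.read_text(errors="replace")):
            print(block)
            print("-" * 40)
```

Each printed block can then be pasted into a reply in this thread.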


How to investigate convoy mismatches
1) When individual convoys go out of sync, this will manifest in mismatching debug sums, shown here:


2) Open a calculator and divide the difference of the right-hand sum by the difference of the left-hand sum to get the ID of the convoy. If the numbers don't divide evenly, there might be multiple convoys involved, or there might be a problem with the logging system.


3) On your client, search for the ID found in the previous step: open the vehicle list (Shift-V), click the checkbox to enable the filter, click "Settings" to open the filter window, click the "Filter names:" checkbox and type in the ID. You may need to do this for multiple companies before finding it, so start with the largest companies to speed up the process.
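
The division step above can be sketched in Python as follows. This is a rough illustration only: it assumes that the mismatch message gives one (left-hand, right-hand) pair of debug sums for the server and one for the client, and that "difference" means client value minus server value. The function name and the numbers in the usage note are made up.

```python
def convoy_id_from_sums(server, client):
    """Estimate the convoy ID from two (left_sum, right_sum) pairs.

    Returns None when the left-hand sums are equal or when the numbers
    do not divide evenly (several convoys may be involved, or the
    logging may be at fault).
    """
    d_left = client[0] - server[0]
    d_right = client[1] - server[1]
    if d_left == 0:
        return None
    quotient, remainder = divmod(d_right, d_left)
    return quotient if remainder == 0 else None
```

For example, with hypothetical sums, convoy_id_from_sums((100, 5000), (103, 8702)) divides 3702 by 3 and returns 1234, which is the ID to type into the vehicle list filter.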


Additional notes
* Please let me know if I should add any more information or clarification.
* I'm keeping a tally of the types of mismatches here: https://docs.google.com/spreadsheets/d/1Jwa-5G6aXfkrCrPYLQvDbYt7LqYMgYapW8hyOF9IB3s/edit?usp=sharing. There's no detailed information, but it gives a general idea as to where they most frequently occur.
« Last Edit: September 28, 2020, 06:49:09 AM by freddyhayward »

Offline Phystam

  • Devotee
  • *
  • Posts: 495
  • Pak256.Ex developer
    • Pak256 wiki page
  • Languages: JP, EN, EO
Re: "Lost synchronisation with server" report thread
« Reply #1 on: September 25, 2020, 03:29:48 PM »
sum[0] and sum[2] have differences.

Offline Phystam

Re: "Lost synchronisation with server" report thread
« Reply #2 on: September 25, 2020, 05:50:39 PM »
And double post... This is from a participant on my server. He reported that there were "no route" messages because of low axle load just after the disconnection.

Offline freddyhayward

Re: "Lost synchronisation with server" report thread
« Reply #3 on: September 28, 2020, 06:17:45 AM »
The first convoy identified under the new debug sum system: a train departing from a station after a signal displays clear. It's unclear what the actual problem is, but hopefully a noticeable pattern will emerge in time.

Offline Matthew

  • *
  • Posts: 419
    • Japan Railway Journal
  • Languages: EN, some ZH, DE & SQ
Re: "Lost synchronisation with server" report thread
« Reply #4 on: September 28, 2020, 11:49:02 AM »
Freddy, thank you for reworking the debug sums system to get more useful results. I take it that this can be combined with the new pause-on-desync feature to identify the convoy that caused that desync (where that is the cause). Could you please post a simple explanation of how to join the dots in the log file?

Offline freddyhayward

Re: "Lost synchronisation with server" report thread
« Reply #5 on: September 28, 2020, 11:51:06 AM »
Quote from: Matthew on September 28, 2020, 11:49:02 AM
Freddy, thank you for reworking the debug sums system to get more useful results. I take it that this can be combined with the new pause-on-desync feature to identify the convoy that caused that desync (where that is the cause). Could you please post a simple explanation of how to join the dots in the log file?

I have edited this into the original post. Please let me know whether anything there needs clarification.

Offline Matthew

Re: "Lost synchronisation with server" report thread
« Reply #6 on: September 28, 2020, 01:33:32 PM »
Quote from: freddyhayward on September 28, 2020, 11:51:06 AM
I have edited this into the original post. Please let me know whether anything there needs clarification.

Thank you! That looks crystal clear, but the proof of the pudding is in the eating, which will happen when I desync.

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 20274
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: "Lost synchronisation with server" report thread
« Reply #7 on: October 02, 2020, 11:18:22 PM »
I should note that I have just incorporated some fixes by Ceeac to possible undefined behaviour. These fixes should appear in to-morrow's nightly build. It is possible that these may affect loss of synchronisation errors.

Offline Phystam

Re: "Lost synchronisation with server" report thread
« Reply #8 on: October 03, 2020, 05:15:24 PM »
How to output the log file (Windows 10):
1) Make a shortcut for Simutrans-Extended.
2) Open the shortcut's properties and edit this section as shown in the following picture:

Offline jamespetts

Re: "Lost synchronisation with server" report thread
« Reply #9 on: October 18, 2020, 12:51:17 PM »
Thank you very much to Freddy for compiling the spreadsheet of loss of synchronisation causes. As will be appreciated, the less frequent the loss of synchronisation, the more difficult it is to test, as there is no reliable way to know whether any given change has fixed a problem that occurs only infrequently without waiting an extremely long time. Also, the less frequent the loss of synchronisation, the less important it is to remedy the problem.

Nonetheless, it is worthwhile to try to remedy these if possible, especially the most common types. The most common types so far seem to be convoy-movement-based losses of synchronisation (debug sums) and halt-based losses of synchronisation (rands[23]).

As to the latter, the code in the haltestelle_t::step() function most likely to show a divergence of random numbers is check_transferring_cargoes, which, when dealing with passengers, invokes the RNG by generating local pedestrians whenever passengers are released from their transferring state.

However, this means that we have very little idea of where the problem is originating, as its expression in this place could mean an origin in a vast number of different places in the code.

We already have a system to check part of this, being debug sums 6 and 7 for checking the number of transferring cargoes before and after the passenger and mail generation have run.

May I suggest that future loss of synchronisation logging should explicitly check whether there is any divergence in debug sums 6 and 7 when recording a loss of synchronisation at rands[23]?
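
The suggested check could be sketched in Python along the following lines. This is only a rough illustration of the idea and not how the game stores or logs its checklist: the dictionary layout, the key names "rands" and "debug_sums", and both function names are assumptions made for the example.

```python
def diverging_indices(server_values, client_values):
    """Return the indices at which the server and client values differ."""
    return [
        i
        for i, (s, c) in enumerate(zip(server_values, client_values))
        if s != c
    ]


def describe_halt_desync(server, client):
    """Report on debug sums 6 and 7 when rands[23] has diverged.

    `server` and `client` are hypothetical checklists of the form
    {"rands": [...], "debug_sums": [...]}. Returns None when rands[23]
    is the same on both sides.
    """
    rand_diffs = diverging_indices(server["rands"], client["rands"])
    if 23 not in rand_diffs:
        return None
    sum_diffs = diverging_indices(server["debug_sums"], client["debug_sums"])
    return {
        "sum6_diverged": 6 in sum_diffs,
        "sum7_diverged": 7 in sum_diffs,
    }
```

A report made this way would say, for each rands[23] mismatch, whether the transferring cargo counts had already diverged before or after passenger and mail generation.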