The International Simutrans Forum

 

Author Topic: Online checklist mismatches (was: Clients keep disconnecting)  (Read 7856 times)

0 Members and 1 Guest are viewing this topic.

Offline DrSuperGood

  • Dev Team
  • Devotee
  • *
  • Posts: 2720
  • Languages: EN
Online checklist mismatches (was: Clients keep disconnecting)
« on: January 08, 2016, 03:50:48 AM »
Quote
It can bounce around a bit, but should mostly track 0. When -'ve, if a command comes in from a player, you're at risk of a desync.
A desync because of this will show up as :
Surely your client should wait if it is running ahead of the server? I think that is how RTS games do it. They must use some sort of checkpoint system where the client refuses to execute passed a certain time frame when the client is no longer sure the server is still ahead.

To better describe it the server receives a user command, then assigns it a certain time slot. It then sends out a packet to all clients saying that between a start and end time these commands were issued at ceretain times and in a certain order. The client receives the packet and knows it can safely advance until the time specified by the server, after which it has to pause until another such command covering more time is issued. If there is a bad connection a client will get periodic stutter as the game pauses often to wait for packets and then has to advance faster than normal to catch up. Games like Warcraft III and StarCraft II go a step further by potentially pausing the game state to "wait" for a player if that player falls too far behind, that functionality is not appropriate for a game like Simutrans and instead the client running as fast as possible to catch up is probably more friendly.

You trade some input latency (a result of the checkpoints) for an absolutely stable server client system.


Edit: (TurfIt) split topic from http://forum.simutrans.com/index.php?topic=15109.0
« Last Edit: January 09, 2016, 11:12:56 PM by TurfIt »

Offline Dwachs

  • DevTeam, Coder/patcher
  • Administrator
  • *
  • Posts: 4593
  • Languages: EN, DE, AT
Re: Online checklist mismatches
« Reply #1 on: January 08, 2016, 10:54:00 AM »
I am not aware of any bugs you mention in these posts:
There are issues with the netcode. Specifically "send to depot" is notorious for causing out of sync problems. That said many of the issues should have been fixed.
Is the 'send to depot' bug fixed with the latest commits addressing uninitialized variables in vehicle routefinding?
Note: there are some bugs in 120.1.1 that will cause this. Some are fixed with 120.1.2, ....
Are there more known bugs that are not fixed in 120.1.2? I am more than happy to hunt them down.

Offline DrSuperGood

  • Dev Team
  • Devotee
  • *
  • Posts: 2720
  • Languages: EN
Re: Online checklist mismatches
« Reply #2 on: January 08, 2016, 04:15:24 PM »
Quote
Is the 'send to depot' bug fixed with the latest commits addressing uninitialized variables in vehicle routefinding?
I have no idea, I have not been following that bug. I only really knew about it from personal experience and some chat messages.

Offline Dwachs

  • DevTeam, Coder/patcher
  • Administrator
  • *
  • Posts: 4593
  • Languages: EN, DE, AT
Re: Online checklist mismatches
« Reply #3 on: January 09, 2016, 05:11:17 PM »
There is indeed a desyncing bug in the go-to-depot code.

Offline TurfIt

  • Dev Team, Coder/patcher
  • Devotee
  • *
  • Posts: 1326
Re: Online checklist mismatches
« Reply #4 on: January 09, 2016, 11:48:52 PM »
My observations with the go-to-depot are if a train is waiting at a signal to enter a tile, and the train occupying that tile is then told to goto depot, and it 'jumps' when doing so, then frequently on the client, the waiting train proceeds through the signal, the two trains overlap, and sometime later a checklist mismatch desync is triggered.  If I manually reconnect to the server immediately to see its state, I see that waiting train is actually still waiting for the depot bound one to clear. I've not been able to duplicate under controlled circumstances with a graphical server and client both running...

In general, it appears this, and several other issues all have block reservations in common as the desync culprit.

Phantom signal - using the track way removal tool to disconnect a piece of track from an adjacent one containing a signal sometimes results in the signal vanishing from view. It also does not show up as an object when clicking on the tile. Trains proceeding over the spot where the signal was enter a state where they reserve tile by tile rather than to the next signal. Often a save/reload clears this phantom signal. Bulldozing everything on the tile, and rebuilding the track results in the same behaviour returning. Building a signal on the tile is allowed, but it never clears (goes permissive). Deleting the signal does not fix things.

Trains running red signals - once a train is in a tile by tile reservation mode, it seems to ignore all further signals server side - sometimes client side.  You can watch it follow a train - it's always one reserved block back in the same signal section. Once it reaches the next station on its schedule, it returns to normal behaviour obeying signals. If the platform at the station is occupied, and it's a terminal station (same way in/out), then it will block the exit and deadlock the system.
Usually you don't see this deadlock client side, instead the train is correctly waiting at the signal. Reconnect to the server, and suddenly there's a train sitting where it's impossible to be except by running a red.

If a train has a path reserved from a signal to the next, and you delete the next signal (bulldoze) before it goes permissive, the train will enter the tile-by-tile mode. Client side after a couple blocks of this tile-by-tile, it reverts to signal-signal reservations. Server side it seems to frequently stay stuck in tile-by-tile, ultimately leading to the deadlock above.

Anything that results in a train jumping and redoing it's reservations is also likely to trigger a desync. Editing schedules, especially editing the line schedule is quite dangerous. If you add a new waypoint, quite often trains change their destination to the waypoint even when they're already beyond it in the schedule. The resultant jump when they turnaround results in behavior much like when they jump due to the goto depot command. Related, it would be nice if vehicles were smarter about line schedule changes so they didn't turn back... it's a pain to have to visit every single convoi in a line, and open its individual schedule to correct the current destination after you add/remove a waypoint... That would minimize jumping too...



Offline Dwachs

  • DevTeam, Coder/patcher
  • Administrator
  • *
  • Posts: 4593
  • Languages: EN, DE, AT
Re: Online checklist mismatches
« Reply #5 on: January 10, 2016, 10:46:10 AM »
The bug with the go-to-depot button was: immediately upon pressing the button, a route-search was triggered. This route-search immediately cleared reserved tiles and reserved stop positions. On all other clients the effects were only visible after the convoi in question got its new schedule. I tried to fix this with r7724-7726. The attempted fix was to trigger the route search on all clients synchronously. Changing the route-search to not change state of the convoi seems to be impossible as the route search for airplanes changes state of the convoi (flying vs on ground).


Will have to check the strange behavior you mention in connection with signals.

Offline Dwachs

  • DevTeam, Coder/patcher
  • Administrator
  • *
  • Posts: 4593
  • Languages: EN, DE, AT
Re: Online checklist mismatches
« Reply #6 on: January 10, 2016, 03:04:58 PM »
I could not reproduce phantom signals: If the signal object is gone, it is gone. The way is clean and free. I cannot imagine that this produces strange behavior in network mode.

I could reproduce the step-by-step reserving of trains if signals are removed. However, this behavior reverts back to signal-to-signal reservation at the tile that follows the tile, where the signal was removed.

Offline Ters

  • Coder/patcher
  • Devotee
  • *
  • Posts: 5531
  • Languages: EN, NO
Re: Online checklist mismatches
« Reply #7 on: January 10, 2016, 03:44:08 PM »
I have observed trains doing tile-by-tile reservations, although I don't remember the circumstances, since it has been a while now. It lasted quite a while, I think, probably until the next waypoint, or perhaps rather the next time route is calculated. Changing the next destination back-and-forth in the schedule would cure it.

Offline Dwachs

  • DevTeam, Coder/patcher
  • Administrator
  • *
  • Posts: 4593
  • Languages: EN, DE, AT
Re: Online checklist mismatches (was: Clients keep disconnecting)
« Reply #8 on: January 10, 2016, 05:35:18 PM »
I saw the step-by-step reservation but only triggered by removing signals. When the train arrives at the tile after the removed signal, it proceeds with signal-signal reservations. Then it never travels through red signals.

Maybe I miss something, but I would love to have a procedure to get step-by-step reservation over more than two tiles.

Offline Ters

  • Coder/patcher
  • Devotee
  • *
  • Posts: 5531
  • Languages: EN, NO
Re: Online checklist mismatches (was: Clients keep disconnecting)
« Reply #9 on: January 10, 2016, 06:11:18 PM »
I notice that when removing the signal the train is waiting at, the train starts moving, reserving tile-by-tile, but the train reserves its entire length. In the cases I remember, the train only reserved a single tile at the front. It may be a bug that has been fixed by now.

Offline Dwachs

  • DevTeam, Coder/patcher
  • Administrator
  • *
  • Posts: 4593
  • Languages: EN, DE, AT
Re: Online checklist mismatches (was: Clients keep disconnecting)
« Reply #10 on: January 10, 2016, 06:29:13 PM »
Removing the signal the train is waiting at leeds to this behavior if the train cannot reserve the next block. Confirmed.

Offline TurfIt

  • Dev Team, Coder/patcher
  • Devotee
  • *
  • Posts: 1326
Re: Online checklist mismatches (was: Clients keep disconnecting)
« Reply #11 on: January 10, 2016, 08:13:29 PM »
Removing the signal the train has reserved a path upto, but not yet reached results in the tile-by-tile behaviour if there's no more signals between it and the destination.

Connecting to a mulitplayer server and observing what's in the screenshot is far too common.

I did catch a train moving down the tracks ignoring all signals once. It ended up exactly as shown (this was a recreation). I did not however catch it originally enter the tile-by-tile mode. I do know track construction was performed along its route, hundreds of tiles away from where it finally ended up. The construction changed signals, but not the final track layout. i.e. The train did not have to re-search the route to continue.

I wish there was a nice procedure to induce this. I do note server and client quite often differ with one of them having the trains tile-by-tile, the other not.

Offline prissi

  • Developer
  • Administrator
  • *
  • Posts: 9518
  • Languages: De,EN,JP
Re: Online checklist mismatches (was: Clients keep disconnecting)
« Reply #12 on: January 10, 2016, 09:58:29 PM »
Hmm, there had been once code to unreserve for the train and trigger a new search.

Offline Ters

  • Coder/patcher
  • Devotee
  • *
  • Posts: 5531
  • Languages: EN, NO
Re: Online checklist mismatches (was: Clients keep disconnecting)
« Reply #13 on: January 10, 2016, 10:00:55 PM »
I did catch a train moving down the tracks ignoring all signals once. It ended up exactly as shown (this was a recreation). I did not however catch it originally enter the tile-by-tile mode.

I think my trains have gone completely "car-ish" like that on several occasions. The sad part is that I believe I saw it starting, but I apparently didn't investigate or report anything back then, and now I've forgotten the details.

Offline TurfIt

  • Dev Team, Coder/patcher
  • Devotee
  • *
  • Posts: 1326
Re: Online checklist mismatches (was: Clients keep disconnecting)
« Reply #14 on: January 11, 2016, 12:28:27 AM »
I just induced many trains to all enter the tile-by-tile / ignore all signals mode by using the signal dragging tool to change from 12 spacing to 8.

Offline DrSuperGood

  • Dev Team
  • Devotee
  • *
  • Posts: 2720
  • Languages: EN
Re: Online checklist mismatches (was: Clients keep disconnecting)
« Reply #15 on: January 11, 2016, 05:20:24 AM »
Quote
12 spacing to 8.
Could be the removal or addition of a signal at a certain stage in the journey. For example if the train is already interacting with a signal and it gets removed or if a train is passing through a block which gets a signal suddenly.

Offline Dwachs

  • DevTeam, Coder/patcher
  • Administrator
  • *
  • Posts: 4593
  • Languages: EN, DE, AT
Re: Online checklist mismatches (was: Clients keep disconnecting)
« Reply #16 on: January 11, 2016, 07:26:19 AM »
I did catch a train moving down the tracks ignoring all signals once. It ended up exactly as shown (this was a recreation). I did not however catch it originally enter the tile-by-tile mode. I do know track construction was performed along its route, hundreds of tiles away from where it finally ended up. The construction changed signals, but not the final track layout. i.e. The train did not have to re-search the route to continue.
This happens if the signals were constructed after the train entered the block up to the station. Combined with the step-by-step reservation if a signal at the end of the reserved block is removed, this can lead to the situation in the screenshot.

I think I have a patch for this, will post it tonight.

However, I do not understand how this would lead to desyncs. While this behavior is weird and annoying, it should occur synchronously on all clients/server.

Offline TurfIt

  • Dev Team, Coder/patcher
  • Devotee
  • *
  • Posts: 1326
Re: Online checklist mismatches (was: Clients keep disconnecting)
« Reply #17 on: January 11, 2016, 11:56:35 AM »
Putting the trains into tile-by-tile ignore reds mode with the signal dragging tool results in the pictured deadlock with trains that are many 10's of tiles away from the signal block at the station. This one does appears to occur synchronously, I presume there's more ways to induce that result in the desyncs. My suspicions still point towards modifying signals with the track removal tool.


Forgot the other desync that's repeatable: Using the raise/lower land tools is blocked on the server when you're bankrupt. The tool executes on the client however.

Offline Dwachs

  • DevTeam, Coder/patcher
  • Administrator
  • *
  • Posts: 4593
  • Languages: EN, DE, AT
Re: Online checklist mismatches (was: Clients keep disconnecting)
« Reply #18 on: January 11, 2016, 09:34:34 PM »
Tried to fix the desync with terraforming in r7730.

Comitted a fix to the train signalling bug in r7731. If a signal vanishes, the train will try to reserve the next block. If this fails the train will stop without signal. Maybe we can allow it to drive on up to the next crossing. Everything else would result in the deadlocks as mentioned above.

Please test and report back.

Offline TurfIt

  • Dev Team, Coder/patcher
  • Devotee
  • *
  • Posts: 1326
Re: Online checklist mismatches (was: Clients keep disconnecting)
« Reply #19 on: January 19, 2016, 07:57:15 PM »
Once again one of the game instances on my server has started throwing checklist mismatches immediately upon joining. And also again resaving the server.sve on a Windows client solves the problem.

The client was connected and idling for ~30 mins before the desync. No one joined, no commands issued. Rejoining gave the checklist mismatches immediately until the server save was replace with the resaved one. Restarting the server with the original save results in the mismatches continuing.

The troublesome .sve works with no issue with a Windows server (both graphics and posix) and client, all 32bit on 64bit OS. But the Windows client connecting to the BSD64 server disconnects immediately upon the first checklist check - 100% repeatable. Saves FWIW: desyn128_2.7z

Issues with BSD? 32 vs 64? Something else? Very frustrating...

EDIT:  BSD server was built DEBUG=3, OPTIMIZE=1.  Windows server and client are #DEBUG, OPTIMIZE=1. Changing the Windows server to DEBUG=3 results in the same checklist desync now...
« Last Edit: January 19, 2016, 08:09:36 PM by TurfIt »

Offline DrSuperGood

  • Dev Team
  • Devotee
  • *
  • Posts: 2720
  • Languages: EN
Re: Online checklist mismatches (was: Clients keep disconnecting)
« Reply #20 on: January 20, 2016, 12:32:09 AM »
Quote
EDIT:  BSD server was built DEBUG=3, OPTIMIZE=1.  Windows server and client are #DEBUG, OPTIMIZE=1. Changing the Windows server to DEBUG=3 results in the same checklist desync now...
As far as I could tell the DEBUG and OPTIMIZE compiler flags were mutually exclusive, especially as far as version logic goes.

It is possible this is a GCC bug as if you are building "optimized" then it is using very different logic from "un-optimized" to build with.

It would be very useful if servers/clients were able to perform an "autopsy" when they out of sync from a checklist mismatch. This would flag up which objects the client has out of synced with, possibly even what fields. Might be completely impractical to implement though and is not useful outside of debugging.

Offline Ters

  • Coder/patcher
  • Devotee
  • *
  • Posts: 5531
  • Languages: EN, NO
Re: Online checklist mismatches (was: Clients keep disconnecting)
« Reply #21 on: January 20, 2016, 06:06:14 AM »
As far as I could tell the DEBUG and OPTIMIZE compiler flags were mutually exclusive, especially as far as version logic goes.

It is possible this is a GCC bug as if you are building "optimized" then it is using very different logic from "un-optimized" to build with.

DEBUG=3 is mutually exclusive with OPTIMIZE, since it adds -O0 after OPTIMIZE's -O3. Lower DEBUG levels can coexist with OPTIMIZE, although DEBUG=2 does turn off inlining.

In addition to possible GCC bugs, we must also consider the possibility that some Simutrans code has crossed into undefined behavior territory.

Offline prissi

  • Developer
  • Administrator
  • *
  • Posts: 9518
  • Languages: De,EN,JP
Re: Online checklist mismatches (was: Clients keep disconnecting)
« Reply #22 on: January 20, 2016, 10:12:04 PM »
The only thing causing different logic would be a calculation only done for a debug message (or an assert). I though we ironed those out years ago :(

Offline Ters

  • Coder/patcher
  • Devotee
  • *
  • Posts: 5531
  • Languages: EN, NO
Re: Online checklist mismatches (was: Clients keep disconnecting)
« Reply #23 on: January 21, 2016, 05:54:05 AM »
The only thing causing different logic would be a calculation only done for a debug message (or an assert). I though we ironed those out years ago :(

Other possibilities are something like this (not exactly different logic):

Code: [Select]
#include <cstdio>

int main() {
int value;
printf("%i\n", value);
return 0;
}

For me, this prints 0 when compiled with -O3 and some other "random" but constant value when compiled with -g. GCC generates a warning about this, and the only such warning I get when building Simutrans now is from squirrel code, but there might be more stealthy ways of doing something similar.

Offline Dwachs

  • DevTeam, Coder/patcher
  • Administrator
  • *
  • Posts: 4593
  • Languages: EN, DE, AT
Re: Online checklist mismatches (was: Clients keep disconnecting)
« Reply #24 on: January 21, 2016, 07:37:51 AM »
@Ters: such errors can be found with valgrind. I am pretty sure that this kind of uninitialized variables does not happen anymore.

The cause could also be static variables: They are initialized, their value is not random, but their value depends on the savegame that was loaded before the current game.


Offline Ters

  • Coder/patcher
  • Devotee
  • *
  • Posts: 5531
  • Languages: EN, NO
Re: Online checklist mismatches (was: Clients keep disconnecting)
« Reply #25 on: January 21, 2016, 04:03:57 PM »
@Ters: such errors can be found with valgrind. I am pretty sure that this kind of uninitialized variables does not happen anymore.

The cause could also be static variables: They are initialized, their value is not random, but their value depends on the savegame that was loaded before the current game.

How often are networked games run in valgrind? And since there is a difference between optimized and debug builds, some value lingering in static fields seems less likely to be the reason for this.

Offline prissi

  • Developer
  • Administrator
  • *
  • Posts: 9518
  • Languages: De,EN,JP
Re: Online checklist mismatches (was: Clients keep disconnecting)
« Reply #26 on: January 21, 2016, 10:53:34 PM »
MSVC warns at compile time about all unititialised variable. Furthermore debug builds of simutrans with MSVC would report accessing unitialise variables. So I think that is not an issue, or at least it is one variable which is allocated.

Offline Ters

  • Coder/patcher
  • Devotee
  • *
  • Posts: 5531
  • Languages: EN, NO
Re: Online checklist mismatches (was: Clients keep disconnecting)
« Reply #27 on: January 22, 2016, 06:55:42 AM »
MSVC warns at compile time about all unititialised variable.
As I wrote: GCC also warns, and it does issue a warning when building Simutrans (although not in a part that seems relevant here). There used to be more, I think, so someone has been doing a good job weeding them out.

Furthermore debug builds of simutrans with MSVC would report accessing unitialise variables.
Exactly the kind of difference between release builds and debug builds I was talking about, although the "reporting" might just be that uninitialized variables are filled with an easily recognizable value when debugging.

So I think that is not an issue, or at least it is one variable which is allocated.
I wrote that if it was an uninitialized variable thing, it would be more hidden than in my simple example showing how release builds and debug builds can differ in behavior when invoking undefined behavior.

Not that this is to only possible reason, but I generally consider most things more likely than a compiler bug, which is the only other hypothesis on the table. Especially given how little we have to go on so far.

Offline prissi

  • Developer
  • Administrator
  • *
  • Posts: 9518
  • Languages: De,EN,JP
Re: Online checklist mismatches (was: Clients keep disconnecting)
« Reply #28 on: January 22, 2016, 04:19:16 PM »
Another thing is that allocated memory is zeroed in debug builds (or with a defined values) while it is random on release builds.

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Devotee
  • *
  • Posts: 18721
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Online checklist mismatches (was: Clients keep disconnecting)
« Reply #29 on: January 22, 2016, 10:34:30 PM »
These are painful to resolve, I know from experience. I took four months to find one once, with much help from Bernd Gabriel. Release/debug differences I have found are often down to undefined behaviour somewhere (undefined variables are usually easy to track down; these issues tend to be more subtle).

One (rather blunt) technique that I have found that does work to narrow the problem down a little is to disable parts of the code one by one while testing to see which area of the code contains the problem. (This is less effective, of course, when there are multiple and diverse causes). Logging all calls to simrand() can help, too: there is some code in Experimental to do this (enabled by a preprocessor definition) which you might want to look at to help with this. This can be useful in tracking down exactly where the random numbers diverge, which can help to narrow the problem down a little.

Offline TurfIt

  • Dev Team, Coder/patcher
  • Devotee
  • *
  • Posts: 1326
Re: Online checklist mismatches (was: Clients keep disconnecting)
« Reply #30 on: January 23, 2016, 06:13:09 AM »
Is there anything in r7731 that could've fixed this?

Current symptoms - using the save previously linked, and a posix debug server (both Windows and BSD), graphical debug and non-debug clients (Windows) desync immediately with 7730 and older, remain connected with 7731. Graphical servers (Windows), and non-debug posix servers (Windows and BSD) work with any version.

Resaving the game with any of the working configurations results in it working also with the posix debug servers (past experience is the problem will reoccur in several days if the game is left to run and with clients joining/leaving). Resaving the game with the posix debug server results in the problem persisting.

I somewhat expect this is just a fluke with the newer versions rearranging things enough to mask the problem... If 7731 hasn't actually fixed it, then I really don't know how to even start tracking this down; Attempting to add some debugging code will also likely change the .exes enough to mask it. But first, can anyone replicate the behaviour?


Offline DrSuperGood

  • Dev Team
  • Devotee
  • *
  • Posts: 2720
  • Languages: EN
Re: Online checklist mismatches (was: Clients keep disconnecting)
« Reply #31 on: January 23, 2016, 07:17:52 AM »
It does have to do with block reservations, so it is possible it was trying to touch invalid values before I guess (since the logic was not working as intended?). One also cannot rule out that that small change in logic has just spontaneously "fixed" the problem. Without knowing the exact cause there is no guarantee that it will not re-appear later with another change that spontaneously breaks it.

Offline Dwachs

  • DevTeam, Coder/patcher
  • Administrator
  • *
  • Posts: 4593
  • Languages: EN, DE, AT
Re: Online checklist mismatches (was: Clients keep disconnecting)
« Reply #32 on: January 23, 2016, 09:15:32 AM »
r7731 was targetted to cure the step-by-step reserving of track. It is just a change in reserving behavior.

However this change broke choose signals, i.e., trains waiting at choose signals treated the signal as normal signal and thus were not choosing free platforms. You might want to try out if the desync come back in r7742, which fixed this.

I could not reproduce the desyncs with graphic/no-graphic, optimized/non-optimized, gcc 4.6/clang 3.0 builds (with recent r7744).

Offline TurfIt

  • Dev Team, Coder/patcher
  • Devotee
  • *
  • Posts: 1326
Re: Online checklist mismatches (was: Clients keep disconnecting)
« Reply #33 on: January 23, 2016, 11:09:39 PM »
I'm aware 7731 wasn't trying to necessarily fix desyncs, but it's possible the game has a convoi doing a step-by-step somewhere (unlikely given no players had done actions for a quite a while {but then one player has his entire network jammed for a decade...}).

Using a posix debug server, game desyncs with a non-debug graphical client for 7720, 7723, 7727, 7729, 7730. Does not desync for 7731, 7732, 7736, 7740, 7746. 100% repeatable.
A debug client will sometimes desync, sometimes not. Graphic servers never have clients desyncing. Posix non-debug, also works.

A posix debug server, graphic non-debug client, built in mingw w/ gcc4.6.2 also works.  With 5.3.0 desyncs. The BSD servers were compiled with 4.9.4, and exhibit the desyncs.
Presumably the stone age 4.6 optimizer doesn't trip over whatever bad code we have... Or a gcc bug persisting through many very different versions... but bad Simutrans is more likely.

So, can anyone replicate the desyncs not using an ancient compiler?
(and note that after a client successfully stays joined, you have to revert to the original server13356-network.sve file since resaving it with a working server during the join process results in it continuing to work with the non-working setup. i.e. use the original save everytime starting the server, and restart the server after every join.)

Offline Dwachs

  • DevTeam, Coder/patcher
  • Administrator
  • *
  • Posts: 4593
  • Languages: EN, DE, AT
Re: Online checklist mismatches (was: Clients keep disconnecting)
« Reply #34 on: January 24, 2016, 02:03:17 PM »
Could replicate the immediate desync locally with r7730

-server: posix, non-debug, optimized, gcc-4.6
-client: graphical, non-debug, optimized, gcc-4.9

Edit: r7744 does not give the immediate desync.

Not sure what this story tells us.

Edit2: Using the new gcc compiler option -fsanitize=undefined to enable UndefinedBehaviorSanitizer does not give any clue. The reported undefined behavior is in simgraph16.....

Edit3: The good news are that this desync is completely reproducible, and even adding my desync-debugging code does not alter the outcome. It seems to be indeed related to trains and reserving: there is a train that goes from loading to driving state on the server, but to waiting on the client.
« Last Edit: January 24, 2016, 06:08:30 PM by Dwachs »

Offline TurfIt

  • Dev Team, Coder/patcher
  • Devotee
  • *
  • Posts: 1326
Re: Online checklist mismatches (was: Clients keep disconnecting)
« Reply #35 on: January 24, 2016, 06:16:59 PM »
Thanks for pursuing. It's nice to get beyond the 'it's just me' phase...
waitlock issue? I've noticed trains getting stuck in the 'schedule changing' mode quite often. Seems if their associated line schedule is changed while they're in a station loading they get stuck for an extended time.

Offline Dwachs

  • DevTeam, Coder/patcher
  • Administrator
  • *
  • Posts: 4593
  • Languages: EN, DE, AT
Re: Online checklist mismatches (was: Clients keep disconnecting)
« Reply #36 on: January 24, 2016, 06:42:15 PM »
It appears that r7731 (and then r7742) indeed fixed this desync by accident.

There was a train with route_index == route.get_count(), which called the block_reserver in can_enter_tile due to next_block+1<route_index. However, block_reserver immediately returned false, which was ignored, then cnv->set_next_stop_index was called, but next_signal and next_crossing were not assigned in block_reserver -> undefined value propagating.

valgrind would have found this, but I could not get to run it on my machine, the savegame was just too heavy.

Offline TurfIt

  • Dev Team, Coder/patcher
  • Devotee
  • *
  • Posts: 1326
Re: Online checklist mismatches (was: Clients keep disconnecting)
« Reply #37 on: January 25, 2016, 12:00:05 AM »
That's great news then.
Is your desync-debugging code something in a shareable state?

Isn't a train with route_index==route.get_count() calling the block_reserver an error in the first place?

Offline Dwachs

  • DevTeam, Coder/patcher
  • Administrator
  • *
  • Posts: 4593
  • Languages: EN, DE, AT
Re: Online checklist mismatches (was: Clients keep disconnecting)
« Reply #38 on: January 25, 2016, 08:09:15 AM »
I will upload a patch in the evening.

Isn't a train with route_index==route.get_count() calling the block_reserver an error in the first place?
Don't know. That would need further investigation.

Offline Dwachs

  • DevTeam, Coder/patcher
  • Administrator
  • *
  • Posts: 4593
  • Languages: EN, DE, AT
Re: Online checklist mismatches (was: Clients keep disconnecting)
« Reply #39 on: January 25, 2016, 09:05:08 PM »
Here is a patch for debugging desyncs. Both client and server have to be compiled with the patch enabled (-DDEBUG_DESYNC, there are also new files network/network_debug.*).

Upon desync, client and server exchange data (like calls to simrand etc). The first difference found will be reported in a new window inside the client and written to desync.log.

With
Code: [Select]
network_debug_add(format, ....)
new debug messages can be added (both on client and server) with a printf-like syntax. These message strings make up the data that is automatically compared. Usually several runs of adding debug messages & running until desync are necessary. I was lucky that the desync did not went away while adding the debug code.

Offline prissi

  • Developer
  • Administrator
  • *
  • Posts: 9518
  • Languages: De,EN,JP
Re: Online checklist mismatches (was: Clients keep disconnecting)
« Reply #40 on: January 25, 2016, 11:55:58 PM »
I think this woulb be a very useful addition. On might rather call it NETWORK_DBG() to be in place with the "normal" DEBUG_DBG(). Then it can be always there but only does something on real network debug build.

Offline TurfIt

  • Dev Team, Coder/patcher
  • Devotee
  • *
  • Posts: 1326
Re: Online checklist mismatches (was: Clients keep disconnecting)
« Reply #41 on: January 26, 2016, 01:16:21 AM »
That's a very nice debugging tool. Hopefully never needed again!

Offline Isaac.Eiland-Hall us

  • Benevolent Dictator
  • Administrator
  • *
  • Posts: 3651
  • PanamaCityPC.com/support/
    • Facebook Profile
  • Languages: EN
Re: Online checklist mismatches (was: Clients keep disconnecting)
« Reply #42 on: January 26, 2016, 01:59:33 AM »
Hopefully never needed again!

I know you already qualified it with "hopefully", but that still sounds like famous last words. :D