News:

SimuTranslator
Make Simutrans speak your language.

[11.26] Frequent Desynchs in Server Game

Started by Sarlock, April 26, 2014, 07:11:49 PM

Previous topic - Next topic

0 Members and 1 Guest are viewing this topic.

jamespetts

Rather embarrassingly, the apparent desynchronisation in the log extracts above was caused by an error in the code adding these items to the checklist. 11.28 transpired to have some fairly serious bugs the cause of which are being investigated with the result that that release has been withdrawn for the present, and the server reverted to 11.27.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

Morelook

James / TurfIt :

I have just spot a minor bug in path_explorer.cc which I wonder if you could kindly help to fix, since I no longer have the Simutrans Exp repository set up on my PC anymore.

In line 1407 of path_explorer.cc :

// should take into account the iterations above
iterations += (uint32)working_halt_count + ( inbound_connections->get_total_member_count() << 1 );

iterations should be changed to iterations_processed instead.

Thank you very much in advance  ;)

Sarlock

#37
James: Do we have an old save game from around 15-20 years ago?  It might be interesting to load up an 1810 game for a few hours and see if we get any desynchs.  It might make an interesting comparison platform.  It really does seem to be getting increasingly worse over the past few days.

I also noticed that the number of convoys currently on the map has dropped from 12400 to down around 11100.  This is from the retirement of a lot of ship lines (I've retired over 500 ships myself) and yet the frequency of desynchs seems to be increasing.  I really do suspect some connection to the addition of a large volume of trains to the map over the past week or two, since we hit 1829 (let's recall that we had a really smooth day just before we went train mad in mid 1829).
Current projects: Pak128 Trees, blender graphics

jamespetts

Morelook - thank you very much indeed for the bug fix. I have just applied this to the 11.x branch. I do not have time to test now, but I will test when I get a chance later this evening.

Sarlock - I was thinking of a test such as that, albeit not on the live server, but on a server running on my local PC just for testing; if the above fix does not solve the problems, then that will be tried, along with running old (e.g. 11.22 and 11.23) versions to see whether any recent changes have caused the problem.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

ӔO

personally, I suspect there is some issue with the pax routing, as removing unused stations that are connected by foot to another station that is in use can cause desyncs.

My Sketchup open project sources
various projects rolled up: http://dl.dropbox.com/u/17111233/Roll_up.rar

Colour safe chart:

Morelook

#40
Sorry for not making it clear, James, but my minor bugfix above has no relation to the desync problems. But thanks a lot anyway for fixing it on my behalf.

jamespetts

Thank you very much for clarifying that. May I ask what effects that that bug would have?
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

Sarlock

#42
Log attached, highlights for each desynch below:

#1, editing train schedule

ERROR: float32e8_t::operator / (const float32e8_t & x) const: Division by zero in: 108000 / -0
For help with this error or to file a bug report please see the Simutrans forum:
http://forum.simutrans.com
ERROR: float32e8_t::operator / (const float32e8_t & x) const: Division by zero in: 2644.85349 / -0
For help with this error or to file a bug report please see the Simutrans forum:
http://forum.simutrans.com
ERROR: float32e8_t::operator / (const float32e8_t & x) const: Division by zero in: 108000 / -0
For help with this error or to file a bug report please see the Simutrans forum:
http://forum.simutrans.com
ERROR: float32e8_t::operator / (const float32e8_t & x) const: Division by zero in: 11414.2806 / -0
For help with this error or to file a bug report please see the Simutrans forum:
http://forum.simutrans.com
ERROR: float32e8_t::operator / (const float32e8_t & x) const: Division by zero in: 108000 / -0
For help with this error or to file a bug report please see the Simutrans forum:
http://forum.simutrans.com
ERROR: float32e8_t::operator / (const float32e8_t & x) const: Division by zero in: 2644.85349 / -0
For help with this error or to file a bug report please see the Simutrans forum:
http://forum.simutrans.com
ERROR: float32e8_t::operator / (const float32e8_t & x) const: Division by zero in: 108000 / -0
For help with this error or to file a bug report please see the Simutrans forum:
http://forum.simutrans.com
Warning: karte_t::interactive: sync_step=38296  server=[rand=1810728635 halt=2049 line=1025 cnvy=8193 processing=0 current_compartment=7 paths_available=1 refresh_completed=1 refresh_requested=0 current_phase=0 phase_couner=0 iterations=0] client=[rand=4245363771 halt=2049 line=1025 cnvy=8193 processing=0 current_compartment=7 paths_available=1 refresh_completed=1 refresh_requested=0 current_phase=0 phase_couner=0 iterations=0]
Warning: karte_t::interactive: disconnecting due to checklist mismatch
Warning: karte_t::network_disconnect(): Lost synchronisation with server. Random flags: 0
Warning: nwc_routesearch_t::reset: all static variables are reset


#2, was building a station, 2 players online.
Message: network_command_t::rdwr: read packet_id=8, client_id=0
Warning: nwc_tool_t::rdwr: rdwr id=8 client=2 plnr=3 pos=6693,1148,-13 wkzid=4115 defpar=woodenPlatformOverallTrainStop init=0 flags=0
Warning: network_check_activity(): received cmd id=8 nwc_tool_t from socket[588]
Message: nwc_tool_t::pre_execute: append sync_step=39668 wkz=4115 work
Message: void path_explorer_t::compartment_t::step(): Phase 5 started
Message: void path_explorer_t::compartment_t::step(): Phase 5 started
Message: void path_explorer_t::compartment_t::step(): Phase 5 started
Warning: karte_t::interactive: sync_step=39648  server=[rand=3005003861 halt=2049 line=1025 cnvy=8193 processing=1 current_compartment=0 paths_available=1 refresh_completed=0 refresh_requested=1 current_phase=5 phase_couner=0 iterations=0] client=[rand=3005003861 halt=2049 line=1025 cnvy=8193 processing=1 current_compartment=0 paths_available=1 refresh_completed=0 refresh_requested=1 current_phase=5 phase_couner=0 iterations=0]
Message: void path_explorer_t::compartment_t::step(): Phase 5 started
Message: network_command_t::rdwr: read packet_id=9, client_id=0
Warning: network_check_activity(): received cmd id=9 nwc_check_t from socket[588]
Warning: NWC_CHECK: time difference to server 0
Warning: karte_t::interactive: skipping command due to checklist mismatch : sync_step=39654 server=[rand=260017048 halt=2049 line=1025 cnvy=8193 processing=1 current_compartment=0 paths_available=1 refresh_completed=0 refresh_requested=1 current_phase=5 phase_couner=0 iterations=0] executor=[rand=3948677212 halt=2049 line=1025 cnvy=8193 processing=1 current_compartment=0 paths_available=1 refresh_completed=0 refresh_requested=1 current_phase=5 phase_couner=0 iterations=0]
Warning: karte_t::network_disconnect(): Lost synchronisation with server. Random flags: 0
Warning: nwc_routesearch_t::reset: all static variables are reset


#3 and #4 were socket disconnections.

#5, building a station

Warning: nwc_tool_t::rdwr: rdwr id=8 client=0 plnr=3 pos=3012,1249,-13 wkzid=4115 defpar=WoodenPlatformTrainStop init=0 flags=0
Message: packet_t::send: sent 86 bytes to socket[704]; id=8, size=86
Message: network_command_t::rdwr: read packet_id=8, client_id=0
Warning: nwc_tool_t::rdwr: rdwr id=8 client=1 plnr=3 pos=3012,1249,-13 wkzid=4115 defpar=WoodenPlatformTrainStop init=0 flags=0
Warning: network_check_activity(): received cmd id=8 nwc_tool_t from socket[704]
Message: nwc_tool_t::pre_execute: append sync_step=42919 wkz=4115 work
Message: void path_explorer_t::compartment_t::step(): Phase 5 started
Message: network_command_t::rdwr: write packet_id=8, client_id=1
Warning: nwc_tool_t::rdwr: rdwr id=8 client=0 plnr=3 pos=3012,1250,-13 wkzid=4115 defpar=WoodenPlatformTrainStop init=0 flags=0
Message: packet_t::send: sent 86 bytes to socket[704]; id=8, size=86
Message: network_command_t::rdwr: read packet_id=8, client_id=0
Warning: nwc_tool_t::rdwr: rdwr id=8 client=1 plnr=3 pos=3012,1250,-13 wkzid=4115 defpar=WoodenPlatformTrainStop init=0 flags=0
Warning: network_check_activity(): received cmd id=8 nwc_tool_t from socket[704]
Message: nwc_tool_t::pre_execute: append sync_step=42922 wkz=4115 work
Message: void path_explorer_t::compartment_t::step(): Phase 5 started
Message: void path_explorer_t::compartment_t::step(): Phase 5 started
Message: void path_explorer_t::compartment_t::step(): Phase 5 started
Warning: karte_t::interactive: skipping command due to checklist mismatch : sync_step=42905 server=[rand=3842983830 halt=2049 line=1025 cnvy=8193 processing=1 current_compartment=0 paths_available=1 refresh_completed=0 refresh_requested=1 current_phase=5 phase_couner=0 iterations=0] executor=[rand=3752133987 halt=2049 line=1025 cnvy=8193 processing=1 current_compartment=0 paths_available=1 refresh_completed=0 refresh_requested=1 current_phase=5 phase_couner=0 iterations=0]
Warning: karte_t::network_disconnect(): Lost synchronisation with server. Random flags: 0
Warning: nwc_routesearch_t::reset: all static variables are reset


Log:
Simutrans-Experimental Logs May 6, 2014
Current projects: Pak128 Trees, blender graphics

Morelook

Quote from: jamespetts on May 06, 2014, 10:10:45 PM
Thank you very much for clarifying that. May I ask what effects that that bug would have?
The bug only led to some more iterations being performed per step than desirable, in the phase 5 of path explorer. It should not have caused the recent desyncs.

Sarlock

More logs:

I was the only player on the server for #1-#12:
1: Checklist:
2: Socket closed: Making a new schedule
3: Checklist: Continuing to make new schedule
4: Checklist: No immediate actions (just observing)
5: Checklist: No immediate actions - following a train
6: Checklist: No immediate actions - following a train
7: Checklist: No immediate actions - following a train
8: Checklist: No immediate actions - following a train
9: Checklist: No immediate actions - following a train
10: Checklist: No immediate actions (just observing)
11: Checklist: No immediate actions (just observing)
12: Socket closed: Changing name of line
2nd player joined
13: Checklist: Observing map - only I was kicked
14: Checklist: Following train - was on for ~5 game hours without disconnect! - only I was kicked
15: Checklist: Following train, happened shortly after #14 - only I was kicked

#1 to #13 happened with quite some frequency, every 20-30 game minutes.  Then I went for ~5 game hours without an interruption until #14.

Logs:

Simutrans-Experimental Logs May 7, 2014
Current projects: Pak128 Trees, blender graphics

jamespetts

Thank you very much for those logs. There is no clear pattern immediately apparent to me: however, what is clear is that the logs show that, in each case, the only value out of sync is the random number seed: all of the path explorer values remain in sync. That might well tend to suggest that the path explorer is not itself responsible, although whether there are other issues relating to routing (issues with point to point timings, for example) remains to be determined.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

ӔO

My Sketchup open project sources
various projects rolled up: http://dl.dropbox.com/u/17111233/Roll_up.rar

Colour safe chart:

jamespetts

Last night, I replaced the executable with one that TurfIt had compiled from the same code on the -master branch (11.29) and restarted the server. I should be very grateful for any reports on whether the frequent desyncs still occur now.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

ӔO

I have not desync'd with the new server build thus far.

But now there is quite a lot of waiting time when someone connects.
My Sketchup open project sources
various projects rolled up: http://dl.dropbox.com/u/17111233/Roll_up.rar

Colour safe chart:

jamespetts

The waiting time is due to enabling logging on the server, which had previously been disabled. I should be grateful for a further evaluation of the position as to whether there have been any more desyncs later in the day, as simply restarting the server seems in the past to have caused a temporary reduction in desyncs.

Thank you for your information.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

Sarlock

Server was really good the last day or so since the reset.  Then I was building a new rail extension tonight and when I went to build the line I had several checklist mismatches while trying to put the line together.  It took me 4 reconnections to get the line done, each time desynching after only 30 seconds or a minute with a checklist mismatch.  Not sure if it's coincidental, and the desynchs just happen to start at this time, but it's worth noting.  I've consistently had problems with desynchs making lines, so this is nothing new... but the period of stability before that is important (that said, we made other new lines in the past 24 hours without much issue).  Maybe the reset works only for a certain period of time and then something starts to fall out of synch.
Current projects: Pak128 Trees, blender graphics

jamespetts

Thank you for that. Can you post your logs for those incidents?
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

greenling

Hello Jamespetts & Sarlock
I have Think after an have Find the Problems of the Frequent Desynchs in Server Game.
The Problems of the Frequent Desynchs in the Server Game it the Telephone Line to your house.
It The Telephonline to long or to old, then sinking the Datatransfer.
Is the phone line not in the ground but hangs on a telephone pole, then the data transmission is also bad.
There is the telephone line with the weather getting into movement.
In because United States and Canda it was cheaper to a telephone network to build.  :o :o
Opening hours 20:00 - 23:00
(In Night from friday on saturday and saturday on sunday it possibly that i be keep longer in Forum.)
I am The Assistant from Pakfilearcheologist!
Working on a big Problem!

jamespetts

The problem is not the quality of the internet connexion - a poor connexion cannot cause checklist mismatches: only a software error can cause that.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

Sarlock

Unfortunately I've lost those logs for that session, as the logs reset to an empty file when you reload the client.  I'll attempt another line today and see what we get and save those logs.
Current projects: Pak128 Trees, blender graphics

jamespetts

Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

TurfIt

After reconnecting from a desync, if you sent a chat message, that would allow comparing the server and client logs easier. i.e. It would mark the relevant location in the log with "desync #1", "#2", etc.

Sarlock

Played for about 3-4 hours today and here are the logs:

May 10, 2014 Logs

After each reconnection from desynch, I put out a chat message in game stating the nature of the desynch and the corresponding line number in my client end logs.  Except the last one, of course, since I didn't reconnect after it.

#1 socket disconnect (no server side message sent after reconnection)
#2 checklist mismatch
#3 checklist mismatch
#4 socket disconnect
#5 socket disconnect

The server has been much more stable lately, 5 (and only 2 of the checklist variety) disconnections in 3-4 hours is far, far better than what we were getting last week.
Current projects: Pak128 Trees, blender graphics

TurfIt

Do the socket desyncs occur after you've been connected stably for a while? Or only right after connecting?  I've only ever experienced them when connecting, and it appears the server is extremely intolerant of receiving any messages not from the connecting player.

Unfortunately 2 checklist desyncs in 3-4 hours of play makes it rather impossible to troubleshoot effectively.

Sarlock

#59
The socket desynchs were after quite some time connected.  That last one was just while I was chatting with James.

And yes, 2 in 3-4 hours is hard to troubleshoot... but also indicative that the new server executable compiling made a big improvement.

If I get another batch of them (like last night) I'll upload logs.

EDIT: James I see you said something about a server crash in the chat logs after I left.  Was this why I was dropped from the server and the cause of the socket disconnect?
Current projects: Pak128 Trees, blender graphics

ӔO

#60
more logs https://dl.dropboxusercontent.com/u/17111233/11.30_debug_log.rar

my desyncs were rather frequent.
My Sketchup open project sources
various projects rolled up: http://dl.dropbox.com/u/17111233/Roll_up.rar

Colour safe chart:

Sarlock

Checklist desynchs from today:

May 12, 2014 Logs

I made comments in game chat after I reconnected from these desynchs to highlight the area where it occurred (except for last one, since I didn't reconnect).

Extra log info is hopefully helpful... as well as removing a lot of the superfluous junk comments, makes it easier to read.
Current projects: Pak128 Trees, blender graphics

ӔO

My Sketchup open project sources
various projects rolled up: http://dl.dropbox.com/u/17111233/Roll_up.rar

Colour safe chart:

Sarlock

I've noticed checklist mismatch desynchs happening more frequently in the past 24-48 hours.
Current projects: Pak128 Trees, blender graphics

ӔO

My Sketchup open project sources
various projects rolled up: http://dl.dropbox.com/u/17111233/Roll_up.rar

Colour safe chart:

jamespetts

I have restarted with TurfIt's build to see whether that makes any difference.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

Sarlock

Lots of checklist mismatch desynchs this evening, especially towards the end of my session.  Was almost unplayable at the end; desynchs were within a minute or two of connecting.

May 15, 2014 Logs
Current projects: Pak128 Trees, blender graphics

Sarlock

Connected a short while ago and played for about 20 minutes with just one checklist mismatch desynch.  Then another hit after that time period and I reconnected and instantly desynched again.  And connected and desynched again.  And again.  All of them are checklist mismatches.  Perhaps this might be more useful than other experiences as these are occurring almost instantly after a connection occurs.  Comparing these logs to the server logs might actually discover something useful.

May 19, 2014 Logs

I've given up trying to connect for now, I'll try again later... good luck!
Current projects: Pak128 Trees, blender graphics

speedbus

Quote from: Sarlock on May 19, 2014, 06:27:32 PM
[...] And connected and desynched again.  And again.  All of them are checklist mismatches.

This is also true for me. Tried it this morning and about half an hour ago. Last log file attached.

jamespetts

I seem to be able to reproduce this effect now - I do not have time to investigate this evening, but it is interesting that it is so soon.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.