Author Topic: Desync issue (devel-new-2) with Linux Server/Windows client  (Read 11435 times)

Offline Vladki

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #280 on: March 27, 2017, 07:26:05 PM »
I'm afraid that this change (commit 7d09ea4d9d1f18a9f9b237f8a7167259eb8f4c6f) has broken compilation on linux:

Code:
===> CXX obj/leitung2.cc
g++ -std=gnu++11 -O -DNDEBUG -DMULTI_THREAD -DREVISION="7d09ea4" -Wall -W -Wcast-qual -Wpointer-arith -Wcast-align -DUSE_C -fno-delete-null-pointer-checks -fno-strict-aliasing  -I/usr/include/SDL -D_GNU_SOURCE=1 -D_REENTRANT -DCOLOUR_DEPTH=16 -c -MMD -o build/default/obj/leitung2.o obj/leitung2.cc
In file included from obj/../vehicle/../boden/grund.h:19:0,
                 from obj/../vehicle/../simplan.h:12,
                 from obj/../vehicle/../simworld.h:34,
                 from obj/../vehicle/simvehicle.h:18,
                 from obj/../vehicle/simroadtraffic.h:15,
                 from obj/../simcity.h:22,
                 from obj/leitung2.h:16,
                 from obj/leitung2.cc:17:
obj/../vehicle/../boden/wege/weg.h:60:41: warning: type qualifiers ignored on function return type [-Wignored-qualifiers]
  static const uint32 get_all_ways_count();
                                         ^
In file included from obj/leitung2.cc:23:0:
obj/../simfab.h:131:32: warning: type qualifiers ignored on function return type [-Wignored-qualifiers]
  const sint32 get_in_transit() const { return statistics[0][FAB_GOODS_TRANSIT]; }
                                ^
obj/leitung2.cc:395:44: warning: unused parameter ‘dummy’ [-Wunused-parameter]
 void leitung_t::info(cbuffer_t & buf, bool dummy) const
                                            ^
obj/leitung2.cc: In member function ‘virtual void leitung_t::rdwr(loadsave_t*)’:
obj/leitung2.cc:457:27: error: cast from ‘powernet_t*’ to ‘uint32 {aka unsigned int}’ loses precision [-fpermissive]
   value = (uint32)get_net(); //  This seems to be functionless, but should be preserved for compatibility. It likewise appears functionless in Standard.
                           ^
obj/leitung2.cc: At global scope:
obj/leitung2.cc:687:42: warning: unused parameter ‘dummy’ [-Wunused-parameter]
 void pumpe_t::info(cbuffer_t & buf, bool dummy) const
                                          ^
obj/leitung2.cc:1130:42: warning: unused parameter ‘dummy’ [-Wunused-parameter]
 void senke_t::info(cbuffer_t & buf, bool dummy) const
                                          ^
common.mk:50: recipe for target 'build/default/obj/leitung2.o' failed
make: *** [build/default/obj/leitung2.o] Error 1
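
The error at leitung2.cc:457 arises because a pointer is 64 bits wide on 64-bit Linux, whereas uint32 is only 32 bits, so GCC refuses the direct cast. A minimal sketch of one portable way to keep such a value for save-file compatibility is to convert through uintptr_t and then truncate explicitly. The names powernet_stub_t and pointer_to_legacy_u32 are hypothetical and are not taken from the Simutrans-Extended source, and this is not necessarily the fix that was later pushed.

```cpp
#include <cstdint>

// Hypothetical stand-in for powernet_t; only its address matters here.
struct powernet_stub_t { int dummy; };

// Convert a pointer to a 32-bit value without a direct pointer-to-uint32 cast.
// uintptr_t is wide enough to hold any object pointer; the outer cast makes
// the truncation to 32 bits explicit, which GCC accepts on 64-bit targets.
inline std::uint32_t pointer_to_legacy_u32(const void* p)
{
	return static_cast<std::uint32_t>(reinterpret_cast<std::uintptr_t>(p));
}
```

The truncated value cannot be turned back into a valid pointer on a 64-bit system, so this only makes sense where the stored value is never read back as a pointer, as the comment in the original line suggests.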



Offline jamespetts

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 15131
  • Total likes: 353
  • Helpful: 154
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #281 on: March 27, 2017, 08:05:44 PM »
Ah - I think that I have pushed what amounts to a fix for this. Would you be able to re-test?
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

Offline Vladki

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #282 on: March 27, 2017, 10:23:13 PM »
fixed

Offline jamespetts

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #283 on: April 12, 2017, 12:30:00 AM »
Testing again with the latest nightly build from the server, this still stays in sync with identical builds. I have not fully re-tested for synchronisation between different builds, but it is useful to test this periodically just to make sure that nothing has broken network synchronisation in new ways. The problem still seems to be confined to mixing Linux/Windows builds. (I have briefly re-tested with the Bridgewater-Brunel server, and the current nightly MinGW build still desynchronises with that virtually instantly).

Offline jamespetts

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #284 on: April 23, 2017, 10:21:00 PM »
I am developing a new method to try to test the cause of this and other desyncs: I have just pushed a change to the code in which, if the preprocessor directive DISABLE_RANDOMNESS is defined, the random number generator will always simply return half of the maximum value, and the function for getting the random number generator seed will always return 1. I am in the process of rebuilding the Bridgewater-Brunel server with this option enabled. This means that it will not stay in sync with any normal client, but should be able to stay in sync with a client compiled with DISABLE_RANDOMNESS.

When a client with DISABLE_RANDOMNESS connects with the server, it should be possible to see in the game itself where the desyncs arise without this affecting lots of other unconnected areas of the game at random by changing the random number generator seed. Also, because the random number generator seed is fixed, the client should not be kicked from the game unless and until a more major difference (such as in the number of convoys) emerges. This system is intended just for testing (it would be no good in a real game, as there would be no randomness), but it should help to highlight where the problems are.
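
The behaviour described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names sketch_rand and sketch_get_random_seed and the stand-in generator test_rng are hypothetical, and the real functions in the code base may be named and structured differently.

```cpp
#include <cstdint>
#include <random>

// Hypothetical names throughout; not the actual Simutrans-Extended functions.
// Build with -DDISABLE_RANDOMNESS to get the deterministic test behaviour.

static std::mt19937 test_rng(12345u); // stand-in generator for a normal build

// Return a value in [0, max); returns 0 when max is 0 or 1.
inline std::uint32_t sketch_rand(std::uint32_t max)
{
#ifdef DISABLE_RANDOMNESS
	return max / 2; // always half of the maximum value
#else
	if (max <= 1) {
		return 0;
	}
	return static_cast<std::uint32_t>(test_rng() % max);
#endif
}

// Return the value that would be compared between server and client.
inline std::uint32_t sketch_get_random_seed()
{
#ifdef DISABLE_RANDOMNESS
	return 1; // fixed, so comparing seeds never reports a mismatch
#else
	return static_cast<std::uint32_t>(test_rng());
#endif
}
```

With the directive defined, every call site receives the same value on every machine, so any remaining divergence between clients has to come from somewhere other than the random number stream.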

It would be very helpful if anyone able to compile the game were to connect two separate clients, both built with DISABLE_RANDOMNESS, to the Bridgewater-Brunel server and try to spot how, if at all, they diverge from one another. It might be hard to spot, as the game currently running on the server is a big map; if anyone can find a smaller, simpler map which will reliably desync with a normal build between Windows and Linux, it would be helpful to use that for testing, too.

Thank you all in advance for any help with this: it would be much appreciated.

Edit: Having now tested this briefly, the special DISABLE_RANDOMNESS build does appear to stay in sync, as expected. Any help in tracking down actual divergence would be very much appreciated.
 
« Last Edit: April 23, 2017, 10:39:39 PM by jamespetts »

Offline prissi

  • Developer
  • Administrator
  • *
  • Posts: 8685
  • Total likes: 294
  • Helpful: 228
  • Languages: De,EN,JP
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #285 on: April 23, 2017, 11:36:32 PM »
The random number seed is not used in Simutrans at all (unless Experimental added some code using it). The internal random generator uses the Mersenne Twister algorithm and is seeded with a complex seed depending on the time in ms since 1970 and some mathematical operations.

If the differences are really compiler/machine dependent, then it may well be the rounding of a float/double to an integer, which can give different results on different compilers and machines.

Offline jamespetts

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #286 on: April 23, 2017, 11:41:42 PM »
Is it not the random seed that is checked between server and client to test for a mismatch?

As to the floating point thing, this was a problem a long time ago, but all the floating point arithmetic other than in the GUI was removed circa 2011/2012 to solve this problem, and no further floating point arithmetic has been added since, so I do not think that it is this. Bernd Gabriel even wrote an elaborate class to simulate floating point arithmetic using integers to get around this.
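
To illustrate the general idea of doing fractional arithmetic with integers only, here is a minimal sketch of a 16.16 fixed-point type. It is not Bernd Gabriel's class and makes no claim to resemble it; the name fixed16_t and its members are hypothetical. The point is simply that integer multiplication and division give the same result on every compiler and CPU, whereas float-to-integer rounding may not.

```cpp
#include <cstdint>

// Hypothetical 16.16 fixed-point type: the represented value is raw / 65536.
// The usable range is limited, because operator* forms a 64-bit intermediate.
struct fixed16_t {
	std::int64_t raw;

	static fixed16_t from_int(std::int32_t i)
	{
		return fixed16_t{ static_cast<std::int64_t>(i) * 65536 };
	}

	static fixed16_t from_ratio(std::int32_t num, std::int32_t den)
	{
		return fixed16_t{ static_cast<std::int64_t>(num) * 65536 / den };
	}

	fixed16_t operator+(fixed16_t o) const { return fixed16_t{ raw + o.raw }; }
	fixed16_t operator*(fixed16_t o) const { return fixed16_t{ raw * o.raw / 65536 }; }

	// Integer division truncates toward zero, identically on every platform.
	std::int32_t to_int() const { return static_cast<std::int32_t>(raw / 65536); }
};
```

For example, multiplying from_ratio(3, 2) by from_int(5) and calling to_int() yields 7 on any conforming C++11 compiler, since no floating point unit is involved at any step.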

Offline Vladki

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #287 on: April 23, 2017, 11:56:21 PM »
I have restarted all instances at server.exp.simutrans.com to commit 58181b37d85561c.... and DISABLE_RANDOMNESS.
However I had desync with both British pak games after 5-10 minutes. Pak sweden seems to be stable.
You are welcome to test them.


Offline jamespetts

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #288 on: April 24, 2017, 12:51:34 AM »
Quote from: Vladki on April 23, 2017, 11:56:21 PM
I have restarted all instances at server.exp.simutrans.com to commit 58181b37d85561c.... and DISABLE_RANDOMNESS.
However I had desync with both British pak games after 5-10 minutes. Pak sweden seems to be stable.
You are welcome to test them.

Thank you - can you see if you can capture and post the debug output from the desync that you get? That would be most helpful.

Offline Vladki

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #289 on: April 24, 2017, 10:35:21 PM »
Playing bridgewater (copy) for maybe an hour - without desync (linux/linux)

Offline jamespetts

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #290 on: April 24, 2017, 11:38:01 PM »
Thank you for checking that, although the linux/linux connexion was working before in any event. Can anyone running Windows connect to it, run it for a while, then after perhaps 5-10 minutes, connect a second client, and search for any differences in the map? That would be extremely helpful.

Offline Vladki

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #291 on: April 25, 2017, 06:21:03 AM »
Just a quick note, if testing games on server.exp.simutrans.com, use the Pakset provided there. It has a few modifications.




Offline jamespetts

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #292 on: April 25, 2017, 12:03:01 PM »
Interesting - may I ask what the modifications are?

Offline Vladki

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #293 on: April 25, 2017, 08:47:15 PM »
Removed sound from crossings (reported in another thread).




Offline jamespetts

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #294 on: April 25, 2017, 09:51:05 PM »
Interesting - what effect have you found that that has had?

Offline Vladki

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #295 on: April 26, 2017, 06:13:44 AM »
It causes pakset mismatch when connecting to network game. http://forum.simutrans.com/index.php?topic=16996.0



« Last Edit: April 27, 2017, 09:52:55 PM by Vladki »

Offline jamespetts

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #296 on: April 29, 2017, 10:51:38 PM »
I think that I have fixed a bug that might have caused this (albeit this requires recompiling makeobj). Would you be able to test? I should be most grateful.

Offline Vladki

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #297 on: April 30, 2017, 10:14:49 AM »
Pakset mismatch fixed

Offline jamespetts

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #298 on: April 30, 2017, 11:47:11 AM »
Splendid, thank you for confirming.

Offline jamespetts

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #299 on: May 22, 2017, 12:40:32 AM »
I have been doing some further testing on this issue to-day.

As readers of this thread may remember, there are essentially two separate problems: (1) the original problem of a GCC build immediately desynchronising with a Visual Studio build; and (2) a desync occurring, irrespective of the build, when more than one client connects to a server (the earlier connected clients desyncing after a delay).

A week or two ago (I cannot remember exactly when), I fixed a bug relating to the post-loading code for vehicles that had the potential to cause the second issue. I had not had time to test whether this did fix this issue at the time, however.

Recently, I have been working on the code for passenger and mail classes. In testing some of that code, I found a thread deadlock in the path explorer code. This turns out to have been caused, not by a bug in the new passenger and mail classes code, but by a pre-existing bug in the multi-threading path explorer code. Looking very carefully into the documentation for pthreads, it transpired that I had misunderstood the relationship between the pthread_cond_wait command and the mutex that it requires as a parameter. The existing path explorer multi-threading code is classed as having undefined behaviour by the pthreads standard as it calls a mutex lock multiple times in succession.
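
For reference, the relationship between pthread_cond_wait and its mutex can be sketched as below: the caller locks the mutex exactly once, pthread_cond_wait releases it while the thread sleeps and re-acquires it before returning, and the caller then unlocks it once. This is a generic illustration and not the path explorer code; work_mutex, work_cond, work_ready, work_result, worker and run_once are all hypothetical names.

```cpp
#include <pthread.h>

// Hypothetical shared state; not the path explorer's actual variables.
static pthread_mutex_t work_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  work_cond  = PTHREAD_COND_INITIALIZER;
static bool work_ready  = false;
static int  work_result = 0;

// Worker: waits until work_ready is set, then produces a result.
static void* worker(void*)
{
	pthread_mutex_lock(&work_mutex);        // lock exactly once before waiting
	while (!work_ready) {                    // loop guards against spurious wake-ups
		// Atomically unlocks work_mutex while sleeping; re-locks it before returning.
		pthread_cond_wait(&work_cond, &work_mutex);
	}
	work_result = 42;
	pthread_mutex_unlock(&work_mutex);      // one unlock matches the one lock
	return nullptr;
}

// Starts the worker, signals it, and returns the value it produced (-1 on failure).
static int run_once()
{
	pthread_t t;
	if (pthread_create(&t, nullptr, worker, nullptr) != 0) {
		return -1;
	}

	pthread_mutex_lock(&work_mutex);
	work_ready = true;                       // change the predicate under the mutex
	pthread_cond_signal(&work_cond);
	pthread_mutex_unlock(&work_mutex);

	pthread_join(t, nullptr);
	return work_result;
}
```

Because the predicate is only read and written while the mutex is held, the signal cannot be lost even if it is sent before the worker starts waiting.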

I have coded an initial attempt at a fix to the multi-threaded path explorer on this dedicated branch.

However, testing shows that this code gives rise to a network desync when instances of the same build are connected on the loopback interface for testing, albeit only after some considerable time has lapsed. This does not occur when the path explorer multi-threading is disabled or on the master branch.

Meanwhile, testing on the master branch seems to show that the other problem (a desync occurring a short while after a second or subsequent client connects) seems to have been fixed, which I suspect is to do with the post-loading code fix to which I refer above.

I have now run out of time for further testing (as each individual test cycle requires running the whole thing for over an hour, fast forwarding not being possible in network mode) this week-end, but will look into refining the new code further. Because the current code has undefined behaviour, this is a prime suspect for inter-platform desyncs, so I am keen to fix this as soon as possible.

If anyone can spot any immediate problems in my new multi-threading code, I should be grateful for any feedback.

Edit: Some further testing seems to show that the multi-threaded passenger generation code seems to be responsible for a desync between a Visual Studio client and a GCC (Msys) client (both Windows): when this is disabled, the two will stay in sync for far longer than when this is enabled. I have not yet tested long enough to see whether it will stay in sync permanently, however.

Edit 2: With the (modified) path explorer multi-threading enabled but the passenger generation multi-threading disabled, it still desyncs between an Msys GCC client and the Visual Studio client, but only after a very long time; the same sort of time as it takes to desync between two Visual Studio clients with the new path explorer multi-threading algorithm. This suggests that the passenger generation multi-threading may well be responsible somehow for the desync between differently compiled versions of Extended.

Edit 3: I have now run a long-term test connecting a single Msys/GCC compiled client to a Visual Studio compiled server all day to-day with the passenger generation multi-threading disabled and the (existing) path explorer multi-threading enabled, and the two are still in sync even now. I have also had an answer on Stack Exchange that might help to explain the problem that I have been having with the new path explorer multi-threading code.

Edit 4: Using the 2010 edition of Rollermaterial's map, in order to use a more challenging test, with the latest code on the passenger-generation-multi-threading-fix (currently, just minor updates to the path explorer multi-threading from the master branch, and disabling the passenger generation multi-threading entirely), a Visual Studio client will stay in sync with another Visual Studio client for longer than I have so far measured, but an Msys/GCC client, connected second, will desync after approximately one game hour (with no interaction).

Edit 5: Using the mutex error checking, no mutex errors can be found running the britain-2010 map in the passenger generation multi-threading.
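
Assuming that "mutex error checking" here refers to the POSIX PTHREAD_MUTEX_ERRORCHECK mutex type (this is an assumption, as the post does not say), the following sketch shows what that type reports. Locking a default mutex twice from the same thread is undefined behaviour, whereas an error-checking mutex returns EDEADLK on the second attempt. The function name relock_error_code is hypothetical.

```cpp
#include <pthread.h>
#include <cerrno>

// Locks an error-checking mutex twice from the same thread and returns the
// result of the second lock attempt.
static int relock_error_code()
{
	pthread_mutexattr_t attr;
	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);

	pthread_mutex_t m;
	pthread_mutex_init(&m, &attr);
	pthread_mutexattr_destroy(&attr);

	pthread_mutex_lock(&m);
	const int second = pthread_mutex_lock(&m); // reported as an error, not a hang
	pthread_mutex_unlock(&m);                  // the mutex is held only once
	pthread_mutex_destroy(&m);
	return second;
}
```

Checking the return value of every lock and unlock call against 0 while running a test map is one way to confirm that no thread relocks a mutex it already holds or unlocks one it does not own.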

Edit 6: Repeating the test from edit 3 using the saved game from edit 4 produces a desync, but only after running for about one game month.

Edit 7: Repeating the test from edit 6 with the path explorer multi-threading disabled produces a desync after a long time, but slightly short of a game month (i.e. before the crossing of a month boundary since loading the game).

Edit 8: Very oddly indeed, when testing with FORBID_MULTI_THREAD_PASSENGER_GENERATION_IN_NETWORK_MODE defined, I get a near-instant desync between a GCC/Msys and Visual Studio client, although two Visual Studio clients will happily stay in sync. The difference between FORBID_MULTI_THREAD_PASSENGER_GENERATION_IN_NETWORK_MODE and FORBID_PARALLELL_PASSENGER_GENERATION is that the former uses the old code for single-threaded passenger generation in the main thread, whereas the latter (i.e., the one that works) uses a separate thread for the passenger generation, but only actually runs the passenger generation on a single thread rather than all of the passenger generation threads. This is very bizarre, as it suggests that there is an error with the single threaded passenger generation code that is not present in the multi-threaded passenger generation code when it is restricted to running with only one thread.

Edit 9: Even running entirely single-threadedly, the GCC/Msys build will desync from the Visual Studio build in seconds with the britain-2010 saved game. It seems that only when the passenger generation multi-threading is operational but it is set to use only one of the threads does this work (for a while) without desyncing. This is extremely odd.

Edit 10: Defining FIXED_PASSENGER_NUMBERS_PER_STEP_FOR_TESTING with the passenger generation multi-threading fully enabled does not prevent the Visual Studio/GCC desync.
« Last Edit: Today at 06:54:23 PM by jamespetts »