Author Topic: Desync issue (devel-new-2) with Linux Server/Windows client  (Read 21747 times)

Offline Vladki

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #280 on: March 27, 2017, 07:26:05 PM »
I'm afraid that this change (commit 7d09ea4d9d1f18a9f9b237f8a7167259eb8f4c6f) has broken compilation on linux:

Code: [Select]
===> CXX obj/leitung2.cc
g++ -std=gnu++11 -O -DNDEBUG -DMULTI_THREAD -DREVISION="7d09ea4" -Wall -W -Wcast-qual -Wpointer-arith -Wcast-align -DUSE_C -fno-delete-null-pointer-checks -fno-strict-aliasing  -I/usr/include/SDL -D_GNU_SOURCE=1 -D_REENTRANT -DCOLOUR_DEPTH=16 -c -MMD -o build/default/obj/leitung2.o obj/leitung2.cc
In file included from obj/../vehicle/../boden/grund.h:19:0,
                 from obj/../vehicle/../simplan.h:12,
                 from obj/../vehicle/../simworld.h:34,
                 from obj/../vehicle/simvehicle.h:18,
                 from obj/../vehicle/simroadtraffic.h:15,
                 from obj/../simcity.h:22,
                 from obj/leitung2.h:16,
                 from obj/leitung2.cc:17:
obj/../vehicle/../boden/wege/weg.h:60:41: warning: type qualifiers ignored on function return type [-Wignored-qualifiers]
  static const uint32 get_all_ways_count();
                                         ^
In file included from obj/leitung2.cc:23:0:
obj/../simfab.h:131:32: warning: type qualifiers ignored on function return type [-Wignored-qualifiers]
  const sint32 get_in_transit() const { return statistics[0][FAB_GOODS_TRANSIT]; }
                                ^
obj/leitung2.cc:395:44: warning: unused parameter ‘dummy’ [-Wunused-parameter]
 void leitung_t::info(cbuffer_t & buf, bool dummy) const
                                            ^
obj/leitung2.cc: In member function ‘virtual void leitung_t::rdwr(loadsave_t*)’:
obj/leitung2.cc:457:27: error: cast from ‘powernet_t*’ to ‘uint32 {aka unsigned int}’ loses precision [-fpermissive]
   value = (uint32)get_net(); //  This seems to be functionless, but should be preserved for compatibility. It likewise appears functionless in Standard.
                           ^
obj/leitung2.cc: At global scope:
obj/leitung2.cc:687:42: warning: unused parameter ‘dummy’ [-Wunused-parameter]
 void pumpe_t::info(cbuffer_t & buf, bool dummy) const
                                          ^
obj/leitung2.cc:1130:42: warning: unused parameter ‘dummy’ [-Wunused-parameter]
 void senke_t::info(cbuffer_t & buf, bool dummy) const
                                          ^
common.mk:50: recipe for target 'build/default/obj/leitung2.o' failed
make: *** [build/default/obj/leitung2.o] Error 1
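
For readers following the log: the error at leitung2.cc:457 arises because, on a 64-bit Linux target, a pointer is wider than uint32, so GCC refuses the direct cast. Below is a minimal sketch of one portable way to make such a truncation explicit, assuming the value only has to fill a legacy 32-bit slot in the saved game. The names powernet_stub_t and legacy_pointer_value are hypothetical, and this is not necessarily the fix that was pushed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for powernet_t; only its address matters here.
struct powernet_stub_t { int dummy; };

// Convert a pointer to a 32-bit placeholder value without a direct lossy cast.
// Going through std::uintptr_t is accepted on both 32-bit and 64-bit targets;
// the mask and second cast make the truncation to 32 bits explicit.
inline std::uint32_t legacy_pointer_value(const powernet_stub_t* net)
{
    const std::uintptr_t address = reinterpret_cast<std::uintptr_t>(net);
    return static_cast<std::uint32_t>(address & 0xFFFFFFFFu);
}
```

The result is only a placeholder: it differs from run to run and from machine to machine, so it should never feed into anything that must stay in sync.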



Offline jamespetts

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 15698
  • Total likes: 395
  • Helpful: 174
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #281 on: March 27, 2017, 08:05:44 PM »
Ah - I think that I have pushed what amounts to a fix for this. Would you be able to re-test?
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

Offline Vladki

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #282 on: March 27, 2017, 10:23:13 PM »
fixed

Offline jamespetts

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #283 on: April 12, 2017, 12:30:00 AM »
Testing again with the latest nightly build from the server, this still stays in sync with identical builds. I have not fully re-tested for synchronisation between different builds, but it is useful to test this periodically just to make sure that nothing has broken network synchronisation in new ways. The problem still seems to be confined to mixing Linux/Windows builds. (I have briefly re-tested with the Bridgewater-Brunel server, and the current nightly MinGW build still desynchronises with that virtually instantly).

Offline jamespetts

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #284 on: April 23, 2017, 10:21:00 PM »
I am developing a new method to try to test the cause of this and other desyncs: I have just pushed a change to the code in which, if the preprocessor directive DISABLE_RANDOMNESS is defined, the random number generator will always simply return half of the maximum value, and the function for getting the random number generator seed will always return 1. I am in the process of rebuilding the Bridgewater-Brunel server with this option enabled. This means that it will not stay in sync with any normal client, but should be able to stay in sync with a client compiled with DISABLE_RANDOMNESS.
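
A minimal sketch of the idea, for anyone wanting to see its shape: when DISABLE_RANDOMNESS is defined, the generator returns half of the requested maximum and the seed getter returns 1. The class name test_random_t and its interface are hypothetical and are not the actual generator code in the game.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Defined here so that the sketch shows the test configuration;
// a real build would pass -DDISABLE_RANDOMNESS to the compiler instead.
#define DISABLE_RANDOMNESS

#ifdef DISABLE_RANDOMNESS
constexpr bool randomness_disabled = true;
#else
constexpr bool randomness_disabled = false;
#endif

// Hypothetical wrapper around a Mersenne Twister engine.
class test_random_t
{
public:
    explicit test_random_t(std::uint32_t seed) : engine(seed), seed_value(seed) {}

    // Returns a value in the range [0, max).
    std::uint32_t next(std::uint32_t max)
    {
        assert(max > 0);
        if (randomness_disabled) {
            return max / 2; // always half of the maximum value
        }
        return static_cast<std::uint32_t>(engine() % max);
    }

    // Returns the seed that server and client would compare.
    std::uint32_t get_seed() const
    {
        return randomness_disabled ? 1u : seed_value;
    }

private:
    std::mt19937 engine;
    std::uint32_t seed_value;
};
```

With the macro defined, next(10) always gives 5 and get_seed() always gives 1, whatever seed the object was constructed with.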

When a client with DISABLE_RANDOMNESS connects with the server, it should be possible to see in the game itself where the desyncs arise without this affecting lots of other unconnected areas of the game at random by changing the random number generator seed. Also, because the random number generator seed is fixed, the client should not be kicked from the game unless and until a more major difference (such as in the number of convoys) emerges. This system is intended just for testing (it would be no good in a real game, as there would be no randomness), but it should help to highlight where the problems are.

It would be very helpful if anyone with the ability to compile the game were to connect two separate clients, both built with DISABLE_RANDOMNESS, to the Bridgewater-Brunel server to try to spot how, if at all, they diverge from one another. Any divergence might be hard to spot, as the game currently running on the server is a big map; if anyone can find a smaller, simpler map which will reliably desync with a normal build between Windows and Linux, it would be helpful to use that for testing, too.

Thank you all in advance for any help with this: it would be much appreciated.

Edit: Having now tested this briefly, the special DISABLE_RANDOMNESS build does appear to stay in sync, as expected. Any help in tracking down actual divergence would be very much appreciated.
 
« Last Edit: April 23, 2017, 10:39:39 PM by jamespetts »

Offline prissi

  • Developer
  • Administrator
  • *
  • Posts: 8799
  • Total likes: 320
  • Helpful: 229
  • Languages: De,EN,JP
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #285 on: April 23, 2017, 11:36:32 PM »
The random number seed is not used in Simutrans at all (unless Experimental added some code using it). The internal random generator uses the Mersenne Twister algorithm and is seeded with a complex seed depending on the time in ms since 1970 and some mathematical operations.

If the differences are really compiler/machine dependent, then the cause may well be the rounding of a float/double to an integer, which can give different results on different compilers and machines.

Offline jamespetts

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #286 on: April 23, 2017, 11:41:42 PM »
Is it not the random seed that is checked between server and client to test for a mismatch?

As to the floating point thing, this was a problem a long time ago, but all the floating point arithmetic other than in the GUI was removed circa 2011/2012 to solve this problem, and no further floating point arithmetic has been added since, so I do not think that it is this. Bernd Gabriel even wrote an elaborate class to simulate floating point arithmetic using integers to get around this.
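
To illustrate why integer-only arithmetic avoids this class of problem, here is a minimal sketch of a 16.16 fixed-point type. This is a deliberately simpler technique than the class mentioned above and is not its implementation; the name fixed16_t and all of its members are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 16.16 fixed-point number: the real value is raw / 65536.
struct fixed16_t
{
    std::int64_t raw;

    static fixed16_t from_int(std::int32_t v)
    {
        return fixed16_t{ static_cast<std::int64_t>(v) * 65536 };
    }

    static fixed16_t from_ratio(std::int32_t num, std::int32_t den)
    {
        assert(den != 0);
        return fixed16_t{ (static_cast<std::int64_t>(num) * 65536) / den };
    }

    fixed16_t operator+(fixed16_t other) const { return fixed16_t{ raw + other.raw }; }

    // Multiply, then rescale; integer division truncates toward zero on every
    // conforming compiler, so all machines get the same result.
    fixed16_t operator*(fixed16_t other) const { return fixed16_t{ (raw * other.raw) / 65536 }; }

    // Convert back to an integer, again truncating toward zero.
    std::int32_t to_int() const { return static_cast<std::int32_t>(raw / 65536); }
};
```

For example, from_int(10) * from_ratio(3, 2) converts back to 15 on any platform. The sketch does not guard against overflow of the intermediate product.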

Offline Vladki

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #287 on: April 23, 2017, 11:56:21 PM »
I have restarted all instances at server.exp.simutrans.com to commit 58181b37d85561c.... and DISABLE_RANDOMNESS.
However I had desync with both British pak games after 5-10 minutes. Pak sweden seems to be stable.
You are welcome to test them.


Offline jamespetts

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #288 on: April 24, 2017, 12:51:34 AM »
Thank you - can you see if you can capture and post the debug output from the desync that you get? That would be most helpful.

Offline Vladki

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #289 on: April 24, 2017, 10:35:21 PM »
Playing bridgewater (copy) for maybe an hour - without desync (linux/linux)

Offline jamespetts

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #290 on: April 24, 2017, 11:38:01 PM »
Thank you for checking that, although the linux/linux connexion was working before in any event. Can anyone running Windows connect to it, run it for a while, then after perhaps 5-10 minutes, connect a second client, and search for any differences in the map? That would be extremely helpful.

Offline Vladki

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #291 on: April 25, 2017, 06:21:03 AM »
Just a quick note, if testing games on server.exp.simutrans.com, use the Pakset provided there. It has a few modifications.




Offline jamespetts

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #292 on: April 25, 2017, 12:03:01 PM »
Interesting - may I ask what the modifications are?

Offline Vladki

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #293 on: April 25, 2017, 08:47:15 PM »
Removed sound from crossings (reported in another thread).




Offline jamespetts

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #294 on: April 25, 2017, 09:51:05 PM »
Interesting - what effect have you found that that has had?

Offline Vladki

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #295 on: April 26, 2017, 06:13:44 AM »
It causes pakset mismatch when connecting to network game. http://forum.simutrans.com/index.php?topic=16996.0



« Last Edit: April 27, 2017, 09:52:55 PM by Vladki »

Offline jamespetts

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #296 on: April 29, 2017, 10:51:38 PM »
I think that I have fixed a bug that might have caused this (albeit this requires recompiling makeobj). Would you be able to test? I should be most grateful.

Offline Vladki

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #297 on: April 30, 2017, 10:14:49 AM »
Pakset mismatch fixed

Offline jamespetts

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #298 on: April 30, 2017, 11:47:11 AM »
Splendid, thank you for confirming.

Offline jamespetts

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #299 on: May 22, 2017, 12:40:32 AM »
I have been doing some further testing on this issue to-day.

As readers of this thread may remember, there are essentially two separate problems: (1) the original problem of a GCC build immediately desynchronising with a Visual Studio build; and (2) a desync occurring, irrespective of the build, when more than one client connects to a server (the earlier connected clients desyncing after a delay).

A week or two ago (I cannot remember exactly when), I fixed a bug relating to the post-loading code for vehicles that had the potential to cause the second issue. I had not had time to test whether this did fix this issue at the time, however.

Recently, I have been working on the code for passenger and mail classes. In testing some of that code, I found a thread deadlock in the path explorer code. This turns out to have been caused, not by a bug in the new passenger and mail classes code, but by a pre-existing bug in the multi-threading path explorer code. Looking very carefully into the documentation for pthreads, it transpired that I had misunderstood the relationship between the pthread_cond_wait command and the mutex that it requires as a parameter. The existing path explorer multi-threading code is classed as having undefined behaviour by the pthreads standard as it calls a mutex lock multiple times in succession.
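
For reference, the usual pattern is sketched below: the waiting thread locks the mutex exactly once, pthread_cond_wait releases it while the thread is blocked and re-acquires it before returning, and the thread then unlocks it exactly once. This is a minimal sketch, not the path explorer code; work_signal_t, worker_main and wait_for_worker are hypothetical names.

```cpp
#include <cassert>
#include <pthread.h>

// Hypothetical shared state guarded by one mutex and one condition variable.
struct work_signal_t
{
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    bool            ready;
    int             result;
};

// Worker thread: publish a result under the mutex, then wake the waiter.
void* worker_main(void* arg)
{
    work_signal_t* s = static_cast<work_signal_t*>(arg);
    pthread_mutex_lock(&s->mutex);
    s->result = 42;
    s->ready = true;
    pthread_cond_signal(&s->cond);
    pthread_mutex_unlock(&s->mutex);
    return nullptr;
}

// Waiter: lock once, wait in a loop on the predicate, unlock once.
int wait_for_worker()
{
    work_signal_t s;
    pthread_mutex_init(&s.mutex, nullptr);
    pthread_cond_init(&s.cond, nullptr);
    s.ready = false;
    s.result = 0;

    pthread_t thread;
    const int rc = pthread_create(&thread, nullptr, worker_main, &s);
    assert(rc == 0);
    (void)rc;

    pthread_mutex_lock(&s.mutex);
    while (!s.ready) {
        // Atomically releases the mutex while blocked and holds it again on
        // return; the loop guards against spurious wake-ups.
        pthread_cond_wait(&s.cond, &s.mutex);
    }
    const int result = s.result;
    pthread_mutex_unlock(&s.mutex);

    pthread_join(thread, nullptr);
    pthread_cond_destroy(&s.cond);
    pthread_mutex_destroy(&s.mutex);
    return result;
}
```

Locking a default-type mutex a second time from the thread that already holds it is what the standard leaves undefined, so any lock call inside the waiting loop would reintroduce the problem described above.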

I have coded an initial attempt at a fix to the multi-threaded path explorer on this dedicated branch.

However, testing shows that this code gives rise to a network desync when instances of the same build are connected on the loopback interface for testing, albeit only after some considerable time has lapsed. This does not occur when the path explorer multi-threading is disabled or on the master branch.

Meanwhile, testing on the master branch seems to show that the other problem (a desync occurring a short while after a second or subsequent client connects) seems to have been fixed, which I suspect is to do with the post-loading code fix to which I refer above.

I have now run out of time for further testing (as each individual test cycle requires running the whole thing for over an hour, fast forwarding not being possible in network mode) this week-end, but will look into refining the new code further. Because the current code has undefined behaviour, this is a prime suspect for inter-platform desyncs, so I am keen to fix this as soon as possible.

If anyone can spot any immediate problems in my new multi-threading code, I should be grateful for any feedback.

Edit: Some further testing seems to show that the multi-threaded passenger generation code seems to be responsible for a desync between a Visual Studio client and a GCC (Msys) client (both Windows): when this is disabled, the two will stay in sync for far longer than when this is enabled. I have not yet tested long enough to see whether it will stay in sync permanently, however.

Edit 2: With the (modified) path explorer multi-threading enabled but the passenger generation multi-threading disabled, it still desyncs between an Msys GCC client and the Visual Studio client, but only after a very long time; the same sort of time as it takes to desync between two Visual Studio clients with the new path explorer multi-threading algorithm. This suggests that the passenger generation multi-threading may well be responsible somehow for the desync between differently compiled versions of Extended.

Edit 3: I have now run a long-term test connecting a single Msys/GCC compiled client to a Visual Studio compiled server all day to-day with the passenger generation multi-threading disabled and the (existing) path explorer multi-threading enabled, and the two are still in sync even now. I have also had an answer on Stack Exchange that might help to explain the problem that I have been having with the new path explorer multi-threading code.

Edit 4: Using the 2010 edition of Rollmaterial's map, in order to use a more challenging test, with the latest code on the passenger-generation-multi-threading-fix (currently, just minor updates to the path explorer multi-threading from the master branch, and disabling the passenger generation multi-threading entirely), a Visual Studio client will stay in sync with another Visual Studio client for longer than I have so far measured, but an Msys/GCC client, connected second, will desync after approximately one game hour (with no interaction).

Edit 5: Using the mutex error checking, no mutex errors can be found running the britain-2010 map in the passenger generation multi-threading.

Edit 6: Repeating the test from edit 3 using the saved game from edit 4 produces a desync, but only after running for about one game month.

Edit 7: Repeating the test from edit 6 with the path explorer multi-threading disabled produces a desync after a long time, but slightly short of a game month (i.e. before the crossing of a month boundary since loading the game).

Edit 8: Very oddly indeed, when testing with FORBID_MULTI_THREAD_PASSENGER_GENERATION_IN_NETWORK_MODE defined, I get a near-instant desync between a GCC/Msys and Visual Studio client, although two Visual Studio clients will happily stay in sync. The difference between FORBID_MULTI_THREAD_PASSENGER_GENERATION_IN_NETWORK_MODE and FORBID_PARALLELL_PASSENGER_GENERATION is that the former uses the old code for single-threaded passenger generation in the main thread, whereas the latter (i.e., the one that works) uses a separate thread for the passenger generation, but only actually runs the passenger generation on a single thread rather than all of the passenger generation threads. This is very bizarre, as it suggests that there is an error with the single threaded passenger generation code that is not present in the multi-threaded passenger generation code when it is restricted to running with only one thread.

Edit 9: Even running entirely single-threadedly, the GCC/Msys build will desync from the Visual Studio build in seconds with the britain-2010 saved game. It seems that only when the passenger generation multi-threading is operational but it is set to use only one of the threads does this work (for a while) without desyncing. This is extremely odd.

Edit 10: Defining FIXED_PASSENGER_NUMBERS_PER_STEP_FOR_TESTING with the passenger generation multi-threading fully enabled does not prevent the Visual Studio/GCC desync.

Edit 11: Defining DISABLE_JOB_EFFECTS prevents the very quick desync between the Visual Studio and GCC/Msys builds with the britain-2010 saved game.

Edit 12: I have reverted the DISABLE_RANDOMNESS setting on the server's version and recompiled it so that people can again try to connect with an unmodified client for testing purposes.

Edit 13: Attempting to connect to the Bridgewater-Brunel server from the cross-compiled client (both from the master branch) still results in a near instant desync.

Edit 14: Further testing shows that the cause of the short desync between a Visual Studio client and a GCC/Msys (Windows) client appears to have been using the min() and max() methods with 64-bit integers when in fact they are defined as using signed 32-bit integers.  I have added a special 64-bit version of min() and max() (called min_64() and max_64()) to handle these where they appeared in the code relating to job effects, and I can now connect, with job effects and passenger generation multi-threading both enabled, an Msys/GCC client to a Visual Studio server for a considerable time (crossing a month boundary in the britain-2010 game) before a desync occurs. The long desync, however, after the month boundary is crossed, is still present.

This opens up a new line of enquiry into all cross platform desyncs, however, as there may be other sync critical places in the code with 64-bit integers using these methods. Also, I wonder whether it is safe for unsigned 32-bit integers to use these methods.
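
A minimal sketch of the kind of truncation involved, assuming a 32-bit helper in the style described above. The functions min_32 and min_via_32 are hypothetical, and the signatures given here for min_64 and max_64 are an assumption rather than the ones in the code base.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 32-bit helper of the kind described above.
inline std::int32_t min_32(std::int32_t a, std::int32_t b) { return a < b ? a : b; }

// 64-bit variants (signatures assumed for this sketch).
inline std::int64_t min_64(std::int64_t a, std::int64_t b) { return a < b ? a : b; }
inline std::int64_t max_64(std::int64_t a, std::int64_t b) { return a > b ? a : b; }

// Shows what happens when 64-bit values are pushed through the 32-bit helper:
// each argument is narrowed to 32 bits before the comparison takes place.
// The narrowing wraps modulo 2^32 on mainstream compilers (guaranteed from C++20).
inline std::int64_t min_via_32(std::int64_t a, std::int64_t b)
{
    return min_32(static_cast<std::int32_t>(a), static_cast<std::int32_t>(b));
}
```

With a = 5,000,000,000 and b = 1,000,000,000, min_64 returns b as expected, whereas min_via_32 compares the wrapped value 705,032,704 against b and returns the wrong number.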

Edit 15: I cannot find any more instances of the min()/max() methods being passed 64-bit integers, although I have slightly improved some code. This has not prevented the long desync between Visual Studio and Msys/GCC clients, but normally connexions will be made between GCC/Msys and GCC/Linux clients in any event, so it is possible that this desync is not important.

I have now integrated the above fix into the master branch as this is clearly an improvement on the previous code and fixes a specific issue. There will be further testing on the Bridgewater-Brunel server.

Edit 16: There is still an immediate desync when connecting to the Bridgewater-Brunel server with the client cross-compiled on the Bridgewater-Brunel server. It is not clear at present what the cause of this is.

Edit 17: The same result obtains with both the Msys/GCC and Visual Studio builds connecting to the Bridgewater-Brunel server: an instant desync.

Edit 18: Testing with my Linux computer, this seems to desync from the Bridgewater-Brunel server instantly, too, but be able to stay in sync with a Windows server. However, the problem of one client joining causing all other clients to desync shortly after connecting appears to have returned, and it is not clear why.

Edit 19: The client kick desync can be reproduced with the Visual Studio and the GCC/Msys builds connecting to a build of the same type.

Edit 20: Testing again with all multi-threading disabled, the client kick desync cannot be reproduced. This appears to be the same issue as was investigated some months ago relating apparently to multi-threading of the load/save routines. The long desync appears also not to be reproducible in this configuration, but this needs a longer period of testing to confirm.

Edit 21: With multi-threading disabled entirely, three clients can remain connected to a local server for many hours and many in-game months without desyncing from the server.
« Last Edit: May 27, 2017, 10:42:43 PM by jamespetts »

Offline jamespetts

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #300 on: June 11, 2017, 07:54:58 PM »
Re-testing this again now, this can still be reproduced on the current master branch.

Analysing this carefully, this must be a problem with saving rather than loading. This is because the online gaming works in the following way: the server starts out with no clients connected. When the first client connects, the server saves the game, sends the saved game as a file to the client requesting connexion, then loads that saved game, along with the client. When another client connects, the same procedure is applied for the newly joining client, but, in order to save bandwidth, the already connected clients save their games locally and re-load their own saved games without needing this to be transmitted from the server.

Thus, for all of the already connected clients to desync (at exactly the same time, as I have just confirmed) and for the most recently connected client not to desync, the problem must be that the files that the already connected clients are loading are not identical to the one being loaded by the newly connected client and the server. Because the newly connected client loads but does not save, any non-determinism in the saving mechanism is not relevant, as both server and the most recently connected client will in any event be loading exactly the same file. Only the already connected clients have to load a file that has been saved other than by the server, and thus potentially get something non-identical when they save.

The problem cannot be (or, at least, is very unlikely to be) in the code for loading (including the post-processing after loading), nor an inadequacy in what data are saved, since a problem with either of these things would equally affect the newly connected client, whereas tests show that the most recently connected client always reliably stays in sync.

Thus, I need to look for problems specifically in the code for saving that is not shared with the code for loading. Much of the save/load code is shared (the computer being instructed to save or load a particular datum in one line of code by the "rdwr" method, and whether it saves or loads is determined by whether the game is currently in the process of loading or saving generally), but there are some places where the code differs between saving and loading, being in simworld.cc and at various places that use branching logic depending on whether the file is loading or saving.
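
The shared "rdwr" pattern can be sketched as follows: one call either writes or reads a datum depending on the mode of the file object, and only a few branches are specific to saving or loading. This is a much simplified illustration; stream_stub_t and depot_stub_t are hypothetical stand-ins and not the real loadsave_t interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, much simplified stand-in for the save/load file object.
class stream_stub_t
{
public:
    std::vector<std::uint32_t> buffer; // the "file" contents

    explicit stream_stub_t(bool saving_) : saving(saving_), pos(0) {}

    bool is_saving() const { return saving; }
    bool is_loading() const { return !saving; }

    // One call both saves and loads, depending on the mode of the stream.
    void rdwr_long(std::uint32_t& value)
    {
        if (saving) {
            buffer.push_back(value);
        }
        else {
            assert(pos < buffer.size());
            value = buffer[pos++];
        }
    }

private:
    bool saving;
    std::size_t pos;
};

// Hypothetical object whose save and load paths share almost all of their code.
struct depot_stub_t
{
    std::uint32_t id;
    std::vector<std::uint32_t> items;

    void rdwr(stream_stub_t& file)
    {
        file.rdwr_long(id);
        std::uint32_t count = static_cast<std::uint32_t>(items.size());
        file.rdwr_long(count);
        if (file.is_loading()) {
            items.assign(count, 0); // branch that only runs when loading
        }
        for (std::uint32_t& item : items) {
            file.rdwr_long(item);
        }
    }
};
```

Branches such as the is_loading() check above are where saving and loading can diverge, which is why they are the places to inspect first.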

Edit 1: Forcing the use of bzip 2 rather than zip has no effect on the desync issue.

Edit 2: The problem does not appear to be any error with loading/saving the "parallel_operations" datum.

Edit 3: It appears that this cannot be reproduced on a map where the only transport infrastructure consists of airports and roads usable by private cars (I have generally used one of Rollmaterial's large, complex maps to test this, which has air, water, rail and road transport).

Edit 4: If I add a fairly substantial 'bus network to the airport only map, I can reproduce the desync problem relatively quickly. This eliminates the possibility that the problem is caused by industries or electricity supplies, or that it is specific to rail transport.

Edit 5: Running overnight with the aircraft only, three instances of the client remain connected to the server (over the loopback interface). However, none of the airports were in range of any town buildings, so no passengers were transported. From this, I infer that the problem is likely to be connected to the actual transporting of passengers by player provided transport (as opposed to walking or the use of private cars).

Edit 6: Removing the aircraft but retaining the 'buses, the client kick desync can still be reproduced. I have noticed, however, that the desync only occurs if the later clients join a little while after the first client joined.

Edit 7: Further testing seems to have ruled out the loading/saving of transferring passengers at stops. Also, I have found that what counts as far as timing is concerned appears to be the time between the last load/save cycle and the new client joining, as a series of rapid load/save cycles one after another where a number of clients join in quick succession allow a large number of clients to connect simultaneously, but if, after a period of time after the last load/save cycle, another client tries to connect, all of the previously connected clients will desync very quickly.

Edit 8: The problem appears to be in the world (rather than the stop) transferring cargo/passengers list: when I comment out line no. 9087 in simworld.cc, being

Code: [Select]
transferring_cargoes[0].append(tc);

I cannot reproduce the desync, and was able to connect 8 client instances to a server over the loopback interface without any of them desyncing. Obviously, this is not a solution, as it breaks the transferring cargo functionality, but it means that I have at least narrowed down the problem to this area of the code.

Edit 9: Editing the following part of the code so as not to compile the parts dependent on "MULTI_THREAD" being defined (by modifying that string in both cases to something else) prevents the desync from occurring:

Code: [Select]
    if (file->get_extended_version() >= 13 || file->get_extended_revision() >= 15)
    {
        uint32 count;
        sint64 ready;
        ware_t ware;
#ifdef MULTI_THREAD
        count = 0;
        for (sint32 i = 0; i < parallel_operations; i++)
        {
            count += transferring_cargoes[i].get_count();
        }
#else
        count = transferring_cargoes[0].get_count();
#endif

        file->rdwr_long(count);

        sint32 po;
#ifdef MULTI_THREAD
        po = parallel_operations;
#else
        po = 1;
#endif

        for (sint32 i = 0; i < po; i++)
        {
            for (uint32 j = 0; j < transferring_cargoes[i].get_count(); j++)
            {
                ready = transferring_cargoes[i][j].ready_time;
                ware = transferring_cargoes[i][j].ware;

                file->rdwr_longlong(ready);
                ware.rdwr(file);
            }
        }
    }

This modification has the effect of saving only one of the total of 4 sets of transferring cargoes/passengers generated by the multi-threaded passenger generation system (an arbitrary cross-section of all transferring cargoes which cross-section should be the same between server and clients).

Given that I have verified that the number of parallel operations is consistent between server and client (by default 5), it is not clear to me what is occurring here or why this should make a difference.

Edit 10: I think that I have - eventually - managed to fix this (which fix has just been pushed). The problem was that the parallel_operations variable was not always of the same value as was returned by the get_parallel_operations() method: in the case of the server, the parallel_operations value was 0, whereas get_parallel_operations() would return 5, but on the client, both would be 5. The consequence of this was that the server would in effect discard transferring cargo when saving whereas the clients would save correctly, thus causing a desync when there was any transferring cargo and a subsequent client joined.

I have fixed this by replacing parallel_operations in the above code with get_parallel_operations(). I was then able to connect three clients on the loopback interface to a local server running Rollmaterial's game in 2010, and to confirm that they ran overnight without desyncing.
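
To illustrate the mismatch described above, here is a minimal, self-contained sketch. It is not the actual Simutrans-Extended code: world_sketch_t, transferring_cargo_t, cargo_lists_t and the two count_* functions are hypothetical stand-ins, and only the names parallel_operations and get_parallel_operations() are taken from the real code. It assumes that the getter reports the true number of cargo lists while the raw member may still be 0, as was the case on the server.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one queued cargo packet.
struct transferring_cargo_t {
    std::int64_t ready_time;
};

// Hypothetical stand-in for the object that owns the cargo lists.
struct world_sketch_t {
    // Raw member: assumed to be 0 on the server in the faulty case.
    std::int32_t parallel_operations;
    // Value that the getter reports (5 in the case described above).
    std::int32_t effective_parallel_operations;

    std::int32_t get_parallel_operations() const {
        return effective_parallel_operations;
    }
};

// One list of transferring cargoes per parallel operation.
using cargo_lists_t = std::vector<std::vector<transferring_cargo_t>>;

// Faulty variant: the loop bound is read from the raw member, so a
// value of 0 skips every list and nothing is counted (or saved).
std::uint32_t count_with_raw_member(const world_sketch_t& world,
                                    const cargo_lists_t& lists) {
    std::uint32_t count = 0;
    for (std::int32_t i = 0; i < world.parallel_operations; i++) {
        count += static_cast<std::uint32_t>(lists[i].size());
    }
    return count;
}

// Fixed variant: the loop bound comes from the getter, so server and
// client walk the same number of lists.
std::uint32_t count_with_getter(const world_sketch_t& world,
                                const cargo_lists_t& lists) {
    std::uint32_t count = 0;
    for (std::int32_t i = 0; i < world.get_parallel_operations(); i++) {
        count += static_cast<std::uint32_t>(lists[i].size());
    }
    return count;
}
```

With five lists holding three packets in total, a "server" whose raw member is 0 counts no packets via the raw member but three via the getter, whereas a "client" on which both values are 5 counts three either way. That difference in what is written to the saved game is what a joining client would then load.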
« Last Edit: June 14, 2017, 10:39:14 AM by jamespetts »
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

Offline Vladki

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #301 on: June 16, 2017, 06:35:11 PM »
I have updated my servers, and was connected to the big map for quite a while. But just a minute ago I got a desync. So I connected to the smaller sandbox map and it desynced almost immediately - perhaps due to someone else connecting to the game.

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Moderator
  • Posts: 15698
  • Total likes: 395
  • Helpful: 174
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #302 on: June 16, 2017, 08:27:55 PM »
It would be helpful if you could be more specific about the circumstances in which this occurs.

Offline Ves

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #303 on: June 17, 2017, 09:20:28 PM »
I tried to connect to both the bridgewater-brunel server and the two British games on server.exp.simutrans.com, but got immediate desyncs within a second in all cases. The Swedish server game, however, stays online without desyncing for eternity, it seems.


Offline jamespetts

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #304 on: June 17, 2017, 09:23:51 PM »
Can I check which version you are using (in terms of the abbreviated Github hash) and whether you have downloaded the latest pakset from the server?

Offline Ves

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #305 on: June 17, 2017, 09:27:32 PM »
I used this executable:
e18d813edb17e52282d6c94863b57ea81aa0576b (16.06.2017)

and updated to this pakset:
122fa645ae91ecbc5eb9a55b5c939064bed21cab (16.06.2017)

I compiled the pakset using the latest makeobj with the same commit as the executable.

Offline jamespetts

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #306 on: June 17, 2017, 09:31:52 PM »
Did you download the executable and pakset from the server or compile them yourself? If the latter, I should be grateful if you could re-test with downloaded executable and pakset.

Offline Ves

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #307 on: June 17, 2017, 09:35:25 PM »
I compiled everything myself.
I will try downloading everything and see if that changes.

Offline jamespetts

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #308 on: June 17, 2017, 09:37:48 PM »
Splendid, thank you.

Offline Ves

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #309 on: June 17, 2017, 10:04:28 PM »
Aarghh, my computer won't let me open the downloaded nightly build because it needs to be sent to the AVG-center first and be confirmed or something similar, which would take around 2-3 hours!  :o
When pressing the "I trust this program" button, I just get a small window telling me that I don't have permission to the location where that file is located. I hate it when I don't have control over my own computer...

Anyway, I could test with the downloaded pakset, and that stayed in sync with the small British map on server.exp.simutrans.com for around 30 seconds (didn't try the other) and is connected to bridgewater-brunel while I'm writing without desyncing at all, now for at least 5 minutes.
Connecting with my own compiled pakset to bridgewater-brunel server at the same time (so I had two instances of the game connected to the same map) generated a desync within 30 seconds for that pakset, leaving the nightly downloaded pakset online!

So, something has to be said for the different paksets. I believe they are the same paksets, it should only be the makeobj's that are different for them. I have uploaded the makeobj I use to http://server.exp.simutrans.com/Devel-new-builds/

Offline jamespetts

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #310 on: June 17, 2017, 11:26:31 PM »
I am not going to be able to work out the differences in the pakset either from a compiled makeobj or a compiled pakset - this problem looks as though it is more easily dealt with simply by using the pakset downloaded from the server rather than trying to work out where the divergence lies, I think.

Offline Vladki

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #311 on: June 18, 2017, 10:12:44 AM »
The only difference between my pakset and the nightly one is the rolling resistance of the hackney carriage...

I had an immediate desync after connecting to "british sandbox" game, but on the second try it was OK.

Offline jamespetts

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #312 on: June 18, 2017, 11:03:12 AM »
I should note that the British sandbox game is not updated to the latest nightly version. I have a shell script on the Bridgewater-Brunel server that updates it to the latest nightly version every night (which had not been working properly, but which I have just now fixed), which should ensure that the same version is running on the server as is available to download.

It is always better to test with the latest version to make sure that no errors that have since been fixed are causing the trouble.

Offline Vladki

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #313 on: June 18, 2017, 11:15:29 AM »
Could you share that script? I update the sandbox game manually, and that takes quite a lot of time...

Offline jamespetts

Re: Desync issue (devel-new-2) with Linux Server/Windows client
« Reply #314 on: June 18, 2017, 11:24:32 AM »
The scripts come in a number of parts, all in ~/. The first is nightly.sh, which builds the executables and paksets from source, copies them to the download directories and then copies them also (or links them in the case of the paksets) to the directory from where the game is actually run on the server:

Code: [Select]
echo "***"
echo "Nightly build for Linux"
echo "***"
cd /usr/share/games/nightly/simutrans-experimental
echo "Fetching new version of the code"
echo "***"
git pull origin master --no-edit
echo "***"
echo "Building the main executable"
echo "***"
# Linux
env CFG=default make clean
env CFG=default make -j3
strip build/default/simutrans-extended
chmod +x build/default/simutrans-extended
env CFG=server make clean
env CFG=server make -j3
strip build/server/simutrans-extended
chmod +x build/server/simutrans-extended
# Windows
env CFG=mingw make clean
env CFG=mingw make -j3
echo "***"
echo "Building makeobj"
echo "***"
cd makeobj
# Linux
env CFG=default make clean
env CFG=default make -j3
strip /usr/share/games/nightly/simutrans-experimental/build/default/makeobj-extended/makeobj-extended
chmod +x /usr/share/games/nightly/simutrans-experimental/build/default/makeobj-extended/makeobj-extended
# Windows
env CFG=mingw make clean
env CFG=mingw make -j3
echo "***"
echo "Linking Makeobj to the pakset directories"
echo "***"
ln -s /usr/share/games/nightly/simutrans-experimental/build/default/makeobj-extended/makeobj-extended /usr/share/games/nightly/simutrans-pak128.britain/makeobj-extended
ln -s /usr/share/games/nightly/simutrans-experimental/build/default/makeobj-extended/makeobj-extended /usr/share/games/nightly/Pak128.Sweden-Ex/makeobj
echo "***"
echo "Building nettool"
echo "***"
cd /usr/share/games/nightly/simutrans-experimental/nettools
# Linux
make clean
env CFG=default make -j3
strip /usr/share/games/nightly/simutrans-experimental/build/default/nettool/nettool
chmod +x /usr/share/games/nightly/simutrans-experimental/build/default/nettool/nettool
# Windows
env CFG=mingw make clean
env CFG=mingw make -j3
# Paksets
echo "***"
echo "Fetching the new version of the paksets"
echo "***"
cd /usr/share/games/nightly/simutrans-pak128.britain
git pull origin master --no-edit
cd /usr/share/games/nightly/Pak128.Sweden-Ex
git pull origin half-height --no-edit
echo "***"
echo "Building the paksets"
echo "***"
cd /usr/share/games/nightly/simutrans-pak128.britain
make clean; make -j3
cd /usr/share/games/nightly/Pak128.Sweden-Ex
make clean; make -j3
echo "***"
echo "Copying the files for download and the game server"
echo "***"
cp /usr/share/games/nightly/simutrans-experimental/build/default/simutrans-extended /var/www/downloads/nightly/linux-x64
cp /usr/share/games/nightly/simutrans-experimental/build/mingw/simutrans-extended /var/www/downloads/nightly/windows/Simutrans-Extended.exe

cp /usr/share/games/nightly/simutrans-experimental/build/default/makeobj-extended/makeobj-extended /var/www/downloads/nightly/linux-x64
cp /usr/share/games/nightly/simutrans-experimental/build/mingw/makeobj-extended/makeobj-extended /var/www/downloads/nightly/windows/Makeobj-Extended.exe

cp /usr/share/games/nightly/simutrans-experimental/build/default/nettool/nettool /var/www/downloads/nightly/linux-x64
cp /usr/share/games/nightly/simutrans-experimental/build/mingw/nettool/nettool /var/www/downloads/nightly/windows/Nettool-Extended.exe

rm /usr/share/games/simutrans-extended/nettool
cp /usr/share/games/nightly/simutrans-experimental/build/default/nettool/nettool /usr/share/games/simutrans-extended/nettool

rm /usr/share/games/simutrans-extended/simutrans-extended
cp /usr/share/games/nightly/simutrans-experimental/build/server/simutrans-extended /usr/share/games/simutrans-extended/simutrans-extended
cp /usr/share/games/nightly/simutrans-experimental/build/server/simutrans-extended /var/www/downloads/nightly/linux-x64/command-line-server-build

tar -zcvf /var/www/downloads/nightly/pakset/pak128.britain-ex-nightly.tar.gz --directory "/usr/share/games/nightly/simutrans-pak128.britain/pak128.Britain-Ex" .
tar -zcvf /var/www/downloads/nightly/pakset/pak128.sweden-ex-nightly.tar.gz --directory "/usr/share/games/nightly/Pak128.Sweden-Ex/pak128.Sweden-Ex" .
echo "***"
echo "Cleaning up the pakset folders"
rm /usr/share/games/nightly/Pak128.Sweden-Ex/makeobj
rm /usr/share/games/nightly/simutrans-pak128.britain/makeobj-extended
echo "***"
echo "Copying files to the /simutrans folder to make the Windows complete .zip file"
echo "***"
rm /usr/share/games/nightly/simutrans-experimental/simutrans/Simutrans-Extended.exe
cp /usr/share/games/nightly/simutrans-experimental/build/mingw/simutrans-extended /usr/share/games/nightly/simutrans-experimental/simutrans/Simutrans-Extended.exe
rm /usr/share/games/nightly/simutrans-experimental/simutrans/Makeobj-Extended.exe
cp /usr/share/games/nightly/simutrans-experimental/build/mingw/makeobj-extended/makeobj-extended /usr/share/games/nightly/simutrans-experimental/simutrans/Makeobj-Extended.exe
rm -rf /usr/share/games/nightly/simutrans-experimental/simutrans/Pak128.Britain-Ex
cp -R /usr/share/games/nightly/simutrans-pak128.britain/pak128.Britain-Ex /usr/share/games/nightly/simutrans-experimental/simutrans/Pak128.Britain-Ex
rm -rf /usr/share/games/nightly/simutrans-experimental/simutrans/Pak128.Sweden-Ex
cp -R /usr/share/games/nightly/Pak128.Sweden-Ex/pak128.Sweden-Ex /usr/share/games/nightly/simutrans-experimental/simutrans/Pak128.Sweden-Ex
echo "***"
echo "Zipping the Windows Simutrans-Extended-Complete file"
echo "***"
rm /var/www/downloads/nightly/packages/Simutrans-Extended-Complete.zip
cd /usr/share/games/nightly/simutrans-experimental/
zip -r /var/www/downloads/nightly/packages/Simutrans-Extended-Complete.zip ./simutrans
echo "***"
echo "Completed"

Next is warn-save.sh, a script which terminates the Simutrans-Extended process on the server after making sure that the game is saved and that players are warned of the impending reset:

Code: [Select]

# Shell script to run a force-sync on the running Simutrans-Experimental server
# but only after warning players that it is about to do this and waiting 2 minutes.
# Written by James E. Petts, February 2017

echo "Saving/restarting the server"
echo
date
echo
echo "Warning players of impending save/restart"

/usr/share/games/simutrans-extended/nettool -s bridgewater-brunel.me.uk -p Billinton1890 say "WARNING: Server about to be reset and updated to the latest version. All changes after the next save will be lost. Will save in 1 minute from now. This is an automated message."
sleep 1m

echo "Running a force-sync..."

/usr/share/games/simutrans-extended/nettool -s bridgewater-brunel.me.uk -p Billinton1890 clients

echo

/usr/share/games/simutrans-extended/nettool -s bridgewater-brunel.me.uk -p Billinton1890 -q force-sync

sleep 1m
echo "Stopping the server"
echo
/usr/share/games/simutrans-extended/nettool -s bridgewater-brunel.me.uk -p Billinton1890 say "WARNING: Server will shut down for restart and update to the latest version in 1 minute. No further progress will be saved. The server will restart within 2-5 minutes. You may need to download a new version to continue to connect. Download from http://bridgewater-brunel.me.uk/download/nightly. This is an automated message."
sleep 1m
/root/simctrl brit stop
# No need to restart manually here, as there is a cron job running simctrl brit restart every minute.

That relies on force-sync.sh, which does the actual saving:

Code: [Select]
# Shell script to run a force-sync on the running Simutrans-Experimental server
# Written by James E. Petts, December 2012

echo "Running a force-sync..."

/usr/share/games/simutrans-extended/nettool -s bridgewater-brunel.me.uk -p Billinton1890 clients

echo

/usr/share/games/simutrans-extended/nettool -s bridgewater-brunel.me.uk -p Billinton1890 -q force-sync

Then, I run a number of cron jobs to make sure that they run every night. Here is the output of crontab -e on my server:

Code: [Select]
# Edit this file to introduce tasks to be run by cron.
#
# Each task to run has to be defined through a single line
# indicating with different fields when the task will be run
# and what command to run for the task
#
# To define the time you can provide concrete values for
# minute (m), hour (h), day of month (dom), month (mon),
# and day of week (dow) or use '*' in these fields (for 'any').
#
# Notice that tasks will be started based on the cron's system
# daemon's notion of time and timezones.
#
# Output of the crontab jobs (including errors) is sent through
# email to the user the crontab file belongs to (unless redirected).
#
# For example, you can run a backup of all your user accounts
# at 5 a.m every week with:
# 0 5 * * 1 tar -zcf /var/backups/home.tgz /home/
#
# For more information see the manual pages of crontab(5) and cron(8)
#
# m h  dom mon dow   command
*/1 * * * * /root/simctrl brit check >> /var/log/simutrans/check.log
00 */1 * * * /root/rotate-backup.sh >> /var/log/simutrans/rotate-backup.log
00 03 * * * /usr/sbin/logrotate /etc/logrotate.conf > /dev/null 2>&1
00 05 * * * bash -x /root/nightly.sh >> /var/log/simutrans/nightly-linux.log 2>&1
00 06 * * * /root/warn-save.sh >> /var/log/simutrans/warn-save.log
30 05 * * * /root/package.sh >> /var/log/simutrans/nightly-package.log

The first of these jobs checks every minute whether the game is running and restarts it if it is not. Note that this uses the simctrl script, written by Timothy a long time ago - do you have that set up?