News:

SimuTranslator
Make Simutrans speak your language.

Server performance issues when clients connect

Started by jamespetts, December 05, 2017, 11:18:17 PM

Previous topic - Next topic

0 Members and 1 Guest are viewing this topic.

jamespetts

As some who have played recently on the Bridgewater-Brunel server may have noted, there is a very long delay when clients connect to the server. I have had some trouble tracking down the cause of this, as the behaviour has been inconsistent, so I am posting what is currently known here in case anyone is able to help to track down, or at least narrow down, where the problem might be.

What is known so far is that, on the Bridgewater-Brunel server, which is a 64-bit Linux build running in the command line only configuration on a Linux VPS with 4 cores, the phase in the client joining process where the server unloads the existing game and re-loads the game that it has just saved takes a very long time. Using exactly the same map/saved game, a Windows VPS with a 32-bit graphical client and 2 cores is able to perform the same task much more quickly. Similarly, both debug and release builds on my local Windows machine will complete the client joining phase much more quickly than the Bridgewater-Brunel server.

This problem does not appear to be consistent, on current information, with general poor performance of the Bridgewater-Brunel server. The other phases of the client joining process (the saving phase and the transferring phase) happen at more similar speeds to those observed on the Windows server. If this were a general performance problem, I should expect the saving time in particular to be increased proportionately to the loading time. Also, the Bridgewater-Brunel server has demonstrated itself to be capable of running very large and densely developed games, which it would not do reliably (clients would desync as they would get ahead of the server) if its performance were very poor. Likewise, if there were some severe problem with I/O performance, I should expect the transferring speed to be very slow. (For reference, a map of ~18Mb transfers to my computer as part of the client joining process in circa 5 seconds).

I attempted a general optimisation of loading times a few days ago, which seems to have helped loading times generally a little, but has made little noticeable difference to the client joining time on the Bridgewater-Brunel server.

One thing that I notice is that, when I use the control script to stop the server from running, it takes several minutes for the instance of Simutrans-Extended on the server to stop running. I wonder, therefore, whether the thing that is taking the long time on the server is not the loading of the saved game part of the client joining, but the unloading of the previous game. This involves calling the destructors of all the objects in the game. However, this phase is quite quick on the Windows machines, and profiling it on my own computer, there is no indication of anything unreasonably hindering performance or anything that could sensibly be optimised. The amount of time that this takes on the Windows machines is a tiny fraction of the amount of time that this would be taking on the Bridgewater-Brunel server if this were the issue there.

I am currently waiting for Vladki to set up a test server on a different Linux VPS running the Bridgewater-Brunel saved game so that I can test whether the problem is specific to the Bridgewater-Brunel server itself or general to the 64-bit Linux command line client.

In the meantime, if anyone else is able to contribute usefully to tracking down this problem, I should be very grateful for input.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

jamespetts

I have now had a chance to test with the server helpfully set up by Vladki at list.extended.simutrans.org, which I understand to be a Linux server running the command line client, as run by the Bridgewater-Brunel server.

This server appears to demonstrate the same lengthy pause/post-loading time as the Bridgewater-Brunel server, even for smaller maps. This suggests that, somehow, there is some aspect of the 64-bit Linux command line server that is not present with the 32-bit graphical Windows server that causes the excessively long loading times.

The next step is to profile a Linux command line server build running through the stages of connecting a client. I may well be able to do this at some point, but it would be very helpful if anyone else could have a go at this, as it is a very time consuming process to set everything up.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

DrSuperGood

Concerning the slow server load times. Have you tried disabling logging/debug levels for the server? I am guessing various stages of loading will emit stuff for logging, and due to the size of this map the emissions might slow down the loading process. Especially if the logs are being written to a file in such a way that causes the loading thread to block until completion, the IO write performance might greatly slow down the map load speed. This all depends on the volume of data being logged, if it is just a couple of lines it is unlikely to make a difference.

I assume the server is now running with the new sync model? If so, then results so far seem good.

Dwachs

Could be different log-level settings such that one of the server executable is spamming logs somewhere, while the other is not.
Parsley, sage, rosemary, and maggikraut.

jamespetts

Thank you both for your replies. (Dr. Supergood - I moved your post from the server thread to this thread as it was more relevant here).

The server is run with debug=0, but there is some logging of such outputs as there are at that debug level, but those are minimal. I did add a furhter optimisation yesterday, this time of map unloading times - I assume that this did not help?

Yes, the server is now running with Dr. Supergood's new sync model - I should be grateful for any further feedback on that. I am glad that it seems to be working well so far.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

DrSuperGood

QuoteYes, the server is now running with Dr. Supergood's new sync model - I should be grateful for any further feedback on that. I am glad that it seems to be working well so far.
I OOSed once today upon map reload from an auto save but I suspect that was from trying to modify a convoy schedule when it could not be modified. Standard has/had a similar issue which caused the game to go out of sync so I cannot relate that to the change. Another time it auto saved everything was fine.

jamespetts

Quote from: DrSuperGood on December 08, 2017, 06:58:56 PM
I OOSed once today upon map reload from an auto save but I suspect that was from trying to modify a convoy schedule when it could not be modified. Standard has/had a similar issue which caused the game to go out of sync so I cannot relate that to the change. Another time it auto saved everything was fine.

Splendid, thank you for confirming.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

DrSuperGood

If loading uses multiple threads then maybe servers are loading with just a single thread? When building a Windows command line server the other day (cannot make POSIX one anymore...) I had to turn off multi threading as otherwise the game tried to use graphic locks for drawing UI elements which does not exist with 0 graphics.

jamespetts

Thank you for your thought; however, the server is definitely running a multi-threaded build, as a single-threaded build would instantly lose synchronisation with a multi-threaded client, as the passenger generation multi-threading uses separate random number generator seeds for each thread.

I have also noticed the posix issues with Windows builds of the command line server: the game will instantly quit when started, with no error message.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

DrSuperGood

#9
The server load time is a real pain. I suggest logging the time taken for the different load steps. One can then get the log from a client (loads fast) and the server (loads slow) and see where the major differences are occurring. It might become obvious what the problem is that way.

jamespetts

Quote from: DrSuperGood on December 13, 2017, 08:22:51 AM
The server load time is a real pain. I suggest logging the time taken for the different load steps. One can then get the log from a client (loads fast) and the server (loads slow) and see where the major differences are occurring. It might become obvious what the problem is that way.

I intend to do this, but this has been significantly hampered by my inability to build a Windows command line server executable. I will have to try this on Linux, which means setting up a separate computer, which is time consuming, so I am focussing on other bugs at present.

If you or anyone else could run these tests, that would be exceedingly helpful, as my time is very limited and the rate of bug reports in the last few weeks has been grater than the rate at which I am able to code fixes. Also, I am about to go away for Christmas in a few days, which will mean that I will not have access to my normal development computer for several weeks; whilst I can undertake pakset work on my secondary computer (and intend to spend some time doing pakset work), I want to try to fix as many critical bugs as possible before I go away, so any assistance - even just in profiling the Linux command line server build while clients connect and posting the results - would be very helpful.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

DrSuperGood

QuoteI intend to do this, but this has been significantly hampered by my inability to build a Windows command line server executable.
What is the problem doing this? MSVC can build a command line server for standard no problem.

The process involves removing from the project/build the simsys_w and simgraph16 files and replacing them with simsys_posix and simgraph0. Then change build macros for color to 0 and change the link type to windows console.

I can try taking a look at the console server build problem, however it is always a pain to setup Simutrans to build.

jamespetts

#12
Ah, I think that I had omitted the last of those steps (linking as a console application). I am testing this now.

Edit Having tested, this seems to work, but the performance on a debug build of the Windows command line server, with the same saved game as is used on the Bridgewater-Brunel server, appears to be significantly in excess of that on the server itself, as the client is waiting less than 15 seconds (approximately) after it has loaded the game before the server has loaded the game.

Edit 2: I have now committed a small optimisation to the loading times, although this will only work when loading a game saved with the new version. I am not sure how much that this will help on the Linux server, but the profiler does show a small but measurable improvement in Windows.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

jamespetts

I have now committed some further optimisations to the loading routine, which my tests indicate should reduce loading time by as much as half. I am not sure, however, the extent to which this will assist with the as yet unidentified thing that is making loading much slower on the server than it is on the clients.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

DrSuperGood

Seems to have made very little if not no difference at all to the server.

Does this problem affect all Simutrans Extended Linux command line servers or just yours?

jamespetts

#15
I do not have comprehensive data on that, but it equally affected at least one other Linux server set up to test this very problem.

Edit: Taking advantage of the lack of use of the server caused by the earlier bug relating to passenger generation, now fixed, I compiled a profiling build of Simutrans-Extended on the Bridgewater-Brunel server, started it with the usual saved game, connected to it with a client, and then stopped the server shortly afterwards.

This is the profiling text generated by that run. I have yet to analyse this, but I should be interested in any views on what the issues might be here.

Edit 2: It seems that grund_t::calc_back_image() is taking a huge amount of computational effort on the server. I also notice that this version has changed in Standard recently (including replacing a number of static casts with plain casts). I am incorporating those changes into Extended to see whether this helps.

I have to say, it is odd that a non-graphical build should have to be calculating images at all. Does anyone know whether this method/function is actually necessary in a command line server build, or whether it can safely be set to do something trivial or even just "return" if the colour depth is defined as 0?

Edit 3: Tests on my Windows machine show that, on a command line server build, using the COLOUR_DEPTH preprocessor directive, it is possible to replace the entire contents of grund_t::calc_back_image() with "back_imageid = IMG_EMPTY;" with no adverse effect.

Edit 4: I have now restarted the server with this optimisation: I shall be interested to see whether it makes any difference.

Edit 5: Having tested it, it does not seem much improved, which is odd.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

jamespetts

#16
Carrying out some further testing on my Linux computer, the problem appears to be specific to the Linux command line build: on that machine, the graphical build will load a game very quickly when acting as a server, but the command line server version will take a very long time. I am trying to make the profiling work on that machine to attempt to understand why. It is notable that both the loading of the map initially and the unloading of the map before re-loading take a very long time.

Edit: Having spent some time compiling and profiling on my own Linux machine, the results are very odd. The results are here. The game was left to run for a short while after loading, but not for very long. Most of the time was spent loading the saved game. However, the profile mostly shows CPU activity in those areas relating to running games, such as those relating to float32e8_t (the second most called function/method), which relates to the vehicle physics.

I conclude from that that, whatever it is that is causing the program to run very slowly when loading and unloading saved games is not showing up in the time measured by the profiler. I further infer that the optimisations that I have carried out over the last month have indeed helped loading speed generally (which is demonstrated when I use profiling), but that they have not assisted with the specific issue that is causing extreme slowness when loading and unloading (but not saving) games using the Linux command line server.

I must confess that I am now rather at a loss as to what sort of thing could be causing this, and how it is possible for the cause of this not to be apparent in the profiler. This does not seem consistent with an I/O issue, as saving is not affected, but unloading the map is affected, which does not rely on I/O; for similar reasons, it cannot be a problem related to the compression algorithm.

If anyone has any ideas where even to begin to investigate this, I should be extremely grateful.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

DrSuperGood

QuoteEdit 2: It seems that grund_t::calc_back_image() is taking a huge amount of computational effort on the server. I also notice that this version has changed in Standard recently (including replacing a number of static casts with plain casts). I am incorporating those changes into Extended to see whether this helps.
That is because of profiling overhead and a lack of optimizations.
QuoteEdit 5: Having tested it, it does not seem much improved, which is odd.
Not really as you did not profile the optimized build. Many functions get inlined in optimized builds.
QuoteI conclude from that that, whatever it is that is causing the program to run very slowly when loading and unloading saved games is not showing up in the time measured by the profiler. I further infer that the optimisations that I have carried out over the last month have indeed helped loading speed generally (which is demonstrated when I use profiling), but that they have not assisted with the specific issue that is causing extreme slowness when loading and unloading (but not saving) games using the Linux command line server.
Exactly, which is why I never suggested running a profiler on it. You need to explicitly profile certain operations by using precision timers. This way the profiling still works in fully optimized builds and reflects the run time of the compiler optimizations rather than as barriers around function calls which themselves have significant overhead.

What you want to do is time certain stages of loading and compare the times with a normal client running on the system (which you say loads significantly faster). The stage that takes longer to run will be where the problem is.

The difference between the two builds is one uses simsys_posix (server) and the other one of the other simsys. Maybe the posix implementation has something that executes slowly?

jamespetts

When I first ran the profiler on the Bridgewater-Brunel server, resulting in the first linked output, I did run an optimised build, albeit I used a debug build on my own local Linux computer for the second output.

Looking at simsys_posix.cc, the only difference between the version currently in Simutrans-Extended and that in Simutrans-Standard is:


#ifdef _WIN32
int CALLBACK WinMain(HINSTANCE const hInstance, HINSTANCE, LPSTR, int)
{
return 0;
}
#endif


which cannot on any view affect performance in Linux, as the relevant part of the code would not even be compiled.

This is all very odd.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

DrSuperGood

QuoteLooking at simsys_posix.cc, the only difference between the version currently in Simutrans-Extended and that in Simutrans-Standard is:
Except we do not know if Standard suffers from the bug because no one has ever run a standard server as demanding as the extended ones.

I mean compare the difference between the posix and the SDL/GDI backends.

jamespetts

Quote from: DrSuperGood on December 19, 2017, 08:26:41 PM
Except we do not know if Standard suffers from the bug because no one has ever run a standard server as demanding as the extended ones.

Interesting. I wonder whether that might be worth trying?

QuoteI mean compare the difference between the posix and the SDL/GDI backends.

I am likely to struggle to do this in a meaningful way, as I do not understand most of this low level code or how it works: I have generally confined myself to altering the code that affects the simulation rather than the system interface. Do you know how any of this part of the code works?
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

TurfIt

Wild stab: You claim to have seen calculating ground images take a long time, and you have a bad merge from Standard in image.cc where the ground tile images are created, could have some unexpected side effects.  (Standard's code has a suspicious update here too - not freeing its temp arrays after their allocation was altered, but that would only waste some memory...)

Second: Extended seems to spew a great deal of crap out stdout/err. Loading a game with the console active takes forever as it can't keep up....

Finally: I doubt simsys_posix has an effect. Eliminate by compiling with it as the backend, but keep 16 color depth... (mod makefile as needed.)

jamespetts

Thank you for those suggestions. I have been looking into this issue, but have been somewhat distracted by the heap corruption issue and trying to find a solution to that.

Now that I think that I might have fixed the heap corruption issue (further tests pending), I can turn briefly to look at this again, although I go away for Christmas to-morrow, so will not have access to a machine with a graphical debugger for some weeks.

One thing that I do notice about this issue is that, judging by the debug output, there is a long delay between calls to init_threads() and destroy_threads(). This is curious, as the calls to these functions/methods are very close to each other indeed in karte_t::load():


void karte_t::load(loadsave_t *file)
{
char buf[80];

intr_disable();
dbg->message("karte_t::load()", "Prepare for loading" );
for (uint8 sp_nr = 0; sp_nr < MAX_PLAYER_COUNT; sp_nr++) {
if (two_click_tool_t* tool = dynamic_cast<two_click_tool_t*>(selected_tool[sp_nr])) {
tool->cleanup();
}
}
destroy_all_win(true);

clear_random_mode(~LOAD_RANDOM);
set_random_mode(LOAD_RANDOM);
destroy();

loadingscreen_t ls(translator::translate("Loading map ..."), 1, true, true );

clear_random_mode(~LOAD_RANDOM);
set_random_mode(LOAD_RANDOM);

// Added by : Knightly
path_explorer_t::initialise(this);

#ifdef MULTI_THREAD
// destroy() destroys the threads, so this must be here.
init_threads();
first_step = 1;
#endif


I am not sure what can be going in that short piece of code that is the trouble. Could it possibly be


loadingscreen_t ls(translator::translate("Loading map ..."), 1, true, true );

?

That seems an obscure candidate, but it might conceivably, I suppose, cause trouble with a console application?
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

DrSuperGood

Could this possibly be related to the old SDL2 slow loading bug I encountered in standard years ago? The GDI and SDL builds loaded maps extremely fast but the SDL2 build took forever to load maps, progressing the loading bar very slowly forward. I am unsure if it was ever fixed or what the cause was.

jamespetts

Quote from: DrSuperGood on December 20, 2017, 02:21:11 AM
Could this possibly be related to the old SDL2 slow loading bug I encountered in standard years ago? The GDI and SDL builds loaded maps extremely fast but the SDL2 build took forever to load maps, progressing the loading bar very slowly forward. I am unsure if it was ever fixed or what the cause was.

I have doubts that this is the issue, since (1) my local Linux machine uses an SDL2 build, and the performance there is good; and (2) the command line server build does not use SDL2.

However, I noticed that, on the server, the command line arguments -nomidi and -nosound were missing. I do not know whether this changes anything, but I have added these command line arguments to the server restart script, and the server should run with them when it next restarts.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

TurfIt

Quote from: DrSuperGood on December 20, 2017, 02:21:11 AM
Could this possibly be related to the old SDL2 slow loading bug I encountered in standard years ago? The GDI and SDL builds loaded maps extremely fast but the SDL2 build took forever to load maps, progressing the loading bar very slowly forward. I am unsure if it was ever fixed or what the cause was.
SDL2 did not like the entire screen being updated as frequently as the loading screen was trying; Loading screen was fixed to update only when it actually changed rather than redisplaying the same thing thousands of times.


Quote from: jamespetts on December 20, 2017, 12:59:25 AM
One thing that I do notice about this issue is that, judging by the debug output, there is a long delay between calls to init_threads() and destroy_threads(). This is curious, as the calls to these functions/methods are very close to each other indeed in karte_t::load():
And the entirety of the map destruction in between...


Quote from: jamespetts on December 20, 2017, 12:59:25 AM
That seems an obscure candidate, but it might conceivably, I suppose, cause trouble with a console application?
No.

jamespetts

Thank you for your feedback - that was helpful.

One odd thing that I notice is that, during the long pauses, CPU use (using the ps -uww command on the Linux machine) is very low: about 15%. I do not really know what to make of that.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

jamespetts

I have now attempted to run a side-by-side comparison on the same computer with the command line and graphical servers, using exactly the same command line arguments and exactly the same saved game (not that from the current Bridgewater-Brunel server, but the game that was used on that server for testing before this new game started).

The results are here for the command line version and here for the graphical version. It is hard to discern any meaningful difference between the two.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

DrSuperGood

The results are not meaningful as far as I can tell. Hence why I suggest adding time measuring code to various stages of the loading process and printing the results to console. For example measure the time it takes to destroy the existing world, the time it takes to load all of X type of object, the time taken to finish loading, etc. The difference in times is huge so the problem spot will have a very noticeably different time.

Once the problem spot is identified, one can work on finding the exact cause.

jamespetts

#29
Quote from: DrSuperGood on December 21, 2017, 07:54:26 PM
The results are not meaningful as far as I can tell. Hence why I suggest adding time measuring code to various stages of the loading process and printing the results to console. For example measure the time it takes to destroy the existing world, the time it takes to load all of X type of object, the time taken to finish loading, etc. The difference in times is huge so the problem spot will have a very noticeably different time.

Once the problem spot is identified, one can work on finding the exact cause.

That is a very helpful suggestion: thank you. I will look into that when I get a moment.

Edit: Thanks to this helpful suggestion, I have now found the cause of the problem, which I think ultimately originates outside Extended code, albeit this particular manifestation is Extended specific.

At line 2141 of simworld.cc, I have the command,


dr_sleep(100);


during the terminating of the threads. This is intended to make the program pause for 100ms to reduce the likelihood of thread deadlocks. Testing has found this to be quite effective, and the 100ms pause is barely noticeable.

However, for some odd reason, when this is compiled for Linux with a command line server build (i.e. with a posix backend), "dr_sleep(100)" causes the program to sleep for 100 seconds.

This inconsistency in the handling of dr_sleep may well cause problems in code that is common with Standard, such as:



        if(  env_t::networkmode  ) {
            process_network_commands(&ms_difference);

        }
        else {
            // we wait here for maximum 9ms
            // average is 5 ms, so we usually
            // are quite responsive
            DBG_DEBUG4("karte_t::interactive", "can I get some sleep?");
            INT_CHECK( "karte_t::interactive()" );
            const sint32 wait_time = (sint32)(next_step_time-dr_time());
            if(wait_time>0) {
                if(wait_time<10  ) {
                    dr_sleep( wait_time );
                }
                else {
                    dr_sleep( 9 );
                }
                INT_CHECK( "karte_t::interactive()" );
            }
            DBG_DEBUG4("karte_t::interactive", "end of sleep");
        }


where it is clearly anticipated that dr_sleep is denominated in milliseconds, not seconds.

Can anyone who is more familiar with the dr_sleep() function assist?

Edit 2: The relevant part of simsys-posix.cc appears to be here:


void dr_sleep(uint32 msec)
{
/*
    // this would be 100% POSIX but is usually not very accurate ...
    if(  msec>0  ) {
        struct timeval tv;
        tv.sec = 0;
        tv.usec = msec*1000;
        select(0, 0, 0, 0, &tv);
    }
*/
#ifdef _WIN32
    Sleep( msec );
#else
    sleep( msec );
#endif
}


The issue appears to be that "sleep()" in Linux is indeed denominated in seconds: see here.

Edit 3: An effective fix appears to be:


void dr_sleep(uint32 msec)
{
/*
    // this would be 100% POSIX but is usually not very accurate ...
    if(  msec>0  ) {
        struct timeval tv;
        tv.sec = 0;
        tv.usec = msec*1000;
        select(0, 0, 0, 0, &tv);
    }
*/
#ifdef _WIN32
    Sleep( msec );
#else
    usleep( msec * 1000u );
#endif
}
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

DrSuperGood

Can confirm server performance is drastically improved. Resumes near instantly after my client loads.