
[BUG?] Error Joining Bridgewater-Brunel Server.

Started by DrSuperGood, May 11, 2019, 02:04:18 AM


DrSuperGood

There was 1 active client on the Bridgewater-Brunel Server before I attempted to join.

Some time during the joining process, I think after file transfer but I am not sure, the following error message was generated in game.
Quote: Protocol error (expected NWC_GAME)
The joining process stopped and the demo map remained functional.

I am guessing the server crashed as a result since when I tried to join a few minutes later there were 0 clients listed. This join attempt did work and let me in game connected to the server.

Now I am not sure if this is just a manifestation of the out-of-memory problem the server has, this time occurring during the joining process, or if it was an actual engine-related bug. Hence I am reporting this in case other people encounter it or it becomes more frequent.

jamespetts

Thank you for the report. These things are very difficult to track down without a reliable reproduction case, but it is of some utility to know that they exist.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

freddyhayward

I recently ran into the same issue under similar circumstances. There was 1 active client when I attempted to join. If I remember correctly the error occurred shortly after "Server preparing game" was completed. I immediately tried to rejoin to which it showed "server did not respond!". A few minutes later there were 0 active clients shown.

freddyhayward

I've run into this error consistently since the last post, and it sometimes even occurs when there are no clients already connected. I would estimate that roughly half of my attempts to connect to the server have resulted in this. I also suspect that this problem is to blame for connected players having their games suddenly paused then disconnected. Finding and resolving the sources of the problem should be a very high priority.

jamespetts

Quote from: freddyhayward on May 30, 2019, 02:49:16 AM
I've run into this error consistently since the last post, and it sometimes even occurs when there are no clients already connected. I would estimate that roughly half of my attempts to connect to the server have resulted in this. I also suspect that this problem is to blame for connected players having their games suddenly paused then disconnected. Finding and resolving the sources of the problem should be a very high priority.

Unfortunately, without any way of reproducing this reliably, this will not be possible.

jamespetts

I have tested connecting this evening. The first attempt crashed the server, and the second attempt succeeded. It is not clear from the system log what caused the crash.

However, I notice that the memory usage is extremely high on the server: Simutrans-Extended is taking ~95% of total RAM, and this is not including RAM taken by the web server, etc, although that will be small in comparison.

I also notice locally that memory usage when saving is greater than memory usage when running, exceeding 6GB on my local system (although this is with all the graphics loaded, which is not done on the server). The server now runs with 8GB of RAM, and I have also set up a large swap file. free -h gives the following output:


root@438242:~# free -h
              total        used        free      shared  buff/cache   available
Mem:           7.8G        7.5G        133M        1.7M        151M         64M
Swap:           17G        898M         16G


It is possible, but far from certain, that memory issues may be behind these problems. There is also the report relating to crashes when a client aborts connecting (https://forum.simutrans.com/index.php/topic,19015.0.html), but that was not relevant to the situation that I observed earlier this evening and described above, as I did not abort connecting.

Ves

Protocol error (expected NWC_GAME)
For the last couple of weeks I have also occasionally gotten this message, but have still been able to log in some hours later. For the past week, however, every attempt has been unsuccessful and has generated this message.

When the error has been generated and the initial demo map reappears, the server remains unresponsive for a good while according to the "play online" dialog.

Mostly when I have attempted to join it has reported zero clients online.

jamespetts

Thank you for letting me know - can I check whether others are able to connect?

freddyhayward

Quote from: jamespetts on June 11, 2019, 10:19:37 PM
Thank you for letting me know - can I check whether others are able to connect?
No, I am having the same issue. Either the server is unresponsive to begin with, or I get this error. This happens the vast majority of the time, roughly 70% of attempts. When I am able to connect, within about 15-30 minutes the game inevitably pauses and remains paused. Especially frustrating was when I removed a rail bridge to upgrade its speed, which I assume has left my network in chaos.
A note on memory: I suspect that if you set up a Patreon or other crowdfunding account, enough of us would contribute that you could upgrade your server plan to include more RAM than the current 8GB.

DrSuperGood

Not been able to connect to Bridgewater Brunel for several days now. Mostly does not respond. When it does it fails with protocol error and then goes down.

jamespetts

Thank you both for letting me know: I am sorry that this is not working well.

It is very difficult for me to look into this at present: I am waiting to upgrade my development computer, which I anticipate will be done next month or so. This computer is currently in a poor state and some of the memory modules no longer work, which means that it no longer has enough RAM to connect to the server game or run the saved game from the server.

I hope that the new computer will enable me to undertake development work more easily and efficiently.

DrSuperGood

Quote: This computer is currently in a poor state and some of the memory modules no longer work, which means that it no longer has enough RAM to connect to the server game or run the saved game from the server.
Have you tried re-seating the DIMMs in their sockets (pull them out, check for dust and then place them back)? It is very unusual for DDR3 or DDR4 memory to fail, compared with other parts of the system. Re-socketing the processor might also help, in case the force applied to the socket has weakened over time, causing a loose contact with one of the memory channels.

Or is the memory detected but suffering from data corruption, so that the modules had to be pulled?

jamespetts

I have tried re-seating: it was after that that the system refused to use more than 8Gb of the 12Gb that it recognised. I am reluctant to spend too much time trying to fix this computer since it urgently needs replacement in any event; I cannot wait for Threadripper 3 whenever that may be released, but I will wait for Ryzen 3, which is coming out on the 7th of July.

ACarlotti

The "Protocol error (expected NWC_GAME)" message is misleading, since this is triggered by not receiving the correct type of packet within 5 minutes. So in this case the issue isn't an incorrect packet, but rather that the server is taking over 5 minutes to prepare the game, at which point the client times out and aborts the connection process.

Also of note is that the "Server preparing game..." loading bar displays how much of this timeout has passed, rather than reporting on actual progress by the server.
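To illustrate the distinction described above, here is a minimal sketch of a wait loop that reports a timeout separately from a genuinely wrong packet. It is not the Simutrans network code: `packet_id`, `wait_result`, `wait_for_packet`, `describe` and the injected `poll`/`now` callbacks are all hypothetical names invented for this example; only `NWC_GAME` and the 5 minute limit come from the thread.

```cpp
#include <chrono>
#include <functional>
#include <optional>

// Hypothetical packet ids; only the name NWC_GAME appears in the thread.
enum class packet_id { NWC_GAME, NWC_OTHER };

// Three distinct outcomes, so a timeout is never reported as a protocol error.
enum class wait_result { ok, wrong_packet, timed_out };

using ms = std::chrono::milliseconds;

// poll() stands in for a non-blocking receive: it returns a packet id if one
// has arrived, or std::nullopt otherwise. now() is injected so that the logic
// can be exercised without actually waiting.
inline wait_result wait_for_packet(packet_id expected, ms timeout,
    const std::function<std::optional<packet_id>()>& poll,
    const std::function<ms()>& now)
{
    const ms deadline = now() + timeout;
    while (now() < deadline) {
        if (const std::optional<packet_id> got = poll()) {
            return *got == expected ? wait_result::ok : wait_result::wrong_packet;
        }
        // A real client would sleep or block on the socket here.
    }
    return wait_result::timed_out;
}

// Message text is illustrative only.
inline const char* describe(wait_result r)
{
    switch (r) {
        case wait_result::ok:           return "Connected";
        case wait_result::wrong_packet: return "Protocol error (unexpected packet)";
        case wait_result::timed_out:    return "Timed out waiting for the server to prepare the game";
    }
    return "Unknown";
}
```

With a split like this, a slow save on the server would surface to the player as a timeout rather than as a protocol error.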

freddyhayward

Quote from: ACarlotti on June 18, 2019, 03:39:07 AM
The "Protocol error (expected NWC_GAME)" message is misleading, since this is triggered by not receiving the correct type of packet within 5 minutes. So in this case the issue isn't an incorrect packet, but rather that the server is taking over 5 minutes to prepare the game, at which point the client times out and aborts the connection process.

Also of note is that the "Server preparing game..." loading bar displays how much of this timeout has passed, rather than reporting on actual progress by the server.
Perhaps then a short-term solution would be to avoid this behaviour and let "server preparing game" run indefinitely? This might prevent clients connecting and disconnecting which I assume crashes the server.

ACarlotti

Increasing the timeout would help part of the problem, but I feel uncomfortable about making it even longer - 5 minutes is already a long time to wait without knowing if the server has crashed. Perhaps this is a further indication that we should rethink saving the path explorer data - if we weren't trying to send over 800MB of data instead of <100MB, then this probably wouldn't happen.

Also, a wishlist feature: Have the server send updates on its progress preparing (i.e. saving) the game. Or maybe even have it start transferring the game while still saving.

DrSuperGood

The server should really start to transmit the save data as it is being written. This might drastically reduce loading times by almost completely masking save times.

In any case the reason it times out is not because of how big the save file is, but rather because the server is out of memory. Due to having to use the page file progress is very slow.

jamespetts

Thank you for the information on this.

I have for the time being increased the timeout from 5 to 10 minutes so that people can at least connect to the server game.

I am confused as to why so much memory is being consumed, however; but I cannot test properly on my local computer because it now has insufficient (working) memory to allow me to test this. The server shows the following for a simctrl status output:


INFO: Pidfile found, pid is a running process (pid: 24127)
Simutrans instance: brit is currently running with pid: 24127
Output of "ps uww -p 24127":
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root     24127 12.2 94.0 10175200 7691940 ?    Sl   Jun18 147:22 /usr/share/games/simutrans-extended/simutrans-extended -server 13353 -server_id 1825 -server_name Bridgewater-Brunel -server_comment British large maps, long-term play -debug 3 -log 2 -lang en -nosound -nomidi -objects Pak128.Britain-Ex


this for the output of free -h:


              total        used        free      shared  buff/cache   available
Mem:           7.8G        7.4G        162M        3.0M        206M        119M
Swap:           17G        1.4G         16G


and this output of vmstat:


procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
0  0 1417324 166080  10176 201272   65  114   182   193    1    1 10  0 86  4  0


Until I can check this on my own computer, can I ask what memory usage people are experiencing on their local computers with this saved game? As reported above, when I last ran this, the memory usage locally was circa 6GB, which is less than reported being used on the server, despite the server not loading any graphics (or sounds, if this has been coded correctly; I am not sure how this is done, since it is unchanged from Standard).

As to the suggestions about making the process more efficient - these are interesting in principle, but somewhat beyond my level of expertise in that this is quite technical, low-level code. If anyone would like to look into this and add one of these features, that would be most helpful.

freddyhayward

Loading a Bridgewater-Brunel save brings Simutrans to 8.7GB of RAM out of the 16GB available on my laptop.

ACarlotti

So some good news (in the medium term, at least) - I've just profiled the save with valgrind, and I've discovered some places where significant memory savings can be made.

About a third of the memory is used by the path explorer - this might all be necessary; I haven't tried calculating how much is theoretically needed yet. I mention it just because it seems interesting.

Over 20% of the memory is allocated by lines 418 and 434 in simhalt.cc (both allocate just under 1GB). These are basically two massive objects that I think are stored one per halt. Most of this memory is actually for class/catg combinations that don't exist, so most of it (about 15% of the total memory usage) can be saved just by being more clever here. (Or actually, make that less clever, since the issue arises from an attempt to be clever folding 2D arrays into a 1D array for some reason).
Aside from this issue, a contributing factor is that an empty hashtable (of any data type) requires 2024B on a 64bit system (ignoring any padding; with padding I think it's 2428B), which is a lot of memory in which to store nothing. Both of these lines are allocating memory mostly for hashtables.

Improving both of these issues could potentially reduce the memory requirements for the Bridgewater-Brunel server by as much as 20% (and definitely more than 15%).
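As a rough illustration of where the waste comes from, the sketch below compares a folded 1D layout, in which every goods category is padded to the largest class count, with a jagged layout that only stores combinations that exist. The function names and the example class counts are assumptions made for this illustration and are not taken from simhalt.cc; the 2024 B figure for an empty hashtable is the one quoted above.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Size of an empty hashtable on a 64-bit system, as quoted in the thread.
constexpr std::size_t empty_hashtable_bytes = 2024;

// Slots used when every category is padded to the largest class count
// (the folded 2D-into-1D layout).
inline std::size_t flat_slots(const std::vector<std::size_t>& classes_per_catg)
{
    if (classes_per_catg.empty()) {
        return 0;
    }
    const std::size_t max_classes =
        *std::max_element(classes_per_catg.begin(), classes_per_catg.end());
    return max_classes * classes_per_catg.size();
}

// Slots used when each category stores only the classes it actually has.
inline std::size_t jagged_slots(const std::vector<std::size_t>& classes_per_catg)
{
    return std::accumulate(classes_per_catg.begin(), classes_per_catg.end(),
                           std::size_t{0});
}

// Bytes per halt spent on hashtables for combinations that do not exist.
inline std::size_t wasted_bytes_per_halt(const std::vector<std::size_t>& classes_per_catg)
{
    return (flat_slots(classes_per_catg) - jagged_slots(classes_per_catg))
           * empty_hashtable_bytes;
}
```

For a made-up pakset with class counts {5, 3, 1, 1, 1, 1}, the padded layout needs 30 slots against 12 for the jagged one, so 18 empty hashtables (36432 bytes) per halt would hold nothing.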

DrSuperGood

Quote: Improving both of these issues could potentially reduce the memory requirements for the Bridgewater-Brunel server by as much as 20% (and definitely more than 15%).
This would potentially go a long way towards improving the server and even client performance.

With the hashtables one might be able to add a special case where empty hashtables do not allocate bucket storage. The allocation then occurs when the first item is added to the hashtable. Depending on how many almost empty hashtables there are (few mappings contained), one could further optimize memory usage by resorting to a linear search through an array list. For example, when there are fewer than 4 mappings, use a linear search through an array list, which would potentially require a lot less storage with minimal performance impact.
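A minimal sketch of the array-list idea, assuming a hypothetical `small_map` class that is not part of Simutrans: entries live in a `std::vector` that is searched linearly, and the vector only acquires storage when the first mapping is added (mainstream standard library implementations do not allocate for a default-constructed vector, although the C++ standard does not strictly require this).

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical map for containers that usually hold zero or very few mappings.
template <typename K, typename V>
class small_map {
public:
    // Insert a mapping or overwrite an existing one. Storage is first
    // requested here, on the first insertion, not in the constructor.
    void put(const K& key, const V& value)
    {
        for (auto& entry : entries_) {
            if (entry.first == key) {
                entry.second = value;
                return;
            }
        }
        entries_.emplace_back(key, value);
    }

    // Linear search; returns a pointer to the value, or nullptr if absent.
    const V* get(const K& key) const
    {
        for (const auto& entry : entries_) {
            if (entry.first == key) {
                return &entry.second;
            }
        }
        return nullptr;
    }

    std::size_t size() const { return entries_.size(); }

private:
    std::vector<std::pair<K, V>> entries_;
};
```

A real implementation would presumably switch to a proper hashtable once the number of mappings passes some threshold; that part is omitted here.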

jamespetts

Thank you for the analysis: this is helpful.

It was I who used the one dimensional arrays on the advice of a website the details of which I have now forgotten in an attempt to make them more efficient and reliable. They are at least reliable, but I see that memory usage suggests that they are not very efficient.

Does anyone have any suggestions as to how these data structures can be reformed? It may be a considerable time before I am able to consider working on this myself, but if anyone else would like to work on coding this, whether now or in the future, that would be much appreciated (although I anticipate that much testing would be needed before any deployment of such modified code).

ACarlotti

I would probably go with a vector_tpl of vector_tpls if the overhead of extra bounds checking won't be a problem. If it will be, then something similar to the current setup but without the unused spaces would be better (though slightly less readable than vector_tpls). I haven't yet looked at what all the use cases are. I'll hopefully have time to look into this by the weekend.

jamespetts


ACarlotti

Right, I think I've written all the code and it seems to run ok, but I need to do a bit more testing later when I have time (and my laptop charger). I've switched both waiting_times and connexions to use a vector_tpl or vector_tpls of hashtable pointers (or pointers to a struct containing a hashtable). It seems to load, run and save correctly, although I'm suspicious of how long it took to save. I also want to rerun valgrind to verify that it's had the expected effect on memory consumption. Memory usage seems to be down to around 7.4GB, which should give us at least another month or so before the server runs out of memory again.

As a side-effect, get_connexions no longer needs to be passed max_classes as an argument (which I'm pleased about because callers really shouldn't have needed to compute max_classes). This has probably fixed a bug in the halt list filter for stops with no connections (previously it would unintentionally check only the first three categories in pak128.Britain).

ACarlotti

I've now tested it more thoroughly (and fixed a vector out-of-bounds error that I hadn't triggered earlier). Valgrind reports that memory usage in each of the corresponding new lines of code is about 300MB or 4% of total, which corresponds to a reduction of about 1.4GB memory usage. Peak allocated memory was 7.0GB according to valgrind, although I forgot to trigger the path explorer on this instance, which would probably increase it a bit more. On a separate run, top was reporting 6.8GB of physical memory being used by Simutrans Extended.

So I think this is a substantial improvement on what was there before. One thing that I didn't mention in the previous post is that I have also created a new function goods_manager_t::get_classes_catg_index() to handle the common case of having to compute the number of classes used for a given goods category index. (This is the number that differs for each special freight good, rather than the pakset allocated number).

Incidentally, I noticed that saving the game was taking about 3.5 minutes on my laptop, so the server will presumably still take a long time to save even without the memory overconsumption.

EDIT: The path explorer was already running, so the memory usage was accounted for. What I hadn't done was allow for a full cycle of the path explorer.

The code is on my Github master

jamespetts

Thank you very much for this; I have now incorporated this after some basic testing. That is most helpful.

Since I will not be able to run the server game myself for a while, I should be grateful if others who play on that game could report their experiences after the release of to-morrow's nightly build to see how this improves the joining experience.

freddyhayward

Quote from: jamespetts on June 22, 2019, 12:24:50 PM
Thank you very much for this; I have now incorporated this after some basic testing. That is most helpful.

Since I will not be able to run the server game myself for a while, I should be grateful if others who play on that game could report their experiences after the release of to-morrow's nightly build to see how this improves the joining experience.
I'm sure you're already aware but for the sake of certainty the server has been offline for the past 60 hours or so.

jamespetts

It is back online now; I suspect that the problem was the save format bug, now fixed.

jamespetts

Logging into the server's SSH (command line interface), I see that memory usage is still at 94.5%, with 7.5GB of 7.8GB of RAM used, with an additional 628MB of swap space used (of a total allocated swap file size of 17GB).

I see that somebody is currently connected to the server. May I ask how the in-game performance is?

ACarlotti

Two more things that I could do to reduce memory usage concern the table of stored path explorer data. Currently each entry is a 32-bit journey time, a 16-bit next transfer halt id, and I think 16 bits of padding for alignment. Eliminating the padding would save about 500MB, and changing the stored journey time to a floating point (8 bit mantissa; 8 bit exponent) would allow another 500MB to be saved (and immediately solve the alignment issue) while having a negligible impact on whether passengers choose to travel or not. (It might even be possible to squeeze this down to an 8 bit float and an 8 bit reference to a separate halt lookup (for most halts with <256 connections), but that would have more of an impact on performance and accuracy and wouldn't reduce memory usage by as much).
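The padding mentioned above can be seen with a small sketch. The struct and field names below are hypothetical and are not the actual path explorer types; the sizes in the comments are what mainstream 32-bit and 64-bit compilers produce, where a 32-bit integer is aligned to 4 bytes.

```cpp
#include <cstddef>
#include <cstdint>

// Layout as described: a 32-bit journey time plus a 16-bit halt id.
// The 4-byte alignment of the first member typically rounds the struct up
// from 6 to 8 bytes, i.e. 2 bytes of padding per entry.
struct entry_padded {
    std::uint32_t journey_time;
    std::uint16_t next_transfer;
};

// Proposed layout: a 16-bit encoded journey time plus the 16-bit halt id.
// Both members have 2-byte alignment, so no padding is needed.
struct entry_compact {
    std::uint16_t journey_time_code;
    std::uint16_t next_transfer;
};

static_assert(sizeof(entry_compact) == 4, "two 16-bit members pack into 4 bytes");

// Total bytes for a table with the given number of entries.
inline std::size_t table_bytes(std::size_t entries, std::size_t entry_size)
{
    return entries * entry_size;
}
```

Going from 8 bytes to 4 bytes per entry halves the table, which is consistent with the two savings of roughly equal size estimated above.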

There are also potential savings to be made by reducing the memory consumption of an empty (or small) hashtable, but that would only save around 300MB.

I probably won't have enough time to do these for at least two weeks, and possibly four, but hopefully the server can cope with the current memory demand for that long. I expect I'll be able to implement the path explorer space saving by the end of July.

jamespetts

That is very interesting and helpful - thank you. However, do bear in mind that we cannot use built-in floating point types other than in the GUI in Simutrans-Extended, as these are not network safe (i.e., they will deal with rounding inconsistently on different platforms, causing loss of network synchronisation when two different platforms connect). There is a Simutrans fixed point type (float32_e8, if I recall correctly where the underscore goes), which is used for vehicle physics calculations, but this is very slow compared to the standard floating point types and may be less memory efficient, too.

freddyhayward

I was on the server at one point with two other players. Eventually I did start to see long pauses and desyncs but I was usually able to rejoin shortly after.

jamespetts


ACarlotti

Quote from: jamespetts on June 23, 2019, 03:45:16 PM
However, do bear in mind that we cannot use built-in floating point types other than in the GUI in Simutrans-Extended, as these are not network safe (i.e., they will deal with rounding inconsistently on different platforms, causing loss of network synchronisation when two different platforms connect). There is a Simutrans fixed point type (float32_e8, if I recall correctly where the underscore goes), which is used for vehicle physics calculations, but this is very slow compared to the standard floating point types and may be less memory efficient, too.

I am aware of these issues. I would implement it as a new custom type, and I think the only operations that would be needed are conversion to this type, and comparisons (and the comparisons could be made by comparing the structures as if they were integers). This is, of course, dependent on whether I find this data being used anywhere else.
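One possible shape for such a type, as a sketch only: the functions `encode_time` and `decode_time` and the exact bit layout are assumptions made for this example, not the code that was eventually written. The value is stored as an 8-bit exponent in the high byte and an 8-bit mantissa in the low byte, using integer shifts only, so the result is identical on every platform. Because the exponent occupies the more significant byte, comparing two codes as plain integers gives the same ordering as comparing the decoded times.

```cpp
#include <cstdint>

// Encode a 32-bit journey time into 16 bits: high byte = shift (exponent),
// low byte = the top 8 significant bits of the value (mantissa).
// Low-order bits are truncated, so the relative error is below 1/128.
inline std::uint16_t encode_time(std::uint32_t value)
{
    std::uint32_t shift = 0;
    while ((value >> shift) > 0xFFu) {
        ++shift;
    }
    return static_cast<std::uint16_t>((shift << 8) | (value >> shift));
}

// Decode back to a 32-bit value. Only meaningful for codes produced by
// encode_time, whose exponent is at most 24.
inline std::uint32_t decode_time(std::uint16_t code)
{
    const std::uint32_t mantissa = code & 0xFFu;
    const std::uint32_t shift = static_cast<std::uint32_t>(code) >> 8;
    return mantissa << shift;
}
```

Values below 256 are stored exactly. Larger values keep their 8 most significant bits, so for example 1001 decodes as 1000.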