Server performance: preliminary findings

Started by jamespetts, December 23, 2019, 11:51:35 PM


Mariculous

#35
Let's start slow and get faster.
I did all of these in ram, so read/write will not be limited by the HDD. CPU was an i7 6700HQ

gzip fastest: ~1,3G

time gzip -1 bb-17-jan-2020.sve.uncompressed
real    0m54,607s
user    0m53,413s
sys     0m1,168s



time gunzip bb-17-jan-2020.sve.uncompressed.gz

real    0m29,111s
user    0m28,022s
sys     0m1,088s


lz4 fastest: ~2,0G

time lz4 -1 ./bb-17-jan-2020.sve.uncompressed

real    0m14,402s
user    0m10,699s
sys     0m1,420s



time lz4 ./bb-17-jan-2020.sve.uncompressed.lz4

real    0m35,982s
user    0m4,392s
sys     0m1,291s

Note that I tried to reach the same compressed size as gzip -1 with lz4, but it is totally not worth it!
lz4 -7 came to around 1.5G, whilst already taking more than 2 minutes to compress.

zstd fastest: ~1,1G

time zstd -1 ./bb-17-jan-2020.sve.uncompressed

real    0m15,672s
user    0m16,081s
sys     0m0,899s



time zstd -d ./bb-17-jan-2020.sve.uncompressed.zst

real    0m11,133s
user    0m8,162s
sys     0m1,088s


pzstd 2 threads: ~1,3G

time pzstd -1 -p2 ./bb-17-jan-2020.sve.uncompressed

real    0m11,082s
user    0m16,794s
sys     0m1,139s


time pzstd -1 -d -p2 ./bb-17-jan-2020.sve.uncompressed.zst

real    0m6,517s
user    0m8,834s
sys     0m1,899s



pzstd 4 threads: ~1,3G

time pzstd -1 -p4 ./bb-17-jan-2020.sve.uncompressed

real    0m6,582s
user    0m18,278s
sys     0m1,399s


time pzstd -1 -d -p4 ./bb-17-jan-2020.sve.uncompressed.zst

real    0m4,334s
user    0m9,052s
sys     0m2,096s


Conclusion: pzstd is quite fast and has a good compression rate, so we should give it a try.
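For anyone wanting to repeat this kind of comparison from a script rather than with `time` on the command line, here is a minimal sketch. It uses only the compressors shipped in Python's standard library (zlib, bz2, lzma) as stand-ins, since zstd and lz4 are not bundled; the `benchmark` helper and the sample data are hypothetical, and the numbers it prints say nothing about a real savegame.

```python
import bz2
import lzma
import time
import zlib

def benchmark(name, compress, decompress, data):
    """Time one compress/decompress round trip and report the size ratio."""
    start = time.perf_counter()
    packed = compress(data)
    compress_s = time.perf_counter() - start

    start = time.perf_counter()
    restored = decompress(packed)
    decompress_s = time.perf_counter() - start

    if restored != data:
        raise ValueError(f"{name}: round trip changed the data")
    return {
        "name": name,
        "compress_s": compress_s,
        "decompress_s": decompress_s,
        "ratio": len(packed) / len(data),
    }

# Hypothetical stand-in for a savegame: repetitive, record-like bytes.
sample = b"".join(b"tile %06d height %02d\n" % (i, i % 17) for i in range(50_000))

results = [
    benchmark("zlib -1", lambda d: zlib.compress(d, 1), zlib.decompress, sample),
    benchmark("zlib -6", lambda d: zlib.compress(d, 6), zlib.decompress, sample),
    benchmark("bz2 -9", lambda d: bz2.compress(d, 9), bz2.decompress, sample),
    benchmark("lzma", lzma.compress, lzma.decompress, sample),
]
for r in results:
    print(f"{r['name']:8s} {r['compress_s']:.3f}s / {r['decompress_s']:.3f}s  ratio {r['ratio']:.3f}")
```

Working on an in-memory buffer mirrors the RAM-only setup described above, so disk speed does not enter the timings.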

freddyhayward

How is memory usage for all of these options - would the server be affected by additional swapping?

Mariculous

#37
Good point, I'll have a look at it. I expect gzip to consume the least memory; however, none of these should have a significant memory overhead.

Note that pzstd adds the information required for parallel decompression to the compressed files, whilst zstd used with the -T option compresses in parallel but does not add that kind of information, so decompression won't profit from multithreading.
However, the pzstd approach does not seem to be part of the library itself, so its results are misleading.

Here is zstd -T: ~1,1G
time zstd -1 -T2 ./bb-17-jan-2020.sve.uncompressed

real    0m11,130s
user    0m17,928s
sys     0m0,981s


time zstd -1 -T4 ./bb-17-jan-2020.sve.uncompressed

real    0m6,586s
user    0m18,287s
sys     0m1,139s


time zstd -d -T4 ./bb-17-jan-2020.sve.uncompressed.zst

real    0m10,966s
user    0m7,833s
sys     0m1,144s
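The difference described above (independently decodable pieces versus one continuous stream) can be illustrated with a small sketch. This is not how pzstd is implemented; it only shows the general idea using zlib from Python's standard library as a stand-in. `CHUNK_SIZE`, `compress_chunks` and `decompress_chunks` are hypothetical names, and whether threads give a real speed-up depends on the library releasing the interpreter lock.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1 << 20  # 1 MiB per chunk; an arbitrary choice for this sketch

def compress_chunks(data, level=1, workers=4):
    """Compress fixed-size chunks independently, so each can be inflated alone."""
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda c: zlib.compress(c, level), chunks))

def decompress_chunks(frames, workers=4):
    """Inflate the independent chunks in parallel and join them in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(zlib.decompress, frames))

payload = bytes(range(256)) * 20_000  # about 4.9 MiB of hypothetical data
frames = compress_chunks(payload)
restored = decompress_chunks(frames)
print(len(frames), "independent frames,", sum(map(len, frames)), "bytes")
```

The price of independent chunks is that matches cannot reach across chunk boundaries, which is one plausible reason for a chunked file to come out somewhat larger than a single-stream one.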

jamespetts

Freahk - this is very interesting, thank you. I should indeed like to see memory consumption, too.

Dr. Supergood - it would indeed be splendid if someone had the time and knowledge to write code for streaming the saved games. Would anyone like to volunteer?
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

Mariculous

James, do you have older versions of the Bridgewater game?
Zstd can be trained on a specific file type to perform even better and achieve higher compression ratios. I am really interested in trying this out for Extended saves.

jamespetts

How much earlier were you after? I have one from August here.

Mariculous

I guess the age does not matter as long as the file structure remains the same, so I would expect it to work best with relatively recent saves, e.g. learning from yesterday's (uncompressed) savegames to compress today's savegames.

However, that's just an idea. I don't know if training will give any advantage for such large files but without trying we won't know.
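To give a feel for what "learning from yesterday's save" means, here is a minimal sketch using zlib's preset dictionary from Python's standard library. This is only an analogy for zstd's trained dictionaries, not the zstd trainer itself, and the record layout, `record`, `compress_with_dict` and `decompress_with_dict` are all hypothetical. zlib only looks back 32 KiB, so the effect shown here applies to small inputs; whether it carries over to multi-gigabyte saves is exactly the open question.

```python
import hashlib
import zlib

def record(i, state):
    """Build one hypothetical savegame record with a hard-to-compress name."""
    name = hashlib.sha256(b"%d" % i).hexdigest().encode()
    return b"stop %s state=%s\n" % (name, state)

def compress_with_dict(data, dictionary=None, level=6):
    """Deflate data, optionally primed with a preset dictionary."""
    if dictionary is None:
        comp = zlib.compressobj(level)
    else:
        comp = zlib.compressobj(level, zdict=dictionary)
    return comp.compress(data) + comp.flush()

def decompress_with_dict(blob, dictionary=None):
    """Inflate data; the same dictionary must be supplied again."""
    if dictionary is None:
        decomp = zlib.decompressobj()
    else:
        decomp = zlib.decompressobj(zdict=dictionary)
    return decomp.decompress(blob) + decomp.flush()

# Yesterday's save acts as the dictionary for a slice of today's save.
yesterday = b"".join(record(i, b"loading") for i in range(200))
today = b"".join(record(i, b"driving") for i in range(180, 200))

plain = compress_with_dict(today)
primed = compress_with_dict(today, dictionary=yesterday)
print(len(today), len(plain), len(primed))
```

The reader needs the identical dictionary to decompress, so a server using this approach would have to ship or version the dictionary alongside the game.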

jamespetts


DrSuperGood

Quote from: jamespetts on January 18, 2020, 12:08:00 AM
Dr. Supergood - it would indeed be splendid if someone had the time and knowledge to write code for streaming the saved games. Would anyone like to volunteer?
The changes needed are not trivial. The entire net code part is a mess with respect to such features. Ideally it should be made asynchronous; however, this is difficult due to Standard's requirement to support single-thread-only builds. This means one cannot use dedicated receive and transmission threads, instead having to rely on polling.

While at it one could also implement other features... Like more than 1 player joining from the same save file. Possibly even allowing the game to progress during the transfer depending on server administrator preferences.
Ability to query server status asynchronously. Currently unresponsive servers cause the client to freeze until a timeout occurs.

The solution is not as simple as piping/nesting the compression results via socket since the server also has to write out the results to file. While saving it would have to read this written out data and send it to the client(s) that are joining, which would either require another thread to perform or greatly complicate the compression code, at least tying it in directly with net transfers.
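As a rough illustration of the "write to file and send to clients at the same time" idea, the sketch below fans each compressed piece out to the save file and to every joining client as soon as the compressor emits it, instead of reading the file back. It is a simplification under stated assumptions: zlib stands in for whatever compressor is used, `save_and_stream` is a hypothetical name, the sinks are in-memory buffers rather than sockets, and every write blocks, which is precisely the part that becomes hard with polling in a single-threaded build.

```python
import io
import zlib

def save_and_stream(chunks, file_obj, client_sinks, level=1):
    """Compress serialised chunks once; write each compressed piece to the
    save file and to every joining client as soon as it is produced."""
    comp = zlib.compressobj(level)
    for chunk in chunks:
        piece = comp.compress(chunk)
        if piece:  # the compressor may buffer and return nothing yet
            file_obj.write(piece)
            for sink in client_sinks:
                sink.write(piece)
    tail = comp.flush()
    file_obj.write(tail)
    for sink in client_sinks:
        sink.write(tail)

# In-memory stand-ins for the save file and two client sockets.
save_file, client_a, client_b = io.BytesIO(), io.BytesIO(), io.BytesIO()
world = (b"tile %05d\n" % i for i in range(10_000))
save_and_stream(world, save_file, [client_a, client_b])
print(len(save_file.getvalue()), "compressed bytes written to each sink")
```

In this form a single slow client stalls the save for everyone, so a real implementation would need per-client buffering or non-blocking sends.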

jamespetts

 
Quote from: DrSuperGood on January 18, 2020, 01:16:37 AM
The changes needed are not trivial. The entire net code part is a mess with respect to such features. Ideally it should be made asynchronous; however, this is difficult due to Standard's requirement to support single-thread-only builds. This means one cannot use dedicated receive and transmission threads, instead having to rely on polling.

While at it one could also implement other features... Like more than 1 player joining from the same save file. Possibly even allowing the game to progress during the transfer depending on server administrator preferences.
Ability to query server status asynchronously. Currently unresponsive servers cause the client to freeze until a timeout occurs.

The solution is not as simple as piping/nesting the compression results via socket since the server also has to write out the results to file. While saving it would also have to read this written out data and send it to the client(s) that are joining.

Indeed this is not trivial - this is why I asked whether there are any volunteers, since I certainly will not have the time to implement such non-trivial changes for the foreseeable future given the almost unimaginably long list of higher priority items.

As to the need to support single threaded builds - these are sometimes useful for debugging purposes, especially detecting whether a loss of synchronisation in an online game is caused by a multi-threading issue. However, if there were to be a massive gain from such a feature, and someone were only realistically able to code it within the time available by mandating multi-threading, then it may be worth dropping this requirement.

Mariculous

Quote from: jamespetts on January 18, 2020, 01:08:58 AM
Is the one from August any use?
One alone is not, no. Training requires several samples to learn from.

TurfIt

#46
Terrible table, but my results:


type     level          time (s)   size (MiB)
bin      ---                16.9         4021
zipped   6 (default)       225.3         1154
zipped   6f                249.9         1334
zipped   6h                 56.1         2095
zipped   6R                 54.3         2078
zipped   6F                227.8         1422
zipped   6 64K buf         220.9         1154
zipped   3                  70.0         1298
zipped   1                  45.6         1319
zipped   1 64k buf          42.0         1319
bzip2    9,30              380.6          953
bzip2    9,150             429.6          953
bzip2    1,30              307.5          984
zstd     6 (default)        26.4         1143
zstd     1                  22.0         1144
zstd     3                  26.5         1143
zstd     7                  75.5         1040
zstd     8                  90.0         1006
zstd     9                 130.0         1008
zstd     -1                 23.6         1143
zstd     -2                 18.2         1409
zstd     -3                 17.5         1478
zstd     1 64K              20.3         1144
zstd     1 256K             19.1         1143

Taken from ingame since compressing a stream is rather different than taking a command line tool to an existing file...

Zstandard would be a good choice to change to. Same filesize for 1/10 the time. And only 20mins to hack it into the Simutrans code thanks to a provided zlib compatibility shim (and no thanks to the documentation of that shim which is .....) Parallel zstd does not have compatibility shims - would take hours/days to do anything there IMO. And you're limited by the 17 seconds it takes to serialize the data anyway.

Both zipped and zstd like having their buffer up a bit from the 8K default:

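// Open for writing; the digit in the mode string is the compression level (2)
fd->gzfp = gzopen(filename, "wb2");
// Raise the stream buffer from the 8K default to 64K
gzbuffer(fd->gzfp, 65536);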


Random thoughts:
   The default zipped is taking *only* 225 seconds. >> Bridgewater-Brunel must be running on a potato to be timing out clients at 10 mins and still not sending the file...
   Once loaded, the game runs at half speed. >> then how glacial it must be running on the server...
   Bridgewater serves up the savegame from its http server quite slow too - 10Mbit/s average, peak 15 ish, many extended slows to < 5.  Spending extra CPU for better compression is very much warranted at these slow transfer rates.
   WhyTF does Extended crash upon loading the pakset if you've moved the mouse while it's loading?? ?? What sorta bug you got in there...
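The trade-off between CPU time and file size at slow transfer rates can be worked through with the figures from the table above. The sketch assumes the save completes before the transfer starts and that the link sustains the stated rate the whole time; the 100 Mbit/s case is added purely for contrast and is not a measurement. `join_time` and `rank` are hypothetical helper names.

```python
# (label, save time in seconds, size in MiB), as read from the table above.
RESULTS = [
    ("bin", 16.9, 4021),
    ("zipped 6", 225.3, 1154),
    ("zipped 1", 45.6, 1319),
    ("zstd 1", 22.0, 1144),
    ("zstd 8", 90.0, 1006),
]

MIB = 1024 * 1024

def join_time(save_s, size_mib, mbit_per_s):
    """Seconds to save and then transfer, assuming the two do not overlap."""
    bytes_per_s = mbit_per_s * 1_000_000 / 8
    return save_s + size_mib * MIB / bytes_per_s

def rank(mbit_per_s):
    """Return labels ordered from fastest to slowest total join time."""
    timed = [(join_time(s, m, mbit_per_s), label) for label, s, m in RESULTS]
    return [label for _, label in sorted(timed)]

for rate in (10, 100):
    print(rate, "Mbit/s:", rank(rate))
```

At 10 Mbit/s the transfer dominates, so the smaller zstd 8 file wins despite its longer save; at 100 Mbit/s the ordering flips in favour of the fastest level.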



DrSuperGood

Quote from: Freahk on January 18, 2020, 02:43:18 AM
The default zipped is taking *only* 225 seconds. >> Bridgewater-Brunel must be running on a potato to be timing out clients at 10 mins and still not sending the file...
This has been raised a lot. It is very low on memory so the server performs very poorly due to page faults. I would not be surprised if saving causes page thrashing.
Quote from: TurfIt on January 18, 2020, 04:39:44 AM
WhyTF does Extended crash upon loading the pakset if you've moved the mouse while it's loading?? ?? What sorta bug you got in there...
Standard could suffer from the same bug; pak128 Britain extended is probably the largest pakset Simutrans has ever seen, so it would be the first to expose it. It could also be an SDL2-related bug.

jamespetts

TurfIt - thank you very much for that: that is most interesting. Are you able to share the code that you used to integrate zstd?

In respect of the Bridgewater-Brunel server, it is indeed low on memory, which I suspect is why the saving is taking even longer than indicated in your tests; the running game is not as slow as the slowness of the saving would suggest.

As to the crashing on mouse movement, I have never seen this and it has never been reported before. Moreover, I cannot reproduce this. Are you able to give, preferably in a new thread, detailed steps to reproduce this reliably?

Also, Freahk - how many would you need?

prissi

Having read some more on zstd, it seems that zstd can also read zlib files, so it would have backward compatibility without any extra effort. That is nice.

I wonder, though, how TurfIt gets more than four times the data rate on writing binary files. Does Experimental dump large tables at one point?

jamespetts

Quote from: prissi on January 18, 2020, 01:43:16 PM
Having read some more on zstd, it seems that zstd can also read zlib files, so it would have backward compatibility without any extra effort. That is nice.

I wonder, though, how TurfIt gets more than four times the data rate on writing binary files. Does Experimental dump large tables at one point?

I suspect that it does: the routing data in large games can be very large, and these are in hashtables.

Mariculous

#51
Quote from: jamespetts on January 18, 2020, 11:41:22 AM
Also, Freahk - how many would you need?
I don't know exactly but a few, ~5-10 should give us an idea if it's worth further investigating zstd training for our purposes or not.

jamespetts

How different from each other would they need to be; would a set of backups a few hours apart work?

Mariculous

Quote
How different from each other would they need to be; would a set of backups a few hours apart work?
The idea of training is detecting typical patterns within files of a given type, so I guess a set of backups would work perfectly well as long as there were players actually playing the game at that time.

Otherwise, such a comparison would be quite pointless. It would be a case of "calculate the best way to compress a set of 95% similar files", then "now use the gathered information for another file that is 95% the same".

jamespetts

If it is of any help, have a look at the games currently in the saves directory of the Bridgewater-Brunel fileserver here. You will see that the largest of them (order by size to see this easily) are from the latest server game. Is this of any help? I have not yet installed FileZilla on my new computer, so it may be a while before I can easily manipulate files on the server at present.

Mariculous


jamespetts


TurfIt

Quote from: jamespetts on January 18, 2020, 11:41:22 AM
TurfIt - thank you very much for that: that is most interesting. Are you able to share the code that you used to integrate zstd?

Quick and dirty patch attached. Most of the makefile changes are just to get it to compile, you have it very corrupted (wrong logic and TABS!!!). I did not fix it 'cleanly'.
The zlibwrapper directory is straight from the zstd source, with some extra headers mushed into one directory. Again very not 'clean'. But good enough for the test at hand.


Quote from: DrSuperGood on January 18, 2020, 08:04:28 AM
This has been raised a lot. It is very low on memory so the server performs very poorly due to page faults. I would not be surprised if saving causes page thrashing.

This game requires 8.5GB on a server (for the Simutrans process alone, OS overhead extra). If the Server only has 8GB then that's the first thing that needs "fixing". While zstd is still frugal memory wise, especially at low compression settings, it does still add to the memory footprint, and might make things even worse. Swapping == death.


Quote from: prissi on January 18, 2020, 01:43:16 PM
Having read some more on zstd, it seems that zstd can also read zlib files, so it would have backward compatibility without any extra effort. That is nice.

Yes, it can read and write zlib files.  A single function call to select zstd or zlib for writing if desired.
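How the wrapper tells the formats apart internally is not described in this thread. One common way a loader can stay backward compatible is to look at the first bytes of the file, as in the sketch below; this illustrates the general technique and is not taken from the zstd wrapper code. `MAGIC` and `detect_format` are hypothetical names, and the last call feeds a hand-built header rather than a real zstd file.

```python
import bz2
import gzip

# Leading magic bytes of each container format.
MAGIC = [
    (b"\x28\xb5\x2f\xfd", "zstd"),  # zstd frame magic 0xFD2FB528, stored little-endian
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bzip2"),
]

def detect_format(header):
    """Guess the compression format from the first bytes of a file."""
    for magic, name in MAGIC:
        if header.startswith(magic):
            return name
    return "unknown"

print(detect_format(gzip.compress(b"simutrans")))
print(detect_format(bz2.compress(b"simutrans")))
print(detect_format(b"\x28\xb5\x2f\xfd" + b"\x00" * 8))
```

With a check like this, old zipped saves keep loading through the existing path while new saves can be written in whichever format is selected.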


Quote from: prissi on January 18, 2020, 01:43:16 PM
I wonder, though, how TurfIt gets more than four times the data rate on writing binary files. Does Experimental dump large tables at one point?

Probably by not using a laptop; You posted having an i7-7500 which doesn't exist, so I'm assuming actually an i7-7500U which is a low power laptop chip. Laptops also tend to have slow memory (i.e. sticking with spec which is a mere ddr-2133 for that 7500U [or even ddr3-1600 is supported and a cheap laptop manufacturer might sneak that in there!]) and memory speed plays a huge factor in compression and even in running Simutrans in general which is memory latency bound.

For reference I did these tests on a Ryzen 7 2700X and mine has crippled memory support so stuck at DDR4-3066 (it'll do quite tight timings at that, just will not do any more frequency). All in memory too - ramdrive.

Experimental's save time problem is entirely due to saving the path explorer data, something that doesn't exist in Standard. i.e. your change to wb1 is counterproductive IMHO unless you expect all clients to be able to download at 100Mbit/s from all servers with servers running on poor (laptop) CPUs...

The Bridgewater-Brunel game without path data is 677MB, which compresses to 137MB in 16.4 seconds with the default zipped. It's using 1.5 cores doing so with the limited multithread.  With the path data, there's an extra 3344MB to save taking 185 seconds - CPU usage drops to a single core since zip is choking on this bitstream. Less than half the compression rate. Zstandard seems to handle it fine, otherwise I'd have suggested to reassess the structure of this data.
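The figures in the paragraph above can be checked with a few lines of arithmetic. The sketch reads "compression rate" as input throughput (uncompressed megabytes consumed per second), which is an interpretation rather than something the post states outright.

```python
# Figures quoted in the post above (sizes in MB, times in seconds).
base_mb, base_s = 677, 16.4   # game without path explorer data
path_mb, path_s = 3344, 185   # extra path explorer data

total_mb = base_mb + path_mb
base_rate = base_mb / base_s  # MB of input consumed per second
path_rate = path_mb / path_s

print(f"total uncompressed: {total_mb} MB")
print(f"without path data:  {base_rate:.1f} MB/s")
print(f"path data only:     {path_rate:.1f} MB/s")
print(f"path data runs at {path_rate / base_rate:.0%} of the base rate")
```

The total agrees with the 4021 MiB reported for the uncompressed "bin" save in the earlier results.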

---
I also tried a maximum compression with lzma2. 16 threads, ultra level, 192MB dictionary. Takes the same 225 seconds as zipped but produces a file half the size at 620MB. But, consumes a whopping 21GB ram just for the compression doing so!

jamespetts

Thank you for this - that is most helpful. Can I check what dependencies are required for this in both Linux and Windows?

TurfIt


jamespetts


prissi

#61
TurfIt, most servers for standard games are really weak (1 core Vservers, at least in my case). Otherwise bz2 would not be off by default for servers. Most clients will at least have a faster download than 10 Mbit/s nowadays. Moreover, the switch (zipped/bzip) in simuconf.tab exists exactly to offer a choice between speed and size.

(My laptop was the most expensive model from Panasonic in 2017, about $3000. The CPU is a specially made model with 6 cores and 2.9 GHz nominal, which does not exist in databases. Resource monitor shows 6 threads with a speedup to 3.6 GHz. So I was a little surprised about a factor of four. But this gets off-topic fast.)

Btw., the Debian package name is libzstd-dev, and it seems to be already installed with pacman on mingw64.

Phystam

Even if you want to cross-compile the library, you can easily do it since the library does not have any dependencies. It was very easy to introduce it.

jamespetts

Dr. Supergood - can I check whether you are still interested in working on this? It would help me to know whether to spend my time on this in the near future or whether to spend the same time working on bug fixes and other enhancements.

jamespetts

I have been looking at attempting to implement Zstd this evening, but cannot find any precompiled libraries for Visual Studio, nor a Visual Studio build file. Do I assume that the only way to build this for Windows without taking a very large amount of time is to use Cygwin or similar? Has anyone tried compiling with this in Windows?

Phystam

I usually use Windows Subsystem for Linux (WSL) and the MinGW compiler set. At least I have succeeded in compiling the Zstd library.

DrSuperGood

Quote from: jamespetts on January 19, 2020, 05:39:56 PM
Dr. Supergood - can I check whether you are still interested in working on this? It would help me to know whether to spend my time on this in the near future or whether to spend the same time working on bug fixes and other enhancements.
Currently it does not seem worthwhile to spend time on changing the compression. It only became an issue because the server game is so large that the server ran out of memory, so any kind of compression would still be very slow. The better solution, involving parallel transfer and compression, is not trivial to implement and is something to do more towards a polish phase of development, or when there are no important features left to implement. It is also something that would be worthwhile to implement in Standard as well, for similar reasons.

Parallel saving and transfer of save file data would effectively eliminate the compression time altogether for most algorithms since the load bottleneck becomes the upload speed of the server or download speed of the client. One could even use slower/better compression then since as long as the compression speed is faster than the server upload rate, any reduction in file size represents a reduction in load times.
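The argument in the paragraph above amounts to replacing a sum with a maximum. Here is an idealised sketch using two results reported earlier in the thread and the roughly 10 Mbit/s transfer rate quoted for the server; it ignores start-up latency, buffering and any slowdown from running both stages at once, and `sequential_s` and `overlapped_s` are hypothetical names.

```python
MIB = 1024 * 1024

def sequential_s(save_s, size_mib, mbit_per_s):
    """Save completely, then transfer: the two durations add up."""
    return save_s + size_mib * MIB * 8 / (mbit_per_s * 1_000_000)

def overlapped_s(save_s, size_mib, mbit_per_s):
    """Idealised streaming: saving and transfer overlap, so the slower
    of the two stages determines the total."""
    transfer_s = size_mib * MIB * 8 / (mbit_per_s * 1_000_000)
    return max(save_s, transfer_s)

# Two results reported earlier in the thread, at the quoted 10 Mbit/s.
for label, save_s, size_mib in [("zipped 6", 225.3, 1154), ("zstd 8", 90.0, 1006)]:
    print(label,
          round(sequential_s(save_s, size_mib, 10)),
          round(overlapped_s(save_s, size_mib, 10)))
```

As long as the transfer is the slower stage, the save time disappears from the total and only the file size matters, which is why a slower but tighter compression level becomes attractive in that model.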

jamespetts

It probably is worth spending time to do this, since even in single player mode, the amount of time spent saving the current Bridgewater-Brunel saved game is unreasonable, whereas it is reasonable when compression is entirely disabled. Thus, the slowness on the Bridgewater-Brunel server is not exclusively down to it being low on memory, although this very probably makes it worse.

It is of course up to Dr. Supergood what he would like to spend his time doing; but I will proceed when I am able to integrate this, if I can overcome the library issue.

Incidentally, I notice that it is possible to compile the Zstd library so as to enable multi-threading, which would on the face of it seem to render pzstd redundant.

Phystam - did you enable multi-threading when you compiled the library? This would seem to be a worthwhile thing to do.

DrSuperGood

Quote from: prissi on January 19, 2020, 10:54:50 AM
(My laptop was the most expensive model from Panasonic in 2017, about $3000. The CPU is a specially made model with 6 cores and 2.9 GHz nominal, which does not exist in databases. Resource monitor shows 6 threads with a speedup to 3.6 GHz. So I was a little surprised about a factor of four. But this gets off-topic fast.)
I assume it is still an Intel CPU? That is likely an AMD vs Intel issue.

AMD processors such as the one used by TurfIt generally score higher in compression/decompression tests than comparable Intel desktop ones. AMD heavily optimizes for this kind of workload, for some reason. Zen 2 (the current generation) took this to a new level: as an example, the Ryzen 5 3600 delivers nearly the same performance in these tests as a Core i9-9900K despite having lower clocks, a much lower power limit, 2 fewer cores and costing under half the price.

Assuming compression takes long enough that boost duration expires on the Intel system, that could be 3x performance from clock and better compression IPC, and a further factor from better memory bandwidth and latency.
Quote from: jamespetts on January 25, 2020, 06:35:54 PM
It is of course up to Dr. Supergood what he would like to spend his time doing; but I will proceed when I am able to integrate this, if I can overcome the library issue.
So to get the task straight. The existing zip library (zlib?) is to be entirely replaced by Zstd which is to operate in multi threaded mode? And the changes have to be applied to both makefile (for Linux) and VisualStudio?

jamespetts

Quote from: DrSuperGood on January 26, 2020, 03:26:32 AM
So to get the task straight. The existing zip library (zlib?) is to be entirely replaced by Zstd which is to operate in multi threaded mode? And the changes have to be applied to both makefile (for Linux) and VisualStudio?

Yes, for the most part: TurfIt has done much of the work with the patch (assuming static linking, on which see below), but I have so far not been able to set myself up to compile the necessary libraries for Windows at least. I have not yet tried with Linux. Ultimately, this will need to be able to be compiled in three ways: (1) Windows native with Visual Studio for development/debugging; (2) Windows cross-compile with GCC (for building on the server); and (3) Linux native with GCC (for building on the server).

If at all possible, the library will need to be compiled with the multi-thread option enabled, as this could make a big difference to performance as suggested by the tests of pzstd earlier in the thread.

One thing that will need to be checked, which I have not yet had time to review, is licence compatibility. Simutrans (and thus Simutrans-Extended) uses a weird licence, the Artistic Licence 1.0, as was specified by Hajo back in 2004. This is not compatible with the GPL. Zstd is licenced under both the GPL v. 2 and the BSD licences. I have not yet checked the compatibility of the latter.

If the licence is incompatible, we will not be permitted to incorporate this library by static linking (as I believe is the implementation in TurfIt's patch), but will need to use dynamic linking, so a .dll file will need to be generated for Windows builds, and whatever the equivalent of that is for Linux builds, and this, together with a copy of the GPL and a link to the sources, will need to be distributed with the executable builds.