The International Simutrans Forum

 

Topic: Server performance: preliminary findings

Offline prissi

  • Developer
  • Administrator
  • Posts: 9709
  • Languages: De,EN,JP
Re: Server performance: preliminary findings
« Reply #70 on: January 26, 2020, 12:32:07 PM »
I think the biggest difference in speedup comes from comparing Standard with Experimental. When Experimental writes its large hashtables, cache misses are far less frequent than in Standard, where the data delivery rate is limited by collecting the objects from everywhere in memory.

To the task at hand:
If you can live with 10-20% larger size but still want maximum speed, just change the modifier string from "wb" to "wb1" in gzopen. It compresses almost as fast as Simutrans can create the binary data, but the result is 25% larger than with zstd compression. The games keep compatibility and the time can be used for other stuff. (It would also avoid all the hassle of requiring a self-compiled library on Macintosh, with MSVC, and on ARM systems. This is why I did not consider zstd for Standard right now.)
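To get a feel for the level 1 versus default trade-off without rebuilding Simutrans, here is a minimal sketch in Python, whose `zlib` module wraps the same library. The digit in "wb1" selects compression level 1, and zlib's default is level 6. The payload built by `make_payload` is a made-up stand-in for savegame data (its record layout is purely an assumption), so the sizes and timings it prints say nothing about real Simutrans saves. It only shows how to run the comparison.

```python
import time
import zlib


def make_payload(n_records: int = 50_000) -> bytes:
    """Build deterministic binary records as a stand-in for savegame data.

    The record layout is hypothetical and only serves to produce
    mixed binary data that is partly compressible.
    """
    out = bytearray()
    for i in range(n_records):
        out += i.to_bytes(4, "little")                            # record id
        out += (i % 512).to_bytes(2, "little")                    # x coordinate
        out += (i // 512).to_bytes(2, "little")                   # y coordinate
        out += ((i * 2654435761) & 0xFFFF).to_bytes(2, "little")  # scrambled value
    return bytes(out)


def measure(data: bytes, level: int):
    """Compress at the given zlib level; return (compressed bytes, elapsed ms)."""
    start = time.perf_counter()
    packed = zlib.compress(data, level)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return packed, elapsed_ms


payload = make_payload()
packed_fast, ms_fast = measure(payload, 1)        # level selected by the "1" in "wb1"
packed_default, ms_default = measure(payload, 6)  # zlib's default level

for label, packed, ms in (("level 1", packed_fast, ms_fast),
                          ("level 6", packed_default, ms_default)):
    print(f"{label}: {len(packed)} bytes in {ms:.1f} ms")
```

Both outputs decompress to the same bytes, which is why savegames written at either level stay compatible with existing readers.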

The library TurfIt uses is not multithreaded, as he wrote. Anyway, multithreading is not needed either, since apparently Simutrans cannot deliver the data as fast as zstd can compress single-threaded. And you still need zlib to read old savegames in the zlib wrapper. You just need to copy the zlib-wrapper files from zstd. (Not sure if you can automatically pull only them into that directory.)

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Moderator
  • Posts: 19273
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
« Reply #71 on: January 26, 2020, 02:23:07 PM »
Prissi - thank you for that information. I have tried experimentally modifying "wb" to "wb1", but, when I do so and when I try to open the Bridgewater-Brunel saved game from December, I get an assert failure in some external code, the substance of which seems to be, "Invalid file open mode", 0. Trying to continue after this causes a segfault in loadsave.cc.

Can I check whether I have done this correctly - should I change all instances of "wb" to "wb1" in loadsave.cc, or only some of them?

Offline TurfIt

  • Dev Team, Coder/patcher
  • Devotee
  • Posts: 1335
« Reply #72 on: January 26, 2020, 05:13:39 PM »
The zstd hack patch I'd posted already has zlib set to level 1. You can look there for what to change. It's not important to the zstd integration, just a leftover from the timing tests. And those timing tests show clearly that zstd produces the Extended savefile from the Bridgewater server at 90% of the size of zlib '1' in half the time. Very much worth it for Extended while saving the routing data, which Standard does not have and which zlib does not like. I've not seen a save from Standard that's worth changing from the zlib '6' default. You at best save a few seconds on compression, and spend a couple of extra minutes on file transfer. Similarly, it's not really worth the hassle of another library in Standard since you save a few seconds at best.

I was going to mention the licensing, but decided to ostrich instead. Unless we have some lawyer join the team... it's a mess. From what I see, the Artistic license was deemed too vague, and hence is blackballed. So all the other libraries being used in Simutrans are in a grey mess too. Also note, we're trying to use the zstd wrapper here, i.e. actually including their source files directly, not just using the zstd library.

The zstd project comes with VS project files per the README, are those not sufficient to build a library? I have no experience with VS.

Offline prissi

  • Developer
  • Administrator
  • Posts: 9709
  • Languages: De,EN,JP
« Reply #73 on: January 27, 2020, 09:23:21 AM »
Proper zstd support was very easy to add, without playing with the wrapper in the first place. I will submit it for Standard today. Actually, the buffer system is already in place, so one just had to discard some old stuff with mixed reads and writes and consistently use lsgetc() and read() from loadsave_t. libzstd builds straightforwardly in MSVC.

At higher compression the standard game size seems even smaller with zstd than bz2 by about 5%, and of course saving and loading is much faster than bz2 (with single threaded library, but compression in background thread). More quantitative data will follow.

Offline prissi

  • Developer
  • Administrator
  • Posts: 9709
  • Languages: De,EN,JP
« Reply #74 on: January 27, 2020, 02:29:14 PM »
Below is a patch for loadsave_t (which should apply without too many problems on Experimental, I guess). It uses the native zstd API, so no source from the zstd project needs to be included.

However, I did some tests and zstd was really disappointing. At its default setting (3) it is, for standard Simutrans games, about the same speed as bzip2 but produces larger files (at compression level 9 it is almost on par in size but way slower). On the other hand, zlib "wb1" produces still larger files but is much, much faster.

My times
zlib load world 2902 ms (zlib default 3012 ms)
bz2 load world 3079 ms
zstd9 load world 3085 ms
zstd3 load world 2996 ms

(same time without threading (like on simple server): load is dominated by object allocation.)
zlib1 save world 910 ms, size 5666 kB (zlib default 1260 ms, size 4644 kB)
zstd9 save world 14107 ms, size 3298 kB
zstd3 save world 2962 ms, size 4432 kB
bzip2 save world 2239 ms, size 3177 kB

My conclusion for Standard: zstd might be much faster on decompression, but that is not the limiting step. On compression it is much slower than zlib "wb1" and even default zlib, and its files are neither smaller than bz2 nor much smaller than default zlib. Maybe the Standard savegames are too small, or I am too stupid to do it correctly.
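The save figures above can be combined with the file-transfer point raised earlier in the thread. Below is a rough sketch in Python that adds compression time to the time needed to send the file. The save times and sizes are the ones posted above. The link speeds are hypothetical values chosen only for illustration, and load time on the client is ignored.

```python
# (save time in ms, file size in kB) as posted above
SAVES = {
    "zlib1": (910, 5666),
    "zlib default": (1260, 4644),
    "zstd3": (2962, 4432),
    "zstd9": (14107, 3298),
    "bzip2": (2239, 3177),
}


def total_ms(save_ms: float, size_kb: float, link_kb_per_s: float) -> float:
    """Time to compress the save plus time to push it over the link."""
    return save_ms + size_kb / link_kb_per_s * 1000.0


def fastest(link_kb_per_s: float) -> str:
    """Name of the setting with the lowest save-plus-transfer time."""
    return min(SAVES, key=lambda name: total_ms(*SAVES[name], link_kb_per_s))


def size_ratio(name: str, baseline: str = "zlib default") -> float:
    """File size of one setting relative to a baseline setting."""
    return SAVES[name][1] / SAVES[baseline][1]


print(f"zlib1 is {size_ratio('zlib1'):.2f}x the size of zlib default")

# Hypothetical link speeds in kB/s (assumptions, not measurements).
for link in (500.0, 5_000.0, 50_000.0):
    best = fastest(link)
    print(f"{link:>8.0f} kB/s: {best} ({total_ms(*SAVES[best], link):.0f} ms)")
```

On a slow link the smaller file wins despite the longer compression time. On a fast one, raw compression speed decides.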
« Last Edit: January 27, 2020, 11:33:53 PM by prissi »

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Moderator
  • Posts: 19273
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
« Reply #75 on: January 27, 2020, 03:14:05 PM »
This is interesting - thank you. I suspect that TurfIt may be right that the pathing data, when saved in Extended, does not work well with Zlib, so this may well make a substantial difference to Extended even if not to Standard.

I have been looking into the licence situation. ZStd is dual licensed - under the GPL v. 2.0 and the BSD modified licence. The former is known to be incompatible with the Artistic Licence 1.0. However, the latter is compatible. The terms of the licence are very short and simple, and I reproduce them below in full:

Quote from:
The licence
BSD License

For Zstandard software

Copyright (c) 2016-present, Facebook, Inc. All rights reserved.

Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:

 * Redistributions of source code must retain the above copyright notice, this
   list of conditions and the following disclaimer.

 * Redistributions in binary form must reproduce the above copyright notice,
   this list of conditions and the following disclaimer in the documentation
   and/or other materials provided with the distribution.

 * Neither the name Facebook nor the names of its contributors may be used to
   endorse or promote products derived from this software without specific
   prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Thus, what we need to do is make sure that we include the above text in both source and binary distributions; provided that we do that, we will be compliant. I suggest that this BSD licence be added to the Standard sources and binary distributables once zstd is integrated, if this integration is to occur.

See here for more information on the BSD licences.

This also means that we do not need to look into the more complex process of dynamic linking as would have been necessary if this were available only under an incompatible copyleft style licence.

Offline Phystam jp

  • Posts: 380
  • Pak256.Ex developer
    • Pak256 wiki page
  • Languages: JP, EN, EO
« Reply #76 on: January 27, 2020, 03:39:43 PM »
I tried TurfIt's implementation on my big map (the same map I sent you, Java Island). It is not a very developed map, but it takes around 6-7 sec to save with zlib. After including zstd, it takes around 3 sec (without multi-threading!)

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Moderator
  • Posts: 19273
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
« Reply #77 on: February 01, 2020, 11:25:34 PM »
I have now attempted to apply Prissi's version of this using the patch. Unfortunately, the patch refused to apply automatically (I think that the base code is too different), so I have had to apply it manually; however, I believe that I must have made some error, as attempting to save a game results in a fatal error indicating that, on reading a saved game, the save file is corrupt - this error happens before any actual saving takes place (or, I think, just about the time that the version is saved).

For reference, the Github branch on which I have attempted to implement this is here.

I am not quite sure what features of Zstd called for some of the code changes that were in the patch, so I am unsure whether I have interpreted or applied the patch correctly in light of the differences in code between Extended and Standard.

Any assistance would be much appreciated.

Edit: I have also retried the wrapper version - I cannot get this to compile, as it fails at the linker stage, complaining of unresolved external symbols, despite pointing to the same library file as I was able to compile the native version with successfully. This is rather perplexing.

Edit 2: I have changed the zlib parameter from "wb" to "wb1". This seems to help a little, but it is still quite slow. I should be grateful if people could let me know with to-morrow's nightly build whether this is any better with the Bridgewater-Brunel server.
« Last Edit: February 01, 2020, 11:52:38 PM by jamespetts »

Offline prissi

  • Developer
  • Administrator
  • Posts: 9709
  • Languages: De,EN,JP
« Reply #78 on: February 02, 2020, 12:44:33 PM »
When merging, Standard's zstd support will come in with -r 888x and following (where it is optional). It is optional because I was really disappointed after more tests. (I have the exact data on the other computer.) It seems that zstd is geared more towards highly repetitive ASCII data (or data sets with many zeros/0xFF) and performs less well on binary data.

(That was also the conclusion of the "squash" benchmark binary test, where firefox was the target; I think this is not so far from a simutrans game.)

zstd with a decent compression level is about as fast (or slow) as zlib, albeit file sizes are 10-25% smaller for zstd (for the same compression time). While bzip2 is much slower, its files were about 15-50% smaller (with a smaller memory footprint as well). If memory is an issue, and the games are really large, bzip2 could still compress faster than the data is sent.

Also, the MSVC libzstd needs manual compiling, which may put people off. I also have no clue how to force a multithreaded manual build on MSVC. (So I benchmarked with Mingw64.)