News:

Simutrans.com Portal
Our Simutrans site. You can find everything about Simutrans from here.

rsync for transmitting game data in online games

Started by jamespetts, November 30, 2012, 01:20:48 AM

Previous topic - Next topic

0 Members and 1 Guest are viewing this topic.

jamespetts

One issue that arises when playing online games, especially those with large and/or well developed maps, is the amount of time that it takes people to connect. Much of this time is taken in downloading the saved game, which can be hundreds of megabytes in a larger map.

The saved game is transmitted in full because that is the only way of ensuring that each player has exactly the same game state when everyone is unpaused, to ensure that the server remains synchronised. However, has anyone thought of using rsync to reduce this download time? A player could pre-download a recent saved game from the server, then, at the point of connexion, the game could use rsync to download the fully up to date saved game, using the earlier saved game as a base. This would mean that only the differences between a game saved whenever somebody last joined or the server was last force synced with nettool and the current state would actually need to be transmitted, saving everybody time, and saving a great deal of bandwidth for client and server.

Would this be feasible, do people think?
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

TurfIt

With the default bz2 compressed saves - no chance. The files become >99% different even saving a few seconds apart.
Working off the uncompressed binary saves - maybe. If you have a sequence of saves from your server, you could decompress the files and run a binary diff on them. See how quickly the diff grows as the game progresses. That would answer if it's worthwhile pursuing...



jamespetts

Hmm, I hand't appreciated the randomising effect of compression. I am keen to run the test that you suggest - but do you know which Windows binary .diff tools will tell me the size of the difference...?
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

Fabio

What about saving server side in xml and bzipping the diff? Text however big, if compressed, should shrink a good deal..

jamespetts

I think that previous testing with XML saves showed them to be enormous and unreliable, sadly. Using a binary diff would be more economical.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

sdog

Fabio: rsync already allows compression for the data it transfers.


Perhaps a sollution that would go deeper into the save file structure. Identify parts that are very static (eg landscape, tree, factory and non-player building locations) and those that are very dynamic. Generate differences of the mostly static parts and save those too. The final save file would be a container with three seperately compressed parts.

At a manual, player initiated, save the static and dynamic data is rewritten. For automatic saves at leaving and joining of players the dynamic part and the diff are transfered only.

Now, regardless of the immense work it would require, is such a thing even feasible at all? Just brainstorming here :-)

prissi

Even compressing XML saves (the only format work zipping) then those are 2x the bz binary size.

For developed games, the map contributes less and less. Some early servergames started with 175 kB and ended with more than 3 MB. Most of the data are made up by dynamic stuff like goods packets or convois and city carswhich are very highly dynamical. I honestly do not see how to make useful diffs there.

jamespetts

I am having real trouble finding a Windows binary .diff tool that can open such large files and give me a useful output as to the size of the .diff between them. Any suggestions would be much appreciated. It would be very useful to know for sure whether the compression method is better or worse than the rsync method.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

sdog

James look at delta compression for that. You don't need the actual diff for your comparison, just the size of the output of the delta compression.

http://en.wikipedia.org/wiki/Xdelta

(you might have noticed the stdout when you push some commits with git, which uses something like that)

Ashley

Would be interesting to see some stats on how long it takes to transfer the game verses how long it takes to actually load the map. Anecdotally it seems to take many times longer to load it than the actual network transfer...
Use Firefox? Interested in IPv6? Try SixOrNot the IPv6 status indicator for Firefox.
Why not try playing Simutrans online? See the Game Servers board for details.

jamespetts

Sdog - thank you - will look into that!

Timothy - the ratio might very well depend on the size of the map, the connexion speed and the speed of the client computer. Experimental maps might also be larger than Standard maps of the same dimensions and level of development.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

prissi

The loading of the map depends on two factors: How fast can it transfer/decompress and how fast can new objects be built. The latter is clearly the limit for everything which is beyond 1 MBit/s (i.e. basic DSL). Only people with modem would profit from that.

I see also a second problem with diffs: The reason for reconnection is, that the local map is wrong. I order the have a base for the diff, the server would need to know when the client was connected last and keep that save too. For restoring the game at client side there will be another step: Unpacking old save, new diff, construct new save, and finally load the game. For the intermediate step two maps must be hold in memory, which easily amount to more than 1 GB. That might also slow down systems.

Ters

The rsync algorithm doesn't need a base, or in a sense the client is the base. Judging from the Wikipedia article, it seems to require that the same data is in the same place in order to be really efficient, which sounds like a problem for anything but the terrain. The Wikipedia article might be simplified or inaccurate though.

TurfIt

#13
Just did some testing with the new_york_city_1_0.sve and new_york_city_1_v15.sve. That's the only largish well populated map I have with a base and developed save.

The files are 4.44MB and 7.44MB when compressed with bzip2.  It takes 38 seconds + 20MB RAM for _1_v15.
Xdelta: gives at best a 12.6MB diff! Same as saving _1_v15 in gzip - window size too small. Better to just use bz2.
bsdiff: 8.3MB diff. still a bust. and it took 100 seconds + 320MB RAM.
lzma diff: 6.7MB.  744kB savings but 47 seconds + 708MB.

The time and memory requirements grow exponentially as file size increases too. I routinely throw 16GB at lzma...




As for the loading times, having just worked on dual threading that, I have quite a few numbers from that.
Transfer is the limit, not decompress / how fast objects can be built upto 500MBit/s connections... far beyond what I can get.
I can load binary saves  at 524Mbit/s, gzip 524Mbit/s (dual threading), and bz2  400Mbit/s. object creation limiting
Saving is binary 765 ( =95MB/s which is speed of hard drive!  ), gzip 322, bz2 40. IO/CPU/compress limiting
Resultant files are binary 185MB, gz 19MB, bz2 9.6MB.
Crunching the numbers, it's best to save in bz2 for internet speeds < 2.21MBit/s, and in gzip for speeds 2.21 - 20.73Mbit/s, and binary for > 20.73Mbit/s

The current joining process has the server saving the complete file, then sending it. If this was streamed instead from the compression buffers, the total time could be cut in half for bz2. It takes 37 seconds to save bz2, and 35 seconds to send the resultant file @2.2Mbit/s.
But a mere 5Mbit/s connection sending the gz instead would beat it. Hardly worth implementing streaming.

I think for normal speed internet connections, it's best to make sure the servers are set to zipped and not bzip2...






prissi

By gzip you mean standard Z algorithm? (That is what simutrans actually uses with zip.)

By your comments are sure interesting, especially when the server is weak (compared to the clients).

TurfIt

yes, the simutrans zip saveformat is what I meant.
Those results are from my I7, I also have results from my old Core2 machine.  1.3-1.6x factor between them for this task (sync_step is 6x faster for reference), not all that great a difference. Given it's only dual threaded, most of the difference is down to simply the clock rate difference. ( and a slightly better IPC for the Ivy over the old Conroe)

I'd thought using bz2 for netsaves was best to minimize the transfer, but until I ran these numbers I didn't appreciate how slow bz2 saving is vs even quite slow internet connections (I'm stuck at 6Mbit/s in this backwater country).

Fabio

Are there any compress algorithm which could allow streaming so that it can start decompressing while the first packets arrived? Shouldn't this approach cut decompressing overhead?

prissi

Sure, we could use huffman codes, which, when optimize for Simutrans typical sve datas might do quite well are can be directly streamed.

TurfIt

Quote from: Fabio on December 03, 2012, 10:36:28 PM
Are there any compress algorithm which could allow streaming so that it can start decompressing while the first packets arrived? Shouldn't this approach cut decompressing overhead?
Of course. That's pretty well how every compression algorithm works. Internally Simutrans streams the savefile to/from the compression libraries. What I was getting at above is the file transfer in multiplayer doesn't start until after the complete compressed file has been written to disk. Similarly the client doesn't start the decompression until the complete file has been received. That could be changed, but the time saving for all but the bz2 saveformat is minimal.

On the client side, if the client is having trouble decompressing/loading the save, then it's rather likely that client is too slow to stay synced anyways (should be booted rather than killing the experience for the rest of the players with their continual reconnects IMHO).

For the server, best to just stay away from bz2. As long as there's more than 2MBit/s transfer possible the total time is less with gzip.

jamespetts

Very interesting numbers and work, TurfIt!

I am interested - would your advice remain the same with  strong servers and a mix of weak and strong clients with a mix of fast and slow internet connexions, being to use zip not bzip2?
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

Ashley

It does indeed look like switching away from bz2 would be good for servers. I'll have to make this change and see if it improves things. It's nice to see some concrete testing :)
Use Firefox? Interested in IPv6? Try SixOrNot the IPv6 status indicator for Firefox.
Why not try playing Simutrans online? See the Game Servers board for details.

TurfIt

Depends on what you mean by a 'strong' server. Get me a quad E5-4650 w/ 48GB+ and we'll talk  ;)

For Simutrans saves, you can expect bz2 to be ~8x slower while producing a file ~2x smaller than gzip. The 2MBit/s cutoff above was calculated from the time it takes for a i7-3770K to do a save/load. It would take quite a slow internet connection and a rather faster server to make bz2 worthwhile. Also, client strength is largely irrelevant. Compression takes mucho time, decompression not so much. If decomp is significant, the client's too slow to stay synced anyways.

kierongreen

Have you taken into account the number of players when crunching numbers?