
TCP window size?

Started by jamespetts, July 26, 2014, 05:44:04 PM


jamespetts

I have been contacted by a user of Experimental in relation to download times. I am posting this here because it applies equally to Standard as it does to Experimental: the relevant code is unchanged between the two.

He tells me that he is having very slow download times for the large (circa 55 MB) map in the current Bridgewater-Brunel online game: much slower than other players, at between two and a half and four and a half minutes. He says that he has a fast (30 Mbit/s) cable connexion. He mentioned that increasing the TCP window size might help, but I know nothing about this aspect of things, so I thought that I should raise it here in case anyone thinks it is worthwhile doing something about it. Apologies for taking people's time if this is not a useful discussion.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

Ters

As far as I can see, TCP window size is a system option, not something an application like Simutrans should be messing with. It might even require administrative privileges. Tuning it right also seems like secret magic.

If this is going to help at all, the problem must be that this person has a relatively high latency between himself and the server. In that case, it is the server that is holding back and needs to be "fixed".

jamespetts

Quote from: Ters on July 26, 2014, 07:36:52 PM
As far as I can see, TCP window size is a system option, not something an application like Simutrans should be messing with. It might even require administrative privileges. Tuning it right also seems like secret magic.

If this is going to help at all, the problem must be that this person has a relatively high latency between himself and the server. In that case, it is the server that is holding back and needs to be "fixed".

Yes, I see, thank you. I do, of course, have administrator access to the server; but do you (or does anyone) have any idea how to tinker with the TCP window size?
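For anyone else in this position: on a Linux server the knobs in question are kernel sysctls, not anything inside Simutrans. The fragment below is purely an illustrative sketch with assumed values (the file name and the 16 MiB caps are made up for the example, not a tested recipe); real values should be derived from measured bandwidth and latency.

```
# /etc/sysctl.d/99-tcp-tuning.conf -- illustrative sketch, values are assumptions
# Window scaling (RFC 1323); already the default on modern kernels
net.ipv4.tcp_window_scaling = 1
# Raise the caps on socket buffer sizes (here: 16 MiB)
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
# min / default / max bounds for auto-tuned TCP receive and send buffers, in bytes
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
```

On most distributions a fragment like this is loaded at boot, or on demand with `sysctl --system`.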

Sarlock

My download speeds are around 0.5-1 MB/s on average, taking about 1-2 minutes for the full download (sometimes it's slower and takes 3-4 minutes).  This is from the other side of the planet, in Vancouver, Canada, so it's quite acceptable in my view.
Current projects: Pak128 Trees, blender graphics

Ters

Another possible cause of slow speed is a high packet drop rate along the way, either because of congestion on the line or a bad cable. This might be beyond the control of either party. I can't, or at least don't know how to, diagnose the route between someone else and the server.

It should be noted that from Windows 2000 onwards, Windows supports TCP window scaling, which is designed precisely for high-bandwidth, high-latency connections. Other OSes have likely supported it for many years as well; the standard dates from 1992. I couldn't quite get my head around when Windows actually begins scaling, but there seemed to be some trigger.
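Why scaling matters here can be shown with the bandwidth-delay product: the window must cover bandwidth × round-trip time, or the sender sits idle waiting for ACKs. A small sketch, using the 30 Mbit/s figure from the first post and an assumed 100 ms RTT:

```python
# Back-of-envelope bandwidth-delay product calculation (illustrative numbers).

def required_window_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bytes that must be in flight to keep the pipe full."""
    return bandwidth_bps / 8 * rtt_s

# A 30 Mbit/s link with an assumed 100 ms round-trip time:
bdp = required_window_bytes(30e6, 0.100)
print(f"required window: {bdp / 1024:.0f} KiB")            # 366 KiB

# The classic unscaled 64 KiB window caps throughput at:
cap_bps = 65535 * 8 / 0.100
print(f"64 KiB window limit: {cap_bps / 1e6:.1f} Mbit/s")  # ~5.2 Mbit/s
```

In other words, without window scaling a 30 Mbit/s line with that latency could never run faster than about 5 Mbit/s, which is roughly consistent with the slow downloads reported.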


Ters

It should be noted that I found out most of this by googling, combined with some basic knowledge of computer networks. I am not an expert on these things, but I know enough to understand them.

DrSuperGood

I told him in game that chances are it is packet loss. He reported slower download speeds from the UK than from the US (I assume this corresponds to latency, with the US being faster than the UK for him), which would mean either that his system uses a far too small window size (in which case the download would slow to a crawl, with re-transmits for virtually every packet) or that there is packet loss (latency then corresponds to wasted bandwidth, since common practice is to re-transmit from the point of loss even if later packets were received correctly). The fact that this is a new problem also points towards a sudden increase in packet loss rather than an underlying problem in the transmission procedure.

A custom UDP protocol simulating TCP with a larger window size could be written, but this would be pointless if you could simply specify the TCP window size in the first place. Is he really not overloading his network connection? What about unintentionally, through something he is not aware of? Too many unanswered questions...
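Short of a custom protocol, an application can at least request larger per-socket buffers before connecting, which gives the OS room to advertise a bigger window. A minimal sketch (the 4 MiB figure is an arbitrary example; the kernel caps the grant at its own limits, e.g. net.core.rmem_max on Linux, so this is best-effort only):

```python
import socket

# Sketch: request larger TCP buffers on a single socket. This must be done
# before connect() for the window-scale negotiation to benefit from it.
BUF = 4 * 1024 * 1024  # ask for 4 MiB (illustrative)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, BUF)
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, BUF)

# The kernel may grant less than requested (and Linux reports double
# the usable value, since it counts bookkeeping overhead too).
granted = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print(f"receive buffer granted: {granted} bytes")
s.close()
```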

Ters


DrSuperGood

I know all about TCP and window scaling already. Generally the window is used for flow control: it is raised aggressively until packets are lost, and then reset.

The variation in download rate with latency generally points towards packet loss, since the TCP protocol itself should usually max out the transfer rate through flow control, as long as the physical window size (as that article explains) is large enough for the round-trip time of the channel. If a lot of packets are being lost, then not only are re-transmits a problem, but flow control may also lower the bandwidth below what the link is capable of, since it may mistake unrelated packet loss for congestion. There should be a returned ICMP packet every time an IP packet is dropped due to congestion, but this is commonly ignored, as it would itself add to the traffic; it therefore becomes impossible to determine from the network why a packet was lost, so I am guessing TCP flow control assumes the worst case and treats all packet loss as a reason to back off.

There are many reasons packet loss can occur.
1. Packets are sent faster than they can be forwarded between some nodes in the path. Eventually buffers fill up and overflow, so packets have to be discarded. A good TCP implementation should detect the increase in in-transit packets in the network and back off before buffers overflow, but few systems do this, as it is completely incompatible with aggressive TCP: the polite implementation would simply yield to the aggressive one all the time. This should not be a problem, as TCP is designed to cope with it.
2. Physical errors corrupt packets (checksum mismatch), which are then discarded. Although backbone networks should transmit virtually every packet perfectly (the chance of a transmission error is close to nil), many local distribution networks such as ADSL are very prone to packet loss. Something as mundane as excessive noise from a defective device nearby can cause it. The solution is often to improve the signal-to-noise ratio of the local connection or to increase error margins. Although TCP will tolerate this kind of loss, it results in considerably lower bandwidth, since re-transmits occur, potentially of the entire window in which a packet was lost.
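How hard random loss hits throughput can be estimated with the well-known Mathis approximation for steady-state TCP: rate ≈ MSS / (RTT · √p). A sketch with illustrative numbers (1460-byte segments, an assumed 100 ms RTT):

```python
import math

def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Mathis et al. approximation of steady-state TCP throughput."""
    return (mss_bytes * 8) / (rtt_s * math.sqrt(loss_rate))

# 1460-byte segments over an assumed 100 ms round trip:
for p in (0.0001, 0.001, 0.01):
    rate = mathis_throughput_bps(1460, 0.100, p)
    print(f"loss {p:.2%}: ~{rate / 1e6:.1f} Mbit/s")
# -> ~11.7, ~3.7 and ~1.2 Mbit/s respectively
```

So even 1% loss is enough to drag a fast connection down to roughly a megabit, regardless of the nominal line speed, which matches the symptoms being described.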

As far as Simutrans is concerned, it would be great if servers could allow sessions to progress while clients download, or at least offer such an option. Downloading can take several minutes in this case, as the map is 50 MB, during which time all clients are forced to wait. Worse, there is a bug in Experimental where, if you issued an order just before the join, you will instantly go out of sync when the server resumes, forcing you to rejoin, which is another 50 MB.

Ters

Quote from: DrSuperGood on July 27, 2014, 11:30:33 PM
As far as Simutrans is concerned, it would be great if servers could allow sessions to progress while clients download, or at least offer such an option. Downloading can take several minutes in this case, as the map is 50 MB, during which time all clients are forced to wait.

From what I understand of Simutrans' multiplayer support, which I don't really care much about, if the game continued while a client downloaded, that client would immediately be desynced and kicked out, since what he downloaded would already be way out of date. Ironically, this would likely be easier to fix if games took only a few seconds to download.

kierongreen

What should be possible is to let everyone else's game continue while the download happens, and keep a buffer on the server of all actions received. Then, once the new client has connected, instruct all other clients to pause, pass this buffer on to the new client, and put it into a fast-forward mode that carries out the actions in the buffer until it has caught up with the rest of the clients. The potential problem with this is that if you had a slow client that couldn't fast-forward at a reasonable speed, the delay might be just as long as the download.
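The scheme could be sketched roughly as follows. This is a hypothetical illustration, not Simutrans code; all names (`JoinBuffer`, `record`, `replay`) are invented for the example:

```python
from collections import deque

class JoinBuffer:
    """Sketch of the buffer-and-replay idea: while a client downloads the
    save, the server queues every command; after connecting, the client
    fast-forwards through the backlog until it has caught up."""

    def __init__(self):
        self.commands = deque()

    def record(self, tick, command):
        """Server side: buffer each command issued during the download."""
        self.commands.append((tick, command))

    def replay(self, apply_command, current_tick):
        """Client side: apply buffered commands up to the server's
        current tick; returns how many commands were replayed."""
        replayed = 0
        while self.commands and self.commands[0][0] <= current_tick:
            tick, command = self.commands.popleft()
            apply_command(tick, command)
            replayed += 1
        return replayed
```

The open question, as noted above, is what happens when commands arrive faster than a slow client can drain the deque.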

Ters

Quote from: kierongreen on July 28, 2014, 08:05:52 AM
The potential problem with this is that if you had a slow client that couldn't fast-forward at a reasonable speed, the delay might be just as long as the download.

Or even, although that might require a very slow client and a very action-filled game, the buffer might never empty. And how big is this buffer allowed to get? Multiple buffers would also be needed if other clients connect while one is still downloading, at least conceptually.

Not impossible, but a rather huge task compared to the development capabilities in the community. But then again, so was multiplayer in the first place, and double heights.

mad_genius

I've never played the multiplayer part of Simutrans, but from what I understand the problem seems to be that each client has to download the full map every single time it joins the game session, right?
In that case a solution might be to apply the principles of "version control" systems to the multiplayer maps.

- Each client would download the full map only the first time it joins a specific game.
- After that, the client would keep that copy of the map on its hard disk, and on subsequent joins would only download the differences between its copy and the current state of the map.

The basic idea here is to reduce those 50 MB downloads to much smaller incremental downloads.
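A crude version of this could work at the block level: both sides hash fixed-size blocks of the last save, and the server sends only blocks whose hashes differ. A hypothetical sketch (real delta tools like rsync use rolling checksums and handle insertions far better; this naive version assumes the save only changes in place, and all names here are invented):

```python
import hashlib

BLOCK = 64 * 1024  # 64 KiB blocks (illustrative size)

def block_hashes(data: bytes) -> list:
    """SHA-256 of each fixed-size block of the save file."""
    return [hashlib.sha256(data[i:i + BLOCK]).digest()
            for i in range(0, len(data), BLOCK)]

def delta(old: bytes, new: bytes) -> dict:
    """Blocks of `new` that differ from `old`, keyed by block index."""
    old_h = block_hashes(old)
    changes = {}
    for i, h in enumerate(block_hashes(new)):
        if i >= len(old_h) or h != old_h[i]:
            changes[i] = new[i * BLOCK:(i + 1) * BLOCK]
    return changes

def apply_delta(old: bytes, changes: dict, new_len: int) -> bytes:
    """Rebuild the new file from the old copy plus the changed blocks."""
    blocks = [old[i:i + BLOCK] for i in range(0, len(old), BLOCK)]
    for i, data in changes.items():
        while len(blocks) <= i:
            blocks.append(b"")
        blocks[i] = data
    return b"".join(blocks)[:new_len]
```

One caveat: this only pays off on uncompressed saves, because a single changed byte in a compressed stream tends to alter everything after it, leaving no blocks unchanged.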

Just my 2 cents on this matter.

kierongreen

The problem with incremental downloads is that you would have to store the state of the map at all times, which is impractical.

Ters

There is the rsync algorithm, but compression screws it up.

DrSuperGood

Any number of actions could be buffered server-side for a client, and if the game were fully deterministic (it is meant to be, but Experimental currently has issues with that), the client could spend any amount of time downloading and then potentially keep trying forever to catch up. Obviously such open bounds are completely impractical, so certain restrictions must be imposed, with policies to handle when they are exceeded. Both the restrictions and the policies could be configured by the server administrator.

Restrictions:
Maximum number of downloading clients (to limit maximum resource usage).
Minimum average native upload rate for client (over a few seconds or a minute, native as in not server limited but limited by client QoS/ISP, to help avoid clogging up finite resources for excessive periods with clients using impossibly slow connections or potentially maliciously limiting download rate to hog resources).
Maximum upload rate between all clients (to manage upload congestion from causing excessive packet loss which is bad for active clients quality of service).
Maximum command buffer size (in data volume, to limit maximum resource usage).
Maximum time to catch up (either based on an absolute second quantity or as a multiple of file size or client measured native download rate, to prevent clients from never catching up).
Minimum time between saves (measured in game frames/ticks; defines a minimum command buffer size that must be kept so that other joining clients will have all actions queued since the last save, preventing multiple saves in a row when new clients join in short succession).

Policies:
When the maximum number of downloading clients is exceeded, either refuse the connection or add the client to some kind of queue. The queue may have its own limits, but these would be larger than for downloading clients, as fewer resources are needed per client in the queue: no commands require buffering and no large data is being transferred.
When the minimum average native download rate is not met, the client is disconnected with an error message along the lines of "Dropped due to the connection having insufficient bandwidth."
When the maximum command buffer size is reached, take one of two actions: either drop the client with an error message along the lines of "Dropped due to command buffer overflow.", or pause all playing clients until the command buffer has at least a certain amount of free space, with the message "Paused to allow clients to catch up."
When the maximum time to catch up is exceeded, take one of three actions. The client could be dropped with an error message along the lines of "Dropped due to excessive catch-up time." All clients could be paused with the message "Paused to allow clients to catch up." until the client has caught up. Finally, it would also work to slow the game down to some fraction of the maximum catch-up speed the slowest catching-up client is achieving, giving all playing clients the message "Slowed to allow clients to catch up."
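Two of these, the downloading-client cap and the catch-up deadline, could be enforced with bookkeeping as simple as this. A hypothetical sketch; none of these names or values come from the Simutrans code base:

```python
import time

MAX_DOWNLOADING = 2        # maximum simultaneous downloading clients (example value)
MAX_CATCHUP_SECONDS = 120  # maximum time allowed to catch up (example value)

class DownloadManager:
    """Sketch of server-side enforcement of two of the restrictions above."""

    def __init__(self):
        self.downloading = {}  # client id -> download start timestamp

    def try_admit(self, client_id) -> bool:
        """Refuse the connection when all download slots are in use."""
        if len(self.downloading) >= MAX_DOWNLOADING:
            return False
        self.downloading[client_id] = time.monotonic()
        return True

    def drop_stragglers(self) -> list:
        """Drop clients that exceeded the catch-up deadline; the caller
        would send them "Dropped due to excessive catch-up time."."""
        now = time.monotonic()
        dropped = [cid for cid, start in self.downloading.items()
                   if now - start > MAX_CATCHUP_SECONDS]
        for cid in dropped:
            del self.downloading[cid]
        return dropped
```

The queueing, pausing, and slow-down policies would hang off the same structure, triggered instead of (or before) the hard drop.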

It is important to note that if the client takes longer to join than the minimum save period, it may need to save and load multiple times, to mirror any saves and loads that affected all playing clients during the catch-up time, since sadly I believe saving and loading transforms the game state. I would class this as a bug, as a proper save and load should leave the game state after loading identical to the state when saving (the idea of a savestate from the console-emulator scene best explains what a proper save should do, though obviously not as memory-intensive). This is most noticeable in Experimental, where all route path-finding and route times are updated on load, so a ship that was stuck with "no route" until its next tick will suddenly spring into action as soon as the game resumes after loading.

Ters

Quote from: DrSuperGood on July 29, 2014, 11:41:32 AM
Obviously such open bounds are impossible and completely impractical so certain restrictions must be placed with policies to handle when they are exceeded.

Certainly, but who is going to implement them?

prissi

Back to TCP window sizes. At least in Cambridge and the surrounding area, lots of (older) houses get their phone lines on poles, which are then all fed together underground: almost the worst possible scenario for ADSL. Consequently, between 18:00 and midnight my download speed drops from over 6 Mbit/s to under 3 Mbit/s. (Not to mention that, with rows of terraced houses just 3-4 m wide, I see WLANs on all(!) 20 available channels... which of course makes no sense.)

So it might be that around certain times of the evening there are just too many packet losses.

Ters

Quote from: prissi on July 29, 2014, 07:53:36 PM
Almost the worst possible screnario for ADSL.

I can beat that with a cable that had been fried by lightning. Once the wind started making the lines sway, there was 100% packet drop, likely down at the link level. I diagnosed it correctly at once, but had some serious trouble getting support to fix it, because they measured the line and it was fine (there was no wind at the time, so of course the line was fine; we were even using it to call them). It took days of banging our heads against support before some repairmen were finally sent out to look at the line in person. They came, had a look, and immediately replaced the lines throughout the neighbourhood. The thunderstorm I most suspect was the culprit had actually struck long before that, so it was a wonder our ADSL worked as well as it did for so long.

Currently, I have fiber. I hardly, if ever, max it out; there always seems to be some weaker link in the chain. And still they try to get me to upgrade to a faster line.