News:

Congratulations!
 You've won the News Item Lottery! Your prize? Reading this news item! :)

Desyncs when renaming (on the nightly server)

Started by prissi, December 16, 2012, 09:46:25 PM

Previous topic - Next topic

0 Members and 1 Guest are viewing this topic.

prissi

But only if the client runs windows, no matter if the GDI nightly or a self-compiled by MSVC. It does also not matter, what is renamed, either convoi, line, player, halt, factory, or town. Strange but reproducable.

The server runs a gcc 4.7.2 optimized build (since it is only a pentium 4) with two threads. Dwachs connection from Linux had not problems with renaming.

And now I want to hear suggestions ;)

EDIT: Thanks to TurfIt is seems that only rename commands cause an executed in the future desync, i.e. the step when it will be executed is somehow different than for other commands.

TurfIt

Hitting enter on gui_textinput field to rename almost always causes a desync (unless additional_client_frames_behind=4 is set). Closing the dialog, or changing focus away from textinput results in a successful rename.

I note two rename commands are sent to the server upon hitting enter. Once received back by the client, it executes the first ok, the second seems to get stuck in the queue and finally tries executing in the past.. desync.


Index: gui/components/gui_textinput.cc
===================================================================
--- gui/components/gui_textinput.cc (revision 6197)
+++ gui/components/gui_textinput.cc (working copy)
@@ -84,6 +84,7 @@
// handled by container
case SIM_KEY_ENTER:
call_listeners((long)1);
+ return true;
case SIM_KEY_TAB:
// Knightly : focus is going to be lost -> reset cursor positions to select the whole text by default
head_cursor_pos = len;

Fixes the double send problem... and hence the disco. Don't know the side effects of changing this though...

Dwachs

#2
Nice!

Why does this have any effect? Can you explain? As I see it, the return value is ignored. Edit: found it: gui_container sent UNTOP-msg etc.

Why does the second sent command get stuck in the queue?
Parsley, sage, rosemary, and maggikraut.

prissi

#3
Since it does happen only under Windows, no matter what compiler I highly suspect the TCP/IP stack itself is delaying the second package a little. (Or do you sent additional_client_Frame!=0)

I can test under Haiku tomorrow.

EDIT:
Not desyncing when connected to the same switch. So long WAN transmission is required to trigger this.

Dwachs

Is the fix in svn already? I suspect the above patch will mean that after pressing 'enter' the text-input does not lose focus.

As to additional_client_Frame: I am not aware that I changed this from the default value. Will check tonight.
Parsley, sage, rosemary, and maggikraut.

prissi

#5
Enter not changing focus is not too helpful. Because it would be very hard using key command for instance when the chat window is open.

EDIT: Desync now renaming a stop even though my switch is connected to the same switch as the server. Maybe packet sending is deferred by windows, or packets are not ready too long on the client.

EDIT: PAtch2 which will keep the previous behaviour and save many useless messages anyhow.

Dwachs

Here is my approach. I was about to post the patch, when I saw yours. -> We had the same idea.

I would set dirty in gui_textinput_t::remove_selection, too, and in gui_textinput_t::set_text to keep current behavior.

Please commit :)
Parsley, sage, rosemary, and maggikraut.

TurfIt

Prissis' patch2 seems ok. Solves the double command without side effects that I've seen.

However, there are places where double (triple or more) commands are valid. Click on a signal rapidly to change its facing and you can trigger this exact same execute in past desync.

@prissi: is that a managed switch you could connect a packet sniffer onto?
That could determine who's at fault - client or server...  I don't think it's the client delaying sending - and shouldn't matter if it did. The sent packets don't contain the desired execute time. Either the server is adding the execute time and then delaying sending, or the client is delaying processing of received packets.

---
WTF?

void network_set_socket_nodelay( SOCKET sock )
{
#if defined(TCP_NODELAY)  &&  !defined(__APPLE__)
// do not wait to join small (command) packets when sending (may cause 200ms delay!)
int b = 1;
setsockopt( sock, IPPROTO_TCP, TCP_NODELAY, (const char*)&b, sizeof(b) );
#else
(void)sock;
#endif
}


TCP_NODELAY is critical to this working at all - how can it be ignored like this (else)? Further is it not defined in unix systems in netinet/tcp.h? which is not included anywhere???  I'm going to assume prissis linux server is not setting this and hence delaying the sending...

jamespetts

Is this a problem that has been around for a while, or is this new on the nightlies? I know that some people have reported it, or, at least, something like it, in respect of Experimental in the past, and I was wondering whether I ought to incorporate this fix into the current stable branch of Experimental or whether it will only be needed when I merge up to 112.0.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

prissi

#9
The server was indeed compiled without it. The switch is a little box without any diagnostics apart from flickering LEDs. The documentation on it read that it should not be set but for very special circumstances. MAybe running a server is one ;)

But the way the power to the server might be cutted on the 21st.

COmmitted my patch with Dawchs addition of setting text dirty in remove_selection()


TurfIt

To prevent another silent reversion with TCP_NODELAY, I've added a #warning. Also forced its use when compiled with COLOUR_DEPTH=0 for dedicated server; It's absolutely required for any server...

@jamespetts - both issues have been around a while. I see you've merged the txtinput fix, but missed the tcp.h one. Fine as long as your server isn't running Linux...


jamespetts

Hmm - my server does run Linux. The only relevant change that has made it to Github so far is adding the following line to network.h:


# include <netinet/typ.h>


I assume that there is another change that you pushed to the SVN recently? What is the effect of not including this for a Linux server?
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

TurfIt

#12
if tcp.h is not included - TCP_NODELAY is not set which causes the server to delay sending small packets (commands) which cause clients to lose sync (execute in past) - search tcp nagle.

Also, r6201 causes me crashes as soon as a dialog containing a txtinput is displayed. Tracking it down...EDIT: bad header dependencies... clean build ok.

Ters

It almost sounds like you shouldn't be using TCP at all.

Dwachs

Any suggestions, which protocol to use instead of TCP ?

TCP at least guarantees that all data packets arrive at the client (and are still in the same order as sent by server). These properties are essential for our implementation.


@prissi: why not include netinet/tcp.h ? Is there a type in network.h? typ instead of tcp ???
Parsley, sage, rosemary, and maggikraut.

Ters

The lack of suitable alternatives is the reason for "almost". TCP "guarantees" arrival in order, but says nothing about when. Fast paced games apparently use UDP (or at least they used to), because if a message is late, it doesn't matter anyway. (For messages that are important, I've heard that some use an additional TCP socket, while others build a simple layer on top of UDP.) Simutrans seems to strive for a middle ground I don't know if exists.