[r8423] Simutrans freezes when "Play Online" button is clicked

Started by THLeaderH, April 30, 2018, 11:39:36 AM

THLeaderH

After launching Simutrans, I clicked the "Play Online" button to join a network game, and Simutrans stopped responding. No error message appeared on the GUI or on the console, even when running under gdb.
I tested with pak128 and pak.nippon. The freeze was reproduced with both paksets, so the problem seems to be pakset independent.
This bug was found in Simutrans nightly r8423 on macOS.

prissi

Did it freeze for more than 60s or did it unfreeze after some time? (Because without listserver running, the timeout may take a while). Listserver is running again, though.

Yona-TYT

This has always given me headaches: a long wait and no way to cancel it. I would like a more elegant solution for this.  :-[

THLeaderH

Quote from: prissi on April 30, 2018, 12:15:57 PM
Did it freeze for more than 60s or did it unfreeze after some time? (Because without listserver running, the timeout may take a while). Listserver is running again, though.
In my tests, it takes about 10 seconds on average to unfreeze. I had mistakenly assumed that Simutrans freezes permanently.

DrSuperGood

Simutrans will freeze until the TCP connection attempt times out. This varies from 10 to 30 seconds depending on the network stack.

Unfortunately there is no programming solution to this. As far as I am aware, the Berkeley sockets API does not support asynchronous connections, which would allow connection attempts to not block the main thread. Additionally, multi-thread support is still optional in Simutrans standard, so the game must be designed around a single thread, the main thread, and one cannot hack together an asynchronous connection API using worker threads. On top of this, much of the network code relies on global state, which is not really thread safe or potentially limits the number of parallel network operations that can occur at a given time.

Ters

Non-blocking sockets are a possibility, although Simutrans is in general not designed for asynchronous operations. The multi-threading that has been done is fork-join-style, which is something different from this.
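
To make the non-blocking option concrete, here is a minimal sketch in Python, whose socket module is a thin wrapper over the Berkeley sockets calls. It is not Simutrans code: the name connect_with_timeout and its parameters are made up for illustration, and it assumes a numeric IPv4 address is already known (name resolution can block on its own).

```python
import errno
import select
import socket


def connect_with_timeout(host, port, timeout_s):
    """Try a TCP connect, waiting at most timeout_s seconds.

    Returns a connected socket, or None on failure or timeout.
    (Hypothetical helper, for illustration only.)
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    # connect_ex returns an errno instead of raising. On a non-blocking
    # socket it normally reports "in progress" right away.
    err = sock.connect_ex((host, port))
    if err not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
        sock.close()
        return None
    # The socket turns writable once the attempt has finished, whether it
    # succeeded or not. Some platforms report failure in the third set.
    _, writable, broken = select.select([], [sock], [sock], timeout_s)
    if not writable and not broken:
        sock.close()  # our own deadline passed: give up early
        return None
    # SO_ERROR holds the result of the finished attempt (0 means success).
    if sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR) != 0:
        sock.close()
        return None
    sock.setblocking(True)
    return sock
```

The caller picks the deadline, so the wait no longer depends on the network stack's own connect timeout. The select call still blocks for up to timeout_s, so on its own this only shortens the freeze.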

Dwachs

The current situation can be improved, e.g., the waiting for a server to respond can be done in a separate thread, without blocking the gui.

DrSuperGood

Quote
The current situation can be improved, e.g., the waiting for a server to respond can be done in a separate thread, without blocking the gui.
Not without creating an entirely different code path for those builds without pthread, turning the already nightmarish network code even more so. This is why I keep saying pthread should be made a requirement rather than an option.

THLeaderH

I'm surprised to hear that pthread is an option, not a requirement! Why don't we make pthread a required component?

Ters

Quote from: Dwachs on May 01, 2018, 09:16:19 AM
The current situation can be improved, e.g., the waiting for a server to respond can be done in a separate thread, without blocking the gui.
The problem is not easily solved by just slapping on a thread. The main part of the problem is that you suddenly have something else to tidy up. If the user clicks connect, and the program asynchronously starts a connect operation, one suddenly has to deal with the problem that the user may close the dialog before the connect succeeds and that the user may repeatedly click connect while a connect is underway. Blocking the event thread is an easy, but also very inelegant, way to solve the former, and maybe also the latter, problem.

With a background thread, the background thread has to wait for one of two things: connection completion (successful or not) and cancellation message from event thread. I know a single function for this in Win32, but I don't know enough about pthreads to know if it can do the same. The server will also likely experience more spurious connects, caused by users clicking cancel after the server has acknowledged the connect, but before the client received confirmation. It should be hardened for such things already for other reasons, but I don't know.

That said, I also do not know how easy it is for the main event thread to periodically poll the status of a non-blocking connect, but it is a possibility.
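
As a rough illustration of the polling idea, the sketch below wraps one in-flight connect in an object that an event loop could query once per frame. It is written in Python for brevity and is not based on the actual Simutrans network code; the class ConnectAttempt, its method names, and its state strings are all hypothetical.

```python
import errno
import select
import socket


class ConnectAttempt:
    """Hypothetical helper: one in-flight connect that the GUI loop polls."""

    def __init__(self, host, port):
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.sock.setblocking(False)
        self.state = "pending"
        err = self.sock.connect_ex((host, port))
        if err == 0:
            self.state = "connected"
        elif err not in (errno.EINPROGRESS, errno.EWOULDBLOCK):
            self._finish("failed")

    def _finish(self, state):
        self.state = state
        if state != "connected":
            self.sock.close()  # nothing left to tidy up afterwards

    def poll(self):
        """Call once per frame. Never blocks: the select timeout is 0."""
        if self.state != "pending":
            return self.state
        _, writable, broken = select.select([], [self.sock], [self.sock], 0)
        if writable or broken:
            err = self.sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
            self._finish("connected" if err == 0 else "failed")
        return self.state

    def cancel(self):
        """Call when the user closes the dialog before the connect is done."""
        if self.state == "pending":
            self._finish("cancelled")
```

A dialog would keep at most one such object: a second click on connect while the state is "pending" can simply be ignored, and closing the dialog calls cancel(). That covers the two tidy-up cases mentioned above without any extra thread.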

Quote from: DrSuperGood on May 01, 2018, 01:59:38 PM
Not without creating an entirely different code path for those builds without pthread, turning the already nightmarish network code even more so. This is why I keep saying pthread should be made a requirement rather than an option.
With mingw64 configured with the posix thread model (as is the case with MSYS2), the standard C++ library forces a dependency on pthreads already. (I've posted a patch to force this dependency to be statically linked, since almost everything else is so by default.) My complaints over threading have not so much been over a dependency on pthreads, as over using threads in the most difficult way possible (concurrent operation on shared state), without any benefits over the old tried-and-tested single-threaded code. In this case, the threading would hopefully be in a part of the code I don't use anyway.

prissi

Threading is not default because 1) the server does not need it (and may even run on a single virtual core) and 2) some things are hard to debug in multithreading. So it is nice to be able to disable threading.

The problem with a crashed list server is more subtle. The server is there, it just does not answer. Even Google Chrome waits for 30s before giving an error. So Simutrans is in good company there. One could of course show a waiting bar with a 30s countdown. If that reaches zero, then one could kill the nonblocking thread waiting for the server and give up.

Yona-TYT

Quote from: prissi on May 06, 2018, 03:01:27 PM
Even Google Chrome waits for 30s before giving an error. So Simutrans is in good company there. One could of course show a waiting bar with a 30s countdown. If that reaches zero, then one could kill the nonblocking thread waiting for the server and give up.

I like that idea: cancel the frustrating wait, then continue and enter a server manually.  8)

DrSuperGood

Quote
The problem with a crashed list server is more subtle. The server is there, it just does not answer. Even Google Chrome waits for 30s before giving an error. So Simutrans is in good company there. One could of course show a waiting bar with a 30s countdown. If that reaches zero, then one could kill the nonblocking thread waiting for the server and give up.
Not entirely sure what you mean. A thread cannot both be non-blocking and waiting for a server. One also should avoid killing threads, especially if resource management is involved.
Quote
Threading is not default because 1) the server does not need it (and may even run on a single virtual core)
The server would benefit from multi-threading because it has to communicate through a network. Although there is some buffering with network calls to minimize it, there is also a chance that network transmission calls block for some length of time, which results in the server wasting available computational time for the game. This might be the case for a very laggy client, where retransmissions cause the send buffer for just that client to overflow.

This would also open up the ability to accelerate multiplayer joining time because one could transfer saves in parallel to advancing the game state allowing people to join and catch up, like OpenTTD, rather than currently where everyone must remain paused while the other client joins.

Ters

Quote from: DrSuperGood on May 06, 2018, 04:56:58 PM
Not entirely sure what you mean. A thread cannot both be non-blocking and waiting for a server. One also should avoid killing threads, especially if resource management is involved.
I was wondering about the same things, but for the first part, I assume he meant that the thread is using select or poll. As for killing threads, yes that is bad, but pthread does not appear to have a way of doing it. (Java is also removing its "support" for doing so, after having it marked as deprecated for probably twenty years or even more.) The only sad thing is that I can't seem to find any way of waiting on a socket and an event at the same time in a platform independent way (or rather, for anything but Windows).

Quote from: DrSuperGood on May 06, 2018, 04:56:58 PM
The server would benefit from multi-threading because it has to communicate through a network. Although there is some buffering with network calls to minimize it, there is also a chance that network transmission calls block for some length of time, which results in the server wasting available computational time for the game. This might be the case for a very laggy client, where retransmissions cause the send buffer for just that client to overflow.

This would also open up the ability to accelerate multiplayer joining time because one could transfer saves in parallel to advancing the game state allowing people to join and catch up, like OpenTTD, rather than currently where everyone must remain paused while the other client joins.
Non-blocking sockets would perhaps be enough for the first case. Using threads to do blocking I/O is as far as I understand it a bit old fashioned. If the non-blocking call fails because the buffer is full during playing, the connection is perhaps too slow to continue anyway. Non-blocking sockets and a single background thread is perhaps the modern way of doing bulk transfers.

prissi

The server does not benefit from threading, since it has to transfer the game state at load. At that time the game is frozen for all clients. Hence it is better to transfer the game as quickly as possible, so there is no benefit from threading. The alive messages to the list server should fit in one packet, so I see no advantage of threading there either. (Especially since a server with slow upload is useless).

ACarlotti

Quote from: prissi on May 07, 2018, 03:13:32 PM
The server does not benefit from threading, since it has to transfer the game state at load.

I think Dr Supergood was suggesting an improvement whereby the savegame is transferred to the new client while it continues to run on the existing clients; once the new client has downloaded the save it would effectively run the game on fast forward until it catches up.

I think an idealised timeline would look something like the following:

| Server | Existing Clients | New Client |
|---|---|---|
| | | Requests to join game |
| Saves game | Saves game (at same point) | |
| Loads from save, and begins sending it to new client | Loads from save | Begins receiving save |
| Resume game, not yet synchronised with new client | Resume game | |
| | | Finishes receiving save, and loads the game |
| | | Runs the game as fast as possible (without graphics?) to catch up |
| Now running synchronously with new client | | Catches up with server, and begins running normally |

This might lead to a significant reduction in waiting time for existing clients, at the expense of a longer joining time for new clients, and much increased code complexity. Having not yet played any network games myself (ignoring some ultimately failed attempts about 7 years ago), I have no idea if this would actually be a useful thing to have.
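
The catch-up step in the timeline can be shown with a toy model. The Python sketch below assumes a fully deterministic game in which the server logs one command per tick. The World class, the command log, and catch_up are all hypothetical and much simpler than the real Simutrans protocol. It also ignores that the server keeps advancing while the new client replays, which is why the client would have to run faster than real time.

```python
import copy


class World:
    """Toy deterministic game state: same commands give the same state."""

    def __init__(self):
        self.tick = 0
        self.money = 0

    def step(self, command):
        # 'command' stands in for whatever the players did during this tick.
        self.money += command
        self.tick += 1


def catch_up(snapshot, command_log, target_tick):
    """Replay logged commands on a loaded snapshot until target_tick."""
    world = copy.deepcopy(snapshot)
    while world.tick < target_tick:
        world.step(command_log[world.tick])
    return world


server = World()
command_log = {}  # tick -> command, kept by the server


def server_step(command):
    command_log[server.tick] = command
    server.step(command)


for c in (5, -2, 7):
    server_step(c)
snapshot = copy.deepcopy(server)  # save taken when the new client asks to join
for c in (1, 1, -4, 10):          # the game keeps running during the transfer
    server_step(c)
joined = catch_up(snapshot, command_log, server.tick)
```

After the replay, joined holds the same tick and the same money as server, which is the "now running synchronously" point of the timeline.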

Yona-TYT

@prissi
How sad that this will not be solved before the next stable version.  ::'(



Starting program: /home/yonatan/Escritorio/simutrans/sim
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.27-8.fc28.i686
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".
Use work dir /home/yonatan/Escritorio/simutrans/
Reading low level config data ...
parse_simuconf() at config/simuconf.tab: Reading simuconf.tab successful!
warning: Error reading shared library list entry at 0xfffffc20
warning: Error reading shared library list entry at 0xffffff50
warning: Error reading shared library list entry at 0xffff9600
warning: Error reading shared library list entry at 0xffff8d80
warning: Error reading shared library list entry at 0xffffa710
warning: Error reading shared library list entry at 0xffffa990
warning: Error reading shared library list entry at 0xffffac10
warning: Error reading shared library list entry at 0x80
Preparing display ...
SDL_driver=x11, hw_available=0, video_mem=0, blit_sw=0, bpp=32, bytes=4
Screen Flags: requested=10, actual=10
dr_os_open(SDL): SDL realized screen size width=704, height=560 (requested w=704, h=560)
Loading font 'font/prop.fnt'
font/prop.fnt successfully loaded as old format prop font!
Init done.
parse_colours() at config/simuconf.tab: parse_simuconf() at pak/config/simuconf.tab:
Reading simuconf.tab successful!
warning: Error reading shared library list entry at 0xffffacb0
[New Thread 0xb2c50b40 (LWP 3313)]
[Thread 0xb2c50b40 (LWP 3313) exited]
[New Thread 0xb2c50b40 (LWP 3314)]
Reading compatibility sound data ...
Loading BDF font 'cyr.bdf'
Reading city configuration ...
Reading speedbonus configuration ...
Reading menu configuration ...
Reading object data from pak/...
Reading menu configuration ...
Loading BDF font 'cyr.bdf'
Loading BDF font 'cyr.bdf'
Loading BDF font 'cyr.bdf'
Loading BDF font 'cyr.bdf'
Loading BDF font 'cyr.bdf'
Loading BDF font 'cyr.bdf'
Loading BDF font 'cyr.bdf'
Loading BDF font 'cyr.bdf'
Loading BDF font 'cyr.bdf'
Loading BDF font 'cyr.bdf'
Loading BDF font 'cyr.bdf'
Loading BDF font 'cyr.bdf'
Loading BDF font 'm+10r.bdf'
Loading BDF font 'cyr.bdf'
Loading BDF font 'cyr.bdf'
Error: Cannot open 'wenquanyi_9pt.bdf'
Loading BDF font 'cyr.bdf'
Loading BDF font 'Prop-Latin1.bdf'
Loading BDF font 'cyr.bdf'
Loading BDF font 'cyr.bdf'
Loading BDF font 'cyr.bdf'
Loading BDF font 'cyr.bdf'
Loading BDF font 'Prop-Latin1.bdf'
Loading BDF font 'cyr.bdf'
Loading BDF font 'cyr.bdf'
Loading BDF font 'cyr.bdf'
Error: Cannot open 'wenquanyi_9pt.bdf'
Loading BDF font 'cyr.bdf'
Loading BDF font 'cyr.bdf'
Loading BDF font 'cyr.bdf'
Loading BDF font 'cyr.bdf'
Loading BDF font 'cyr.bdf'
Loading BDF font 'cyr.bdf'
Loading BDF font 'cyr.bdf'
Loading BDF font 'cyr.bdf'
Loading BDF font 'cyr.bdf'
Loading BDF font 'cyr.bdf'
Midi disabled ...
Calculating textures ...done
Creating cities ...
Creating cities: 1
Creating factories ...
Distributing 1 tourist attractions ...
Preparing startup ...
Loading BDF font 'cyr.bdf'
Show banner ...
warning: Error reading shared library list entry at 0x1f90
warning: Error reading shared library list entry at 0x2300
warning: Error reading shared library list entry at 0x26c0
Running world, pause=0, fast forward=0 ...


There it stays frozen, and I must force the window to close.  :-[


Ters

Quote from: Yona-TYT on May 14, 2018, 06:19:20 PM
There it stays frozen, and I must force the window to close.  :-[ 

That does not sound quite like this reported issue, in which case you would just have to wait a little while.

A perpetual hang sounds more like an unstable network connection (if it is indeed network related), in which the connection has been established, some data has been sent successfully, but then a reply from the other end goes missing. With sockets, a receive operation will by default not time out, because the lack of incoming data might simply mean that there isn't any. The transport layer has no way of knowing whether that is to be expected or not, unless explicitly told so with setsockopt. The sending side, however, will notice that something is wrong, because it doesn't get acks, and will likely terminate the connection at its end, but any termination signal might get lost as well. We had a lot of trouble with this at work when the network infrastructure started dropping lots of packets after a certain amount of uptime/use, until some equipment finally was replaced.
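
A small sketch of a bounded receive, in Python: a local socket pair stands in for client and server, and the first receive waits for a reply that never comes. Python's settimeout is used here in place of the receive-timeout option that setsockopt would set in C; that substitution, the helper name recv_or_none, and the 0.2 second value are choices made for this example only.

```python
import socket

# A connected pair stands in for client and server. Nothing is sent at
# first, which mimics a reply that got lost on the way.
client, server = socket.socketpair()


def recv_or_none(sock, timeout_s):
    """Return received bytes, or None if nothing arrives within timeout_s.

    An empty bytes result would mean the peer closed the connection.
    """
    sock.settimeout(timeout_s)
    try:
        return sock.recv(4096)
    except socket.timeout:
        return None


lost = recv_or_none(client, 0.2)    # no data yet: gives up after 0.2 s
server.sendall(b"pong")
answer = recv_or_none(client, 0.2)  # data is waiting: returns at once
```

Without the timeout, the first recv call would wait forever, which matches the kind of hang described above.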