network_init_server() - Unable to bind socket to IP address

Started by Michael 'Cruzer', August 28, 2014, 04:29:49 PM

Michael 'Cruzer'

For help with this error or to file a bug report please see the Simutrans forum at
http://forum.simutrans.com
FATAL ERROR: network_init_server() - Unable to bind socket to IP address: "0.0.0.0"
Aborting program execution...


I often get this error message when stopping a server and then restarting it shortly afterwards. Is there anything I can do to prevent this? (It blocks the start; when I retry after ~120 seconds, everything works fine again.)

I am using kill $pid (where $pid is a variable containing the correct process ID, of course). Is this the correct way to stop the server from a script? (The server runs on a Debian 7 minimal system.)
Founder and Ex-Maintainer of pak192.comic. Provider of Simutrans Hosting rental service.

Ters

Simutrans has no support for SIGTERM that I can find. (The only proper way to shut it down that I know, is through the GUI, but then I have never done any multiplayer stuff. Maybe the nettool has a way.) kill will therefore pull the rug on the process. It is possible that the resources held by the process, but never properly released, will be held in limbo for a while, although I've never seen such behaviour. Another possibility is that Simutrans does end somewhat gracefully, and that takes some time. Or that kill sends SIGTERM, and if the process doesn't take notice, it will after a grace period, move on to more drastic measures.

Philip

Quote from: Ters on August 28, 2014, 04:48:30 PM
Simutrans has no support for SIGTERM that I can find. (The only proper way to shut it down that I know, is through the GUI, but then I have never done any multiplayer stuff. Maybe the nettool has a way.) kill will therefore pull the rug on the process. It is possible that the resources held by the process, but never properly released, will be held in limbo for a while, although I've never seen such behaviour. Another possibility is that Simutrans does end somewhat gracefully, and that takes some time. Or that kill sends SIGTERM, and if the process doesn't take notice, it will after a grace period, move on to more drastic measures.

It's supported via SDL, if I'm reading the code correctly. The signal turns into SYSTEM_QUIT, which then sets env_t::quit_simutrans.

I suspect we could call network_core_shutdown a bit earlier after receiving the signal, though there will always be some delay while the signal works its way through the event queue.

It should unbind within a few seconds, though, not anything near the 120 seconds Michael hinted at. Something is seriously wrong if it takes that long.

Michael 'Cruzer'

Quote: The only proper way to shut it down that I know, is through the GUI

That's a pity, since there is no GUI in the posix build.

Quote: Or that kill sends SIGTERM

According to the kill documentation (its man page), it sends only one signal, which is SIGTERM by default. You can make kill send any other signal by passing the signal number as a parameter. I can try sending SIGKILL, but as far as I know, when an application has no handler implemented, SIGKILL and SIGTERM should have the same effect.

Quote: It should unbind within a few seconds, though, not anything near the 120 seconds Michael hinted at. Something is seriously wrong if it takes that long.

It seems to take just a few seconds in most cases. But sometimes it takes very long (which blocks my reboot as described). I am not sure, but it may depend on whether there was an active connection to the server when SIGTERM was triggered.

Philip

Quote from: Michael 'Cruzer' on August 28, 2014, 05:00:18 PM
That's a pity, since there is no GUI in the posix build.

I think termination via SIGTERM is an intended feature, and the right way to terminate a server. Sending a SIGKILL instead of SIGTERM will immediately terminate the server process, without doing any cleanup or saving anything. It's a bit like taking the battery out of your device, which is not a good way to shut things down.
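The difference between the two signals can be checked directly: POSIX lets a process install a handler for SIGTERM, but refuses one for SIGKILL. The following is only a minimal sketch, not Simutrans code; the names on_signal and try_install_handler are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>

/* Empty handler; it exists only so there is something to install. */
static void on_signal(int signum)
{
    (void)signum;
}

/* Hypothetical helper: tries to install on_signal for the given signal.
 * Returns 0 on success, or the errno value reported by sigaction(). */
int try_install_handler(int signum)
{
    struct sigaction action;
    memset(&action, 0, sizeof(action));
    action.sa_handler = on_signal;
    sigemptyset(&action.sa_mask);
    if (sigaction(signum, &action, NULL) == -1) {
        return errno;
    }
    return 0;
}
```

Calling try_install_handler(SIGTERM) should succeed, while try_install_handler(SIGKILL) should report EINVAL, because SIGKILL can be neither caught nor ignored. A process hit by SIGKILL therefore never gets a chance to run its own cleanup.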

Quote from: Michael 'Cruzer' on August 28, 2014, 05:00:18 PM
According to the kill documentation (its man page), it sends only one signal, which is SIGTERM by default. You can make kill send any other signal by passing the signal number as a parameter. I can try sending SIGKILL, but as far as I know, when an application has no handler implemented, SIGKILL and SIGTERM should have the same effect.

Again, we do have a SIGTERM handler, courtesy of SDL, or kill wouldn't work at all.

Quote from: Michael 'Cruzer' on August 28, 2014, 05:00:18 PM
It seems to take just a few seconds in most cases. But sometimes it takes very long (which blocks my reboot as described). I am not sure, but it may depend on whether there was an active connection to the server when SIGTERM was triggered.

That sounds like it could do with some investigation. It's possible we wait for inactive clients to time out before unbinding our server socket, which we shouldn't do.

Michael 'Cruzer'

Had a look at simsys_s.cc and simsys_posix.cc, and yes, it seems there isn't any handler in the posix build. But it also looks like a graceful shutdown could be done with something like this:

// flag set by the signal handler and polled by the event loop
static volatile sig_atomic_t sigterm_received = 0;

void GetEvents() // and also GetEventsNoWait()
{
    if (sigterm_received) {
        sys_event.type = SIM_SYSTEM;
        sys_event.code = SYSTEM_QUIT;
    }
}


together with a handler like:

void posix_sigterm(int signum)
{
    // only set the flag here; printf() is not async-signal-safe
    (void)signum;
    sigterm_received = 1;
}


and

// inside main(); needs <signal.h> and <string.h>
struct sigaction action;
memset(&action, 0, sizeof(struct sigaction));
action.sa_handler = posix_sigterm;
sigaction(SIGTERM, &action, NULL);


but I don't know much about how the Simutrans code works internally. That's just what I think would be the equivalent of the SDL implementation. I'll give it a try later.

Ters

Quote from: Philip on August 28, 2014, 04:57:58 PM
It's supported via SDL, if I'm reading the code correctly.

Quote from: Philip on August 28, 2014, 05:08:04 PM
Again, we do have a SIGTERM handler, courtesy of SDL, or kill wouldn't work at all.

But SDL is not part of the game here. One doesn't normally use kill to terminate GUI programs.

Quote from: Michael 'Cruzer' on August 28, 2014, 05:15:03 PM
Had a look at simsys_s.cc and simsys_posix.cc, and yes, it seems there isn't any handler in the posix build. But it also looks like a graceful shutdown could be done with something like this:

// flag set by the signal handler and polled by the event loop
static volatile sig_atomic_t sigterm_received = 0;

void GetEvents() // and also GetEventsNoWait()
{
    if (sigterm_received) {
        sys_event.type = SIM_SYSTEM;
        sys_event.code = SYSTEM_QUIT;
    }
}


together with a handler like:

void posix_sigterm(int signum)
{
    // only set the flag here; printf() is not async-signal-safe
    (void)signum;
    sigterm_received = 1;
}


and

// inside main(); needs <signal.h> and <string.h>
struct sigaction action;
memset(&action, 0, sizeof(struct sigaction));
action.sa_handler = posix_sigterm;
sigaction(SIGTERM, &action, NULL);

I was thinking along the same lines. One might have to do some #ifdef-ing with alternate code for Windows, because I think it has its own way of doing shutdown handlers for console programs.
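A rough sketch of what such #ifdef-ing could look like, assuming the Windows side uses the Win32 console control handler (SetConsoleCtrlHandler) and the other platforms use sigaction. The names quit_requested, install_quit_hook and quit_was_requested are hypothetical and do not come from the Simutrans sources; only the POSIX branch mirrors the approach posted above.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical flag that the main loop would poll. */
static volatile sig_atomic_t quit_requested = 0;

#ifdef _WIN32
/* Windows calls this on a separate thread for console events. */
static BOOL WINAPI on_console_event(DWORD type)
{
    switch (type) {
    case CTRL_C_EVENT:
    case CTRL_BREAK_EVENT:
    case CTRL_CLOSE_EVENT:
        quit_requested = 1;
        return TRUE;   /* event handled */
    default:
        return FALSE;  /* let the next handler deal with it */
    }
}
#else
/* POSIX signal handler: only sets the flag, nothing else. */
static void on_sigterm(int signum)
{
    (void)signum;
    quit_requested = 1;
}
#endif

/* Installs the platform-specific hook. Returns 0 on success, -1 on failure. */
int install_quit_hook(void)
{
#ifdef _WIN32
    return SetConsoleCtrlHandler(on_console_event, TRUE) ? 0 : -1;
#else
    struct sigaction action;
    memset(&action, 0, sizeof(action));
    action.sa_handler = on_sigterm;
    sigemptyset(&action.sa_mask);
    return sigaction(SIGTERM, &action, NULL);
#endif
}

/* Returns 1 once a shutdown has been requested, 0 otherwise. */
int quit_was_requested(void)
{
    return quit_requested != 0;
}
```

The event loop would call quit_was_requested() on each pass and raise its normal quit event when it returns 1, so the regular shutdown path runs on both platforms.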

TurfIt

Quote from: Michael 'Cruzer' on August 28, 2014, 04:29:49 PM
Is there anything I can do to prevent this? (It blocks the start; when I retry after ~120 seconds, everything works fine again.)
No. The OS is waiting for any sockets in the TIME_WAIT state to transition to fully CLOSED. Until they're all closed, an application can't rebind; 120 s is a typical timeout for this to occur.
(Yes there are socket options to force the bind, but 2 mins is not a huge wait for things to be properly cleaned up IMHO).
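The socket option alluded to here is presumably SO_REUSEADDR (an assumption; the post does not name it). Below is a minimal sketch of a listener that sets it before bind(). The function name open_listener_reuseaddr is made up for illustration and is not how network_init_server() is written.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: opens a TCP socket listening on 0.0.0.0:port
 * with SO_REUSEADDR set. Returns the descriptor, or -1 on failure. */
int open_listener_reuseaddr(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1) {
        return -1;
    }

    /* Must be set before bind() to have any effect on it. */
    int yes = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes)) == -1) {
        close(fd);
        return -1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1 ||
        listen(fd, 16) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

On Linux this lets bind() succeed while old connections on the port are still in TIME_WAIT, but it still fails if another socket is actively listening there. The option behaves differently on Windows, so it would need separate consideration on that platform.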


Quote from: Michael 'Cruzer' on August 28, 2014, 04:29:49 PM
I am using kill $pid (where $pid is a variable containing the correct process ID, of course). Is this the correct way to stop the server from a script? (The server runs on a Debian 7 minimal system.)
'nettool shutdown'

DrSuperGood

In a graceful shutdown situation you should send the process a signal (via a command line driver or something?) which then makes the process run through shutdown procedures which include closing of any communication sockets. Forceful process termination or other strange external shutdown procedures will always produce buggy results such as leaking live sockets (which are eventually closed).

If running a command line is too heavyweight, I would advise some form of open-ended pipe that lets you send signals to the server process from a separate command-line tool (which you run only when required) to make the server shut down gracefully.

If the server has frozen, crashed or otherwise become excessively unresponsive then forceful shutdown is the only really safe way. All open sockets and OS objects should eventually get cleaned up but this may take a while. For an automatic restart script I would recommend trying to start and if the sockets are not available then waiting 120 seconds before trying again (and repeat in a loop).
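The retry idea can also be sketched inside the process itself rather than in a restart script. This is only an illustration under that assumption; bind_with_retry and its parameters are invented names, and a script that restarts the whole server would follow the same loop structure.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: tries to bind and listen on 0.0.0.0:port up to
 * max_attempts times, sleeping wait_seconds between attempts while the
 * port is still in use. Returns the listening descriptor, or -1. */
int bind_with_retry(uint16_t port, int max_attempts, unsigned wait_seconds)
{
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd == -1) {
            return -1;
        }

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(port);

        if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
            listen(fd, 16) == 0) {
            return fd;  /* success */
        }

        int saved_errno = errno;  /* close() may overwrite errno */
        close(fd);
        if (saved_errno != EADDRINUSE) {
            return -1;  /* some other error; retrying will not help */
        }
        if (attempt < max_attempts) {
            sleep(wait_seconds);  /* wait for the old sockets to go away */
        }
    }
    return -1;
}
```

With the numbers from this thread, a call such as bind_with_retry(port, 3, 120) would give the old sockets a few minutes to leave TIME_WAIT before giving up.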

Michael 'Cruzer'

Got it to work as I described above. :D

Quote: One might have to do some #ifdef-ing with alternate code for Windows

You are right. I only tested it on Mac and Linux. I don't have a Windows system available here, but according to this Stack Overflow question, there isn't any SIGTERM available there: http://stackoverflow.com/questions/17566800/a-windows-equivalent-to-sigaction

I guess this code should work fine on any Unix-like system. So would it be correct to just exclude it on Windows, via:

#ifndef _WIN32
     // sigterm patch
#endif


EDIT: *.diff file available at http://forum.simutrans.com/index.php?topic=13912.msg138076#msg138076

Ters

Quote from: TurfIt on August 28, 2014, 05:36:56 PM
The OS is waiting for any sockets in the TIME_WAIT state to transition to fully CLOSED. Until they're all closed, an application can't rebind; 120 s is a typical timeout for this to occur.

While this makes sense, and I've seen sockets hang out in TIME_WAIT for that long, I'm pretty sure I've restarted servers in less than two minutes many times. I would have expected the individual sockets returned by accept() for each peer to be the ones in TIME_WAIT, not the listening socket passed as an argument to accept().

Dwachs

My experience is that this depends on whether a client is connected or not: if the server is shut down (via the GUI) while a client is connected, I cannot restart the server immediately. If no client is connected during shutdown, there is no problem with restarting.
Parsley, sage, rosemary, and maggikraut.

Michael 'Cruzer'

Yes, Dwachs, I agree with that.

After testing the patch linked above for a few days, I found that this issue isn't fixed by a graceful shutdown. (And as you pointed out, that doesn't seem to be a failure of the patch, which does its job.)