The International Simutrans Forum

Information and Announcements => Information & Announcements => Archived Announcements => Topic started by: Isaac Eiland-Hall on August 24, 2010, 03:44:21 AM

Title: Server downtime of ~7 hours
Post by: Isaac Eiland-Hall on August 24, 2010, 03:44:21 AM
What a comedy of errors. It would be funny (i.e. a "comedy") if it hadn't taken us offline all afternoon and evening. ARGH.

Long story short: I have been having issues with the attempts to find the source of the hacking. I was told I shouldn't "bump" my ticket. So when the server went offline, I assumed it was another hacking situation.

After 1.5 hours of being down, I did bump the ticket, but for $30/mo they don't provide emergency service. So I had to wait until it had been down for ~3 hours to follow up with a complaint...

It took another hour to hear a reply, which advised that it wasn't a result of their work.

So, an emergency ticket was opened with iWeb. The response there WAS immediate, but it took them ~30 mins to investigate (including getting a keyboard to the server itself), and then they had to pass it on to another department. All told, it took nearly two hours to figure out the problem, which was an error in the network configuration. Technically, that was iWeb's fault: I had downgraded from a 100Mbps port to a 10Mbps port (since we don't use the extra bandwidth, and it's $10/mo I'm paying for nothing)...

So the length of the outage is due to an extremely rare mistake from iWeb, combined with dealing with the hacking issue.

I apologize for the downtime.
Title: Re: Server downtime of ~7 hours
Post by: Isaac Eiland-Hall on August 24, 2010, 04:24:37 PM
Update: This morning's downtime of ~2 hours was somewhat related. I'm too tired right now to go into much detail, but basically it was a different problem caused by an attempt to stop a hack in progress.

Then there was an additional ~30 mins of unavailability because the nameserver wasn't working correctly after repairs to the system.

Everything *should* be up again finally/fully/firmly. A final check is being done (plus an additional reboot and further checks to try to prevent more problems).