Server Downtime

Started by Isaac Eiland-Hall, May 31, 2016, 01:53:18 AM

Isaac Eiland-Hall

Quick announcement as the server is back up.

Some sites may still be inaccessible; I'm still working to get them back up. I'll give a full rundown as soon as I'm confident everything is working again.

My apologies for the downtime.

Isaac Eiland-Hall

Some sites are definitely not accessible yet. Still working on it; will update as soon as possible.

Isaac Eiland-Hall

At approximately 6:18pm Central, I became aware that some sites on the server were down.

Troubleshooting began immediately.

I quickly realized the cause: I had shut down an old server that, to be frank, I'd forgotten was still connected to the new server. (In technical terms, it was in a DNS cluster.) In shutting down the old server, which the datacenter had incorrectly left running, I stupidly forgot to tell it NOT to delete all the DNS zones from all the servers in the cluster. This was a stupid, annoying mistake on my part. In my defense, dealing with the old datacenter had distracted me from remembering that the old server was still in the cluster.

Unfortunately, once this was done, there was no undo.

Thankfully, I had a backup of all the zones. Restoring the zone files themselves was easy, but doing so caused problems that then had to be fixed.

The problem was identified around 8:30pm, and I believe all sites were back up around 9:45pm; it took until a little after 10:00pm to test every site on the server and verify that everything was back up.

So I believe the total downtime was around 3.5 hours.
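As a quick sanity check on that figure, here is a minimal sketch of the arithmetic. It assumes the downtime ran from about 6:18pm (when the outage was noticed) to about 9:45pm (when all sites were believed to be back up); the variable names are only illustrative.

```python
from datetime import timedelta

# Times of day taken from the post, as offsets from midnight (Central).
# Assumption: downtime is measured from when the outage was noticed
# to when all sites were believed to be back up.
noticed = timedelta(hours=18, minutes=18)   # about 6:18pm
restored = timedelta(hours=21, minutes=45)  # about 9:45pm

downtime = restored - noticed
downtime_hours = downtime.total_seconds() / 3600

print(f"Downtime: {downtime} (about {downtime_hours:.2f} hours)")
```

Under those assumptions the gap is a little under three and a half hours, which is consistent with the rough estimate above.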

I apologize most sincerely for this. As for preventing this in future: I don't anticipate clustering servers again, but even if I do, I have clustered in the past and don't anticipate making this mistake again.