Author Topic: Server Downtime  (Read 870 times)

Offline Isaac.Eiland-Hall

  • Benevolent Dictator
  • Administrator
  • *
  • Posts: 3429
  • Total likes: 319
  • Helpful: 91
  • PanamaCityPC.com/support/
    • Facebook Profile
  • Languages: EN
Server Downtime
« on: May 31, 2016, 01:53:18 AM »
Quick announcement as the server is back up.

Some sites may still be inaccessible; I'm still working to get them back up. I'll give a full rundown as soon as I'm confident everything is working again.

My apologies for the downtime.

Offline Isaac.Eiland-Hall

Re: Server Downtime
« Reply #1 on: May 31, 2016, 01:57:38 AM »
Some sites are definitely not accessible yet. Still working. Will update as soon as possible.

Offline Isaac.Eiland-Hall

Re: Server Downtime
« Reply #2 on: May 31, 2016, 03:15:25 AM »
At approximately 6:18pm Central, I became aware that some sites on the server were down.

Troubleshooting began immediately.

I quickly realized the cause: I had shut down an old server that, to be frank, I'd forgotten was still connected to the new server. (In technical terms, the two were in a DNS cluster.) The datacenter had incorrectly left the old server running, and when I shut it down, I stupidly forgot to tell it NOT to delete all the DNS zones from all the servers in the cluster. This was a stupid, annoying mistake on my part. In my defense, dealing with the old datacenter had distracted me from remembering that the old server was still in the cluster.

Unfortunately, once this was done, there was no undo.

Thankfully, I had a backup of all the zones. However, restoring from that backup was not smooth. Or rather, restoring the zone files themselves was easy, but doing so caused problems that then had to be fixed.

The problem was identified around 8:30pm, and I believe all sites were back up around 9:45pm; it took until a little after 10:00pm to test all sites on the server and verify that everything was back up.
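For anyone curious what a "test all sites" pass can look like after a DNS zone restore, here is a minimal sketch. It is not the procedure actually used here; the function name, the `resolve` parameter, and the example hostnames are all hypothetical. It only checks that each name resolves at all, which says nothing about whether the site itself is serving pages correctly.

```python
import socket


def find_unresolved(hostnames, resolve=socket.gethostbyname):
    """Return a dict mapping each hostname that fails to resolve to its error text.

    `resolve` is injectable (an assumption made for this sketch) so the
    check can be exercised without touching the network.
    """
    failures = {}
    for name in hostnames:
        try:
            resolve(name)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            failures[name] = str(exc)
    return failures


# Example usage (hypothetical hostnames; this would perform real DNS lookups):
#   failures = find_unresolved(["site-one.example", "site-two.example"])
#   for name, error in failures.items():
#       print(f"{name}: still not resolving ({error})")
```

An empty result means every name in the list resolved; anything left in the dict is a zone that probably still needs attention.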

So I believe the total downtime was around 3.5 hours.
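As a quick check of that figure, here is a small sketch using the times given above. It assumes the outage began at about 6:18pm, when it was first noticed (the sites may have gone down somewhat earlier), and ended at about 9:45pm. The calendar date in the code is a placeholder, since only the time difference matters.

```python
from datetime import datetime

# Placeholder date (an assumption); only the clock times come from the post.
noticed = datetime(2016, 5, 30, 18, 18)   # about 6:18pm Central
restored = datetime(2016, 5, 30, 21, 45)  # about 9:45pm Central

downtime = restored - noticed
downtime_hours = downtime.total_seconds() / 3600

print(f"Downtime: {downtime} ({downtime_hours:.2f} hours)")
```

That works out to 3 hours 27 minutes, which matches the rough figure of 3.5 hours.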

I apologize most sincerely for this. As for preventing this in the future: I don't anticipate clustering servers again, but even if I do, I have clustered in the past and don't anticipate making this mistake again.