2025-10-13 Server Outage

Started by Isaac Eiland-Hall, October 13, 2025, 07:50:26 PM

Isaac Eiland-Hall

During the afternoon, CPU usage on the server slowly increased, causing sites to stop responding. A reboot of the server seemed to help temporarily, but within the hour, sites became unresponsive again.

Further investigation revealed more than 4.2 GB of log files. Those files have been rotated and are now set to rotate daily. Server CPU currently looks good, but I am monitoring.
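For anyone who wants to check their own server for the same problem, here is a minimal sketch of how oversized log files could be located. It is not the procedure used on this server; the `/var/log` path and the 100 MiB threshold are assumptions chosen for illustration, and `find_large_files` is a hypothetical helper name.

```python
import os


def find_large_files(root, min_bytes):
    """Return (size, path) pairs for files under root that are at least
    min_bytes in size, largest first."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if size >= min_bytes:
                found.append((size, path))
    return sorted(found, reverse=True)


if __name__ == "__main__":
    # Assumed location and threshold; adjust both for the actual server.
    for size, path in find_large_files("/var/log", 100 * 1024 * 1024):
        print(f"{size / 1024**2:8.1f} MiB  {path}")
```

Reading unreadable directories is skipped silently by `os.walk`, so running this as a user with access to the log directory gives a more complete picture.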

Isaac Eiland-Hall

Update: A different log file got overwhelmed in less than 24 hours.

It appears the server is seeing increased traffic, possibly from bots trying to find vulnerabilities. For now, I have paused AWStats and increased the number of Apache workers, and the server seems stable. We did have a few minutes of downtime a couple of times this afternoon while I was working on it, partly because of two server restarts and partly because the server ran out of workers while it was overloaded by writes to very large log files.
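One way to test the bot suspicion is to tally requests per client address in the access log. The sketch below assumes the log uses Apache's Common or Combined Log Format, where the client address is the first whitespace-separated field; the log path and the `top_clients` helper are hypothetical and not taken from this server's setup.

```python
from collections import Counter


def top_clients(lines, n=10):
    """Count requests per client address (first field of each log line)
    and return the n most frequent as (address, count) pairs."""
    counts = Counter()
    for line in lines:
        fields = line.split(None, 1)
        if fields:  # ignore blank lines
            counts[fields[0]] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    # Assumed path; the real location depends on the distribution and vhost config.
    with open("/var/log/apache2/access.log", errors="replace") as fh:
        for address, hits in top_clients(fh):
            print(f"{hits:8d}  {address}")
```

A handful of addresses responsible for most of the requests would support the bot theory; an even spread would point elsewhere.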

Keeping an eye on it. Of course, if problems happen when I'm asleep or at dialysis, it may be a bit, but I am actively monitoring.

Isaac Eiland-Hall

Websites became briefly unavailable this evening; I have increased the workers again.

I suspect there's some underlying issue I haven't found yet, but for now things again appear stable.

Edit: Diagnostics are not turning up anything. It appears the Japanese Forum *may* be a contributor, but I don't feel confident about that yet. Next time there's a problem, I will gather some information before restarting the server to see if I can figure out the actual cause. Note that we did have some large log files that were not being rotated. Those are being rotated now, so whatever part that played, it no longer does.
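The information gathering mentioned above could look something like the following sketch, which saves the output of a few standard commands to a timestamped file before a restart wipes the evidence. It assumes a Linux host with the usual `uptime`, `free`, `df`, and procps `ps` tools; the command list, the output directory, and the `capture_snapshot` name are illustrative choices, not the actual diagnostics run on this server.

```python
import datetime
import pathlib
import subprocess

# Assumed set of commands; extend with whatever is relevant to the server.
COMMANDS = [
    ["uptime"],
    ["free", "-m"],
    ["df", "-h"],
    ["ps", "aux", "--sort=-%cpu"],
]


def capture_snapshot(out_dir, commands=COMMANDS, timeout=15):
    """Run each command and write its output to a timestamped text file.
    Returns the path of the file that was written."""
    out_dir = pathlib.Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    target = out_dir / f"snapshot-{stamp}.txt"
    with target.open("w") as fh:
        for cmd in commands:
            fh.write(f"$ {' '.join(cmd)}\n")
            try:
                result = subprocess.run(
                    cmd, capture_output=True, text=True,
                    errors="replace", timeout=timeout,
                )
                fh.write(result.stdout)
                fh.write(result.stderr)
            except (OSError, subprocess.TimeoutExpired) as exc:
                # A missing or hung command should not abort the snapshot.
                fh.write(f"[failed: {exc}]\n")
            fh.write("\n")
    return target


if __name__ == "__main__":
    # Hypothetical output directory.
    print(capture_snapshot("/root/outage-snapshots"))
```

The timeout matters here: on an overloaded machine a single command can hang, and the point is to get a partial picture quickly rather than a complete one too late.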

Isaac Eiland-Hall

A couple of times since the last post, the server has apparently become overloaded to the point of sites being unavailable for a short time. I don't know how long each outage lasted, but one happened a couple of hours ago while I was away, and the server currently appears to be fine without my intervention.

I'm still seeking the root cause, with some suspicions, but nothing confirmed. As long as things continue to run, I'm not overly worried for now.