The International Simutrans Forum

 

Author Topic: Please stress test large game to narrow down save corruption problems  (Read 3536 times)

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 18745
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
As those who have been playing on the server will know, there have been problems with saved games being corrupted. This error is very hard to track down. To help me do that, it would be very helpful if people could stress test the latest backed-up saved game to see which particular steps (including doing things and then waiting for the end of the game month before saving, and so forth) result in a corrupted saved game. Here are the steps to follow:

(1) download a copy of the backup here;
(2) open it and make a number of additions (new stops, lines, etc.) or other changes;
(3) record what you have done;
(4) save the game;
(5) open the game that you have just saved to see whether it crashes;
(6) if it does not, return to step (2); and
(7) if it does crash, post a report in this thread with details of the records that you took at step (3).

This would be extremely helpful in tracking down this rather difficult bug. Thank you everyone in advance!

Offline asaphxiix

  • *
  • Posts: 723
Re: Please stress test large game to narrow down save corruption problems
« Reply #1 on: January 13, 2013, 12:01:36 PM »
I'm on it!

Offline asaphxiix

Re: Please stress test large game to narrow down save corruption problems
« Reply #2 on: January 13, 2013, 03:58:06 PM »
I have managed to narrow the problem down to a few minutes of play between a good save (saved after backup.sve was loaded) and a bad save. I have written down what I did and managed to reproduce the problem twice in total (some attempts from the same save did not reproduce it). However, I'm currently in a dire situation with my new PC, with bad clusters on the SSD and whatnot: I can't get Windows Explorer to open a file upload dialog from my browser :<

I could probably store it on an FTP server.
« Last Edit: January 13, 2013, 04:42:28 PM by asaphxiix »

Offline greenling

  • Lounger
  • *
  • Posts: 1728
  • Simutransarchology it my hobby!
  • Languages: DE,EN
Re: Please stress test large game to narrow down save corruption problems
« Reply #3 on: January 13, 2013, 04:50:49 PM »
asaphxiix
Can you buy a normal hard disk for your computer tomorrow?
That your computer strikes may be down to the bad SSD.

Offline asaphxiix

Re: Please stress test large game to narrow down save corruption problems
« Reply #4 on: January 13, 2013, 05:12:23 PM »
hi greenling
what do you mean by 'strikes'?

Offline ӔO

  • Devotees (Inactive)
  • *
  • Posts: 2345
  • Hopefully helpful
  • Languages: en, jp
Re: Please stress test large game to narrow down save corruption problems
« Reply #5 on: January 13, 2013, 05:21:39 PM »
what worked for me:

remove aeolus ships stuck in replace mode and in depots
remove aeolus ship connection to coatsand
remove lindley and NLT ship connections to nutish
remove lindley ships on deanpool to monkinghall line
clear the backlog at Nutish, most of them want to go to Nutingham

after that no more save game corruptions were occurring.
http://dl.dropbox.com/u/17111233/testing.sve

Offline greenling

Re: Please stress test large game to narrow down save corruption problems
« Reply #6 on: January 13, 2013, 05:45:26 PM »
Hello asaphxiix
The word "strikes" is another word for "out of order".

Offline asaphxiix

Re: Please stress test large game to narrow down save corruption problems
« Reply #7 on: January 13, 2013, 08:15:20 PM »
OK, I've installed Windows again and can upload files now!

So here's what I have: starting from backup2.sve, I finished building the stops on the new line from Spinshore to Pollingfield.

I built some 5 trains for the line and released them from the depot.

I then saved before.sve - no problem loading this one.
Then I:
increased the frequency to 15:21 minutes (25 per month) on the line Rail Coatsand=Trunkmere
sold one train from this line

sold 3 trains from the Rail Queenswell Wardale line
changed the line's schedule, removing the Wardale terminus from the line and adding the left-hand side platform instead. The trains' schedules then got mixed up, so I had to manually change their schedule from Reddford to Wardale in order to keep them from reversing on the track.

I then saved after.sve - crash:

FATAL ERROR: quickstone<T>::quickstone_tpl(T*,uint16)
slot (2) already taken

I reproduced this a second time, but that time I did not get an error message; the game simply got stuck on loading.

Other attempts from the same point did not reproduce the problem.


Offline jamespetts gb

Re: Please stress test large game to narrow down save corruption problems
« Reply #8 on: January 14, 2013, 01:45:18 AM »
Thank you all very much - this is most helpful. AEO's results are interesting (although it is odd that these things would cause corruption at the stage of loading halts), and Asaph's work is useful in narrowing down where problems do occur.

I have not yet succeeded in isolating the bug, however, partly through lack of time, although I hope to be able to do some work on it to-morrow evening. In the meantime, it would be very helpful if people could do one or both of two things to narrow down the possibilities further:

(1) try it again in the latest release candidate to see whether the problem can be reproduced there; and
(2) go through the steps needed to recreate the crash, removing one step at a time and checking whether a smaller subset of those steps (or, in AEO's case, of the steps needed to prevent it occurring) still suffices, so that some of the steps can be eliminated as superfluous.
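
The one-step-at-a-time elimination in (2) can be sketched roughly as below. This is only an illustration: `minimise_steps`, `demo_minimise` and the step names are made up, and the `reproduces` check stands in for the manual work of replaying the steps, saving and reloading. Since the crash in this thread does not reproduce on every attempt, each real check would probably need repeating several times before a step is written off as superfluous.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

using Steps = std::vector<std::string>;

// Greedy one-at-a-time elimination: drop one step, and keep it dropped only
// if the remaining steps still reproduce the crash.
inline Steps minimise_steps(Steps steps,
                            const std::function<bool(const Steps&)>& reproduces)
{
    for (std::size_t i = 0; i < steps.size(); ) {
        Steps candidate = steps;
        candidate.erase(candidate.begin() + static_cast<std::ptrdiff_t>(i));
        if (reproduces(candidate)) {
            steps = candidate; // step i was superfluous; index i now holds the next step
        } else {
            ++i;               // step i is needed; move on to the next one
        }
    }
    return steps;
}

// Hypothetical demonstration: pretend the crash needs both "step B" and
// "step D". In practice the predicate is a person replaying the steps.
inline Steps demo_minimise()
{
    const Steps recorded = {"step A", "step B", "step C", "step D"};
    auto reproduces = [](const Steps& s) {
        return std::find(s.begin(), s.end(), "step B") != s.end()
            && std::find(s.begin(), s.end(), "step D") != s.end();
    };
    return minimise_steps(recorded, reproduces);
}
```

With the made-up predicate above, the recorded list of four steps shrinks to the two that matter. The same loop works for AEO's case if `reproduces` is read as "still prevents the crash".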

If people are able to do these things, it will greatly assist me in narrowing the problem.

Thank you very much all for your help.

Offline ӔO

Re: Please stress test large game to narrow down save corruption problems
« Reply #9 on: January 14, 2013, 03:02:29 AM »
I think there is definitely something wrong with nutish railport.

RC10.9019, 10.9 and 10.8 will crash sometime during December to January if the game is fast forwarded with nothing done and you use the query tool to check the station status of nutish railport. The status can be checked fine in October and November.

The game also seems to slow down quite a lot when pax at nutish decide to change their immediate destination due to refunds causing time increases.

Offline asaphxiix

Re: Please stress test large game to narrow down save corruption problems
« Reply #10 on: January 14, 2013, 03:32:30 AM »
I also had a strange slow down at that time, and crashes in December. Also had a feeling the game wasn't crashing really because of something I was doing. Just to be sure, by crashing you mean, crashing on load yes?

Offline ӔO

Re: Please stress test large game to narrow down save corruption problems
« Reply #11 on: January 14, 2013, 03:49:09 AM »
No, the crash happens during play, sometime in December or January when querying Nutish Railport, and not when loading a save.

The January autosave is also corrupted most of the time.

It doesn't seem to happen if most of the pax are removed from nutish somehow. Best tactic for that is to make one of the 20k pax destination stations into a port and make a new ship line to there, then pick up the pax and sell the ship off en-route with the pax aboard. Easiest one is probably when the pax want to go to oakwich.

Offline asaphxiix

Re: Please stress test large game to narrow down save corruption problems
« Reply #12 on: January 14, 2013, 04:34:15 AM »
another quick way would be to remove the station probably?

strange issue. I did not have crashes in play.

Offline ӔO

Re: Please stress test large game to narrow down save corruption problems
« Reply #13 on: January 14, 2013, 05:04:41 AM »
I think that once the station reaches roughly 80k to 100k pax, the game may crash when pax reroute to certain stations. The rerouting does not always give the same result, so the crash happens rather randomly.

In this save, which only affects my company, I cleared broken ships, withdrew a pair of frivolous ship lines, increased capacity of some railports, bolstered/replaced the overcrowded lines and dumped some 40k~50k pax into the sea.

http://dl.dropbox.com/u/17111233/backup2_nocrash.sve

It doesn't seem to corrupt or crash after that.

I'm not entirely sure why there is a difference of some 5 MB between the backup2 save that crashes and my no-crash save, but I found it interesting.
« Last Edit: January 14, 2013, 05:10:20 AM by ӔO »

Offline jamespetts gb

Re: Please stress test large game to narrow down save corruption problems
« Reply #14 on: January 15, 2013, 02:19:15 AM »
Thank you very much everyone for all your work - it is much appreciated. I have now found the cause of the corrupt games, and fixed it on my 10.x branch. The problem was that a signed 16-bit integer was used to store the count of all the goods/passenger packets at each stop. If the number exceeded 32,767, the variable would overflow into a negative number, causing the wrong data to be saved/loaded. I have changed this to an unsigned 16-bit integer for now, giving a maximum of 65,535 packets (note that packets are not the same as actual passengers) per stop. If this number is exceeded, the excess quantities will be discarded on saving to prevent further corruption. When the major version is incremented to 11, this will be increased to a 32-bit unsigned integer, giving a vastly increased maximum and no need to discard.
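
For anyone curious what that overflow looks like, here is a minimal sketch. The function names are made up for illustration and are not taken from the Simutrans source; it only shows how a count above 32,767 turns negative in a signed 16-bit field, and how clamping to the unsigned 16-bit maximum (65,535) discards the excess instead.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: the value a signed 16-bit field ends up holding when a
// larger packet count is stored in it. The narrowing conversion wraps modulo
// 2^16 (guaranteed since C++20, and what common compilers did before that).
inline std::int16_t store_as_sint16(std::uint32_t packet_count)
{
    return static_cast<std::int16_t>(static_cast<std::uint16_t>(packet_count));
}

// Hypothetical helper mirroring the interim fix described above: keep the
// count in an unsigned 16-bit field and discard anything beyond its maximum.
inline std::uint16_t clamp_to_uint16(std::uint32_t packet_count)
{
    constexpr std::uint32_t max16 = std::numeric_limits<std::uint16_t>::max(); // 65535
    return static_cast<std::uint16_t>(packet_count > max16 ? max16 : packet_count);
}
```

So `store_as_sint16(32767)` is still 32767, but `store_as_sint16(32768)` comes back as -32768, which is the kind of nonsense count that ends up in the save. `clamp_to_uint16(70000)` gives 65535, with the remaining packets dropped.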

More generally, I have also made some changes intended to try to stabilise passenger/goods routing, as the number of passengers waiting at Nutish Railport (the ultimate cause of the issue) is vastly excessive. Firstly, I have made it so that waiting times accrue, not just when passengers/goods actually board a convoy, but if they have been waiting more than twice as long as the current average waiting time. This is to prevent waiting times becoming stale on infrequently serviced routes. This is checked with the same frequency as the check for discarding of passengers.

Secondly, I have increased the frequency with which the pathing is refreshed from once a month (and the length of a month can be very great with higher bits per month values) to once every 8192 steps - about two game hours at 250m/tile, or 1 game hour at 125m/tile. From major version 11 onwards, this will be able to be set in simuconf.tab.
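
A step-based refresh of this kind can be sketched as a simple counter. This is an assumption about the general shape only, not the actual implementation: `PathRefreshTimer`, `count_refreshes` and `kRefreshIntervalSteps` are invented names, and only the 8192 figure comes from the description above.

```cpp
#include <cassert>
#include <cstdint>

// Interval taken from the post; everything else here is hypothetical.
constexpr std::uint32_t kRefreshIntervalSteps = 8192;

class PathRefreshTimer {
public:
    explicit PathRefreshTimer(std::uint32_t interval_steps = kRefreshIntervalSteps)
        : interval_(interval_steps)
    {
        assert(interval_ > 0); // a zero interval would ask for a refresh on every step
    }

    // Call once per simulation step; returns true when a refresh is due.
    bool step()
    {
        if (++elapsed_ < interval_) {
            return false;
        }
        elapsed_ = 0;
        return true;
    }

private:
    std::uint32_t interval_;
    std::uint32_t elapsed_ = 0;
};

// Counts how many refreshes would fall within a given number of steps.
inline std::uint32_t count_refreshes(std::uint32_t total_steps,
                                     std::uint32_t interval_steps = kRefreshIntervalSteps)
{
    PathRefreshTimer timer(interval_steps);
    std::uint32_t refreshes = 0;
    for (std::uint32_t i = 0; i < total_steps; ++i) {
        if (timer.step()) {
            ++refreshes;
        }
    }
    return refreshes;
}
```

The point of counting steps rather than months is that the refresh rate no longer depends on the bits per month setting: 20,000 steps contain two refreshes whatever the month length is.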

Hopefully, these changes between them will improve the stability and realism of the routing. I have not released this fixed version yet, as I have yet to fix the crash when looking at the window of a stop with a very large number of waiting passengers/mail/goods, which I hope to be able to fix before long.

Thank you again all for your help and your patience in addressing this problem.

Offline asaphxiix

Re: Please stress test large game to narrow down save corruption problems
« Reply #15 on: January 15, 2013, 02:24:28 AM »
splendid news!
q: doesn't the current setting of 384 minutes per month actually take less than two real hours of play?

Offline Carl

  • Devotee
  • *
  • Posts: 1600
    • Website
  • Languages: EN
Re: Please stress test large game to narrow down save corruption problems
« Reply #16 on: January 15, 2013, 10:31:19 AM »
Quote
Secondly, I have increased the frequency with which the pathing is refreshed from once a month (and the length of a month can be very great with higher bits per month values) to once every 8192 steps - about two game hours at 250m/tile, or 1 game hour at 125m/tile. From major version 11 onwards, this will be able to be set in simuconf.tab.
That sounds like an excellent change. As someone who usually runs long months, I look forward to that!


Quote
Firstly, I have made it so that waiting times accrue, not just when passengers/goods actually board a convoy, but if they have been waiting more than twice as long as the current average waiting time. This is to prevent waiting times becoming stale on infrequently serviced routes. This is checked with the same frequency as the check for discarding of passengers.
How does this work with new lines where waiting times are all "unknown"?

Offline jamespetts gb

Re: Please stress test large game to narrow down save corruption problems
« Reply #17 on: January 15, 2013, 10:57:37 AM »
Quote
How does this work with new lines where waiting times are all "unknown"?

An "unknown" waiting time is deemed to be 1.9 minutes, so passengers/goods which have been waiting longer than 3.8 minutes by the time of the check will register their times.
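
As a rough sketch of that rule, assuming times are held in tenths of a minute (the unit, the function name and the constant name are all assumptions made for illustration, not the real code):

```cpp
#include <optional>

// 1.9 minutes, the value deemed for an "unknown" waiting time per the post
// above, expressed in tenths of a minute (the unit is an assumption).
constexpr int kUnknownWaitTenths = 19;

// Hypothetical check: passengers/goods register their waiting time if they
// have waited more than twice the current average, with an unknown average
// treated as 1.9 minutes.
inline bool should_register_wait(int waited_tenths, std::optional<int> average_tenths)
{
    const int baseline = average_tenths.value_or(kUnknownWaitTenths);
    return waited_tenths > 2 * baseline;
}
```

On a new line with no recorded average, a packet that has waited 3.9 minutes (`should_register_wait(39, std::nullopt)`) registers its time, while one that has waited exactly 3.8 minutes does not yet.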

Offline Carl

Re: Please stress test large game to narrow down save corruption problems
« Reply #18 on: January 15, 2013, 10:58:37 AM »
Thanks. That's what I suspected, but I wanted to make sure. I think those two small changes will go a long way to improving the way that journey and waiting times are kept up to date. Thanks, James.