The International Simutrans Forum

 

Author Topic: Demo.sve can't play for 10 months due to a crash  (Read 673 times)

0 Members and 1 Guest are viewing this topic.

Online Ranran

  • Devotee
  • *
  • Posts: 1160
  • Languages: ja
Demo.sve can't play for 10 months due to a crash
« on: April 18, 2020, 03:56:33 AM »
Hi. I stay home with simutrans. (´・ω・`)

Earlier I reported in another thread a bug that caused the game to be unable to continue to play.
That means freeze bug and crashing bugs often stop the game.
The freezing bug was fixed by your fix, but it still crashes.
Crashes occur randomly between 2 hours and 10 months(in game), and the game crashes in about 2 months on average.
So I have to make frequent saves, like the old simutrans, frightened by crashes. (´・ω・`)


When I debug with a crash while playing in a nightly build, I get a screen like this.

I have little knowledge so I don't know what's going on.

But in a self-built debug build I can see the following error:
(The timing of the crash is the same)





This seems to be causing by the private car routing.

I'm not sure what to submit, but the above error always prevents play from continuing.
Any advice on how to submit what should be submitted would be helpful.
« Last Edit: April 23, 2020, 09:04:54 AM by Ranran »

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 20207
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: I can't play for 10 months due to a crash
« Reply #1 on: April 22, 2020, 04:03:48 PM »
Thank you for your report. Are you able to upload a saved game in which this can reliably be reproduced?

Online Ranran

  • Devotee
  • *
  • Posts: 1160
  • Languages: ja
Re: Demo.sve can't play for 10 months due to a crash
« Reply #2 on: April 23, 2020, 09:04:24 AM »
Are you able to upload a saved game in which this can reliably be reproduced?
It is a demo.sve. Did you make a check in demo.sve?


I tried another save but it played for 3 consecutive years and didn't cause a crash.
I often use demo.sve in testing patch creation (because it is loaded first if it crashes) and demo.sve was causing crashes frequently.
And I can confirm this crash by nightly build or by building a master branch by myself, so I think that there is a cause of crash in demo.sve. It still crashes on today's nightly build (# 1bd6b2f).
I suspect the code I've shown above is a hint as to what is causing it.
If this really only happens in demo.sve it may not be a big issue.
This happens at random timing. But demo.sve can easily reproduce it. The shortest crash was at 11:12. My longest record is 10 months.
I tried to save just before the crash, but reading the save just before the crash did not crash at the same timing.
However, saving demo.sve still causes the crash. It just resets the crash timing. It is possible to survive WW2 by repeating it, but it is troublesome.

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 20207
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Demo.sve can't play for 10 months due to a crash
« Reply #3 on: April 23, 2020, 11:10:25 AM »
Thank you for the clarification: this is helpful. I am now able to reproduce this (for reference, this works with fast forwarding), although the nature of the problem is extremely complex and difficult to deal with because it arises in a highly complex part of the code that is derived from Standard (the freelist) and memory corruption in this part of the code is extremely difficult to diagnose.

What appears to happen is that, when the private car routing function is working, there comes a time when it is necessary to add a private car route to a city, industry or attraction. This involves setting an element of a hashtable (that is, replacing the existing value, if any, of a hashtable with a new value or add a new value if there be not an existing value) in the city, industry or attraction with the co-ordinate of the destination and the journey time per tile.

In the hashtable code, when a new item is to be appended to the hashtable and the hashtable needs to expand, this calls the "gimme_node" code in freelist. Freelist is an efficient memory management system. Testing has shown the Simutrans hashtable to be very much faster than the STD hashtable.

The crash occurs in freelist itself, always at the following line:

Code: [Select]
*list = tmp->next;

The problem is always that "tmp" points to an invalid memory location. "tmp" was in fact set to the value of *list in the immediately preceding line, and "list", in turn, is assigned as follows:

Code: [Select]
list = &(all_lists[size/4]);

all_lists is an array of list objects. The crash always occurs when [size/4] == 6. In every occasion on which the crash occurs, list[0] - list[5] inclusive are all NULL in value. list[6] is a non-NULL but invalid location, and list[7] and upwards are a mix of valid and NULL locations.

This hashtable call and thus freelist work is done outside the main thread, and freelist is static, so I had wondered whether there was some thread based interference (e.g. two threads altering the value of all_lists at once). However, there is a mutex (from Standard) protecting freelist; I have added some error checking code (pushed to the master branch), which would have detected any failure to initialise the mutex or invalid calls (e.g. two sequential unlocks without a lock, which results in undefined behaviour) and this appears to work without error.

Knowing so little about how freelist works, and freelist being fantastically complicated, I am rather at a loss as to how to deal with this. That the problem only arises reliably in demo.sve does not mean that this is not really a serious problem, since this may well indicate more widespread memory corruption which may cause difficult to trace crashes and/or freezes in other circumstances, saved game corruption or even be responsible for the reported losses of network synchronisation, which are reported to be associated with road editing, which in turn would invoke the code which ultimately crashes here. (I have tried explicitly checking whether the element is already contained in the hashtable and calling "put" rather than "set" if it is, but this makes no difference).

Any assistance from anyone who knows anything about freelist in how to go about tracking down this issue would be most welcome.

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 20207
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Demo.sve can't play for 10 months due to a crash
« Reply #4 on: April 27, 2020, 03:39:52 PM »
I have been attempting to use Dr. Memory to investigate this. Unfortunately, Dr. Memory itself seems to have a bug which causes it to crash whenever I try to use it. A reported workaround is running it with the "dr_debug" parameter, but this makes it so slow as to be unusable (it has been ~15 minutes and counting unresponsive after selecting which pakset to load).

I suspect that the problem reported here may well be the same as some of the other instability issues reported, so it is of importance to track this problem down and fix it.

Offline prissi

  • Developer
  • Administrator
  • *
  • Posts: 10139
  • Languages: De,EN,JP
Re: Demo.sve can't play for 10 months due to a crash
« Reply #5 on: April 28, 2020, 02:27:53 AM »
This looks very much like something is incorrectl\y accessing memory (like a double dereferencing instead on a single one). and thus overwriting memory outside its domain. Freelist is very bad at containing those, the price for the much lower overhead.

You can make the validation tool valgrind to check freelist actions too by defining USE_VALGRIND_MEMCHECK to some degree. As less performace detrimental option, you can define DEBUG_FREELIST, which adds a magic, size, and double free check. (Because double frees will no go well with freelist, which happily hands out the same memory twice the next time.)

However, given the huge impact memorywise I really wonder if individual routes calculated on the fly for each citycar would not be a better way to go. This would extremely simplify the overtaking code too, since everyone know if a vehicle will turn or not.

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 20207
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Demo.sve can't play for 10 months due to a crash
« Reply #6 on: April 30, 2020, 12:41:08 PM »
Thank you for that: that is helpful. My debug builds are set up to define DEBUG_FREELIST by default in any event, so, if this catches double frees, this is not what is happening here.

Is there a particular process to follow when using USE_VALGRIND_MEMCHECK, or is it just a case of defining this in a debug build and then running Valgrind on the resulting executable?

Edit: Testing saving this game immediately after loading and then re-loading the saved game makes no difference, meaning that the bug is not an anomaly confined to code for loading older saved games.

Prissi (or anyone else familiar with the freelist code) - do you know where in the code that nodelist_node_t objects are deleted? It seems clear that such an object is being deleted here where it ought not to be deleted, but it is not clear where or how this is happening, and not knowing where these objects are deleted in the code is hampering my efforts to find where the incorrect deletion is occurring.

Edit 2: I have narrowed this down somewhat. I found that commenting out

Code: [Select]
delete tmp;

in the clear() method of slist_tpl.h prevented these crashes (but presumably causes a memory leak). I then traced back the calls to clear() in slist_tpl.h, and found that these came, inter alia from clear() in hashtable_tpl.h.

I then found that commenting out the following code in check_all_private_car_routes() in simcity.cc prevented the crashes:

Code: [Select]
connected_cities.clear();
connected_industries.clear();
connected_attractions.clear();

(For reference, all of those collection classes are koordhashtable_tpl classes).

check_all_private_car_routes is a method that is run in a thread that is not the main thread, one of the multiple threads for private car route checking. However, putting these lines of code into a mutex (so that only one thread could access these lines of code at once) did not prevent the crashes.

Something odd is obviously happening with the memory of objects stored in hashtables when they are cleared, but it is not clear to me what that odd thing is. The same hashtable should not be able to be cleared by multiple threads at once, since only one thread should ever be able to work on one city (or else there would be far more bugs if this were not working), and in any event, the mutex that I tested should have prevented this.

Is there something odd about the interaction of the hashtable/slist clear() function and freelist with multi-threading that makes this not work as one would expect? If so, it would help to know something about this to try to track down this problem.
« Last Edit: April 30, 2020, 01:48:40 PM by jamespetts »

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 20207
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Demo.sve can't play for 10 months due to a crash
« Reply #7 on: April 30, 2020, 03:14:22 PM »
I think that I might have managed to fix this, as pushed now to the master branch. If I enclose both the connected_cities (etc.) clearing code and the code that triggers the crash in the first instance in mutexes, this seems to solve the problem.

I should be grateful if people could re-test with to-morrow's nightly build to see whether this has been fixed fully.

Online Ranran

  • Devotee
  • *
  • Posts: 1160
  • Languages: ja
Re: Demo.sve can't play for 10 months due to a crash
« Reply #8 on: May 01, 2020, 06:26:05 PM »
I played the demo.sve in fast forward for 7 years in a row until Japan became independent in today's build, but no crashes occurred. It succeeded.  :D

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Moderator
  • *
  • Posts: 20207
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Demo.sve can't play for 10 months due to a crash
« Reply #9 on: May 01, 2020, 06:41:49 PM »
I played the demo.sve in fast forward for 7 years in a row until Japan became independent in today's build, but no crashes occurred. It succeeded.  :D

Splendid, thank you for confirming.