News:

Simutrans Wiki Manual
The official on-line manual for Simutrans. Read and contribute.

[CRASH] Modifying Schedule On a Bridge During Planning or Destruction.

Started by DrSuperGood, July 04, 2019, 06:00:07 PM

Previous topic - Next topic

0 Members and 1 Guest are viewing this topic.

DrSuperGood

To cause a crash go to line management window, create a new road vehicle line and leave the schedule planner window open. Now build a road bridge such that at least 1 tile of it is an elevated ground (not ramp). Using the schedule planner add a waypoint to any of the elevated ground tiles and keep the schedule planner window open. Now fully delete the freshly constructed bridge. Now try to rebuild the road bridge by planning the full span of the bridge but cancel before confirming construction. Simutrans (both standard and extended) has now crashed.

This crash has likely existed for a very long time. I first observed it while playing extended earlier today but I can successful recreate it in standard and have identified the cause.

The schedule planner uses a marker (zeiger_t) to represent the currently selected waypoint of the plan. For this to work, it is attached to the ground at the waypoint location. However the ground it is attached to might be destroyed, e.g. due to a bridge being deleted or a planned bridge being cancelled. When ground is destroyed all objects attached to it get destroyed. However the schedule planner holds a reference to the marker for re-use and so continues to manipulate it even after it is destroyed. This results in a fatal error as the planner window is accessing and calling methods with memory garbage.

I am making this topic rather than fixing it myself because as far as I can see there is no clear solution.

One cannot tell if an object has been deleted so adding checks is not possible. Since this mostly applies to bridge spans, tunnels and elevated ways one could check for these and not place the marker on them, however this would regress functionality and might not catch all cases. One could add a callback to the marker class allowing the schedule planner to be notified that the maker was deleted and hence stop using it and create a new one but this would be strange behaviour as no other classes do this. One could extend the marker class into a custom class which does something similar just for the schedule planner however this in inelegant and potentially other users of marker would run into this issue as well. One could modify objects themselves to have a flag that signifies the object life is being managed by a separate system and prevents their automatic removal with ground destruction, however as object flags are maxed out this might result in size of expansion which could decrease performance. One could add logic to exempt markers from automatic destruction by ground destruction however this might impact code which expects such behaviour and cause memory leaks.

It is possible I am missing some API of the code that could solve this easily. Hence this topic to discuss a sensible solution.

Ters

There are several places in Simutrans were references are held to things that might be deleted. That's what quickstone_tpl.h is there for.

Dwachs

The destroying of map objects is done in dataobj.cc:local_delete_object. There, all zeiger_t's are explicitly not deleted but only set to not-on-map. At which point does the program crash?
Parsley, sage, rosemary, and maggikraut.

DrSuperGood

Quote from: Dwachs on July 05, 2019, 06:42:16 AMThe destroying of map objects is done in dataobj.cc:local_delete_object. There, all zeiger_t's are explicitly not deleted but only set to not-on-map.
They are being deleted as part of the ground being deleted. Explanation is below.
Quote from: Dwachs on July 05, 2019, 06:42:16 AMAt which point does the program crash?
When trying to dereference a destroyed object. Usually around line 171 of schedule_gui.cc.
// always remove
if(  grund_t *old_gr = welt->lookup(current_stop_mark->get_pos())  ) { // Error prone due to dangling pointer.
current_stop_mark->mark_image_dirty( current_stop_mark->get_image(), 0 ); // MSVC crashed here due to invalid virtual function call.
old_gr->obj_remove( current_stop_mark );
old_gr->set_flag( grund_t::dirty );
current_stop_mark->set_pos( koord3d::invalid );
}

current_stop_mark is a dangling pointer to a destroyed object. The actual point of crash will vary as the result of the dangling pointer.

With MSVC builds it crashes another time because the failed virtual call executes a callback function which tries to close the schedule_gui_stats_t object. This causes the destructor to run which in turn results in a fatal crash because it tries to delete a dangling pointer which likely has an invalid memory structure by this time.

~schedule_gui_stats_t()
{
delete current_stop_mark; // Additional fatal crash here. However this is a side effect of the crash reporting logic in this case and not the actual crash.
delete last_schedule;
}


I tracked down where current_stop_mark is being deleted. In one case it was trigged by line 1,203 of simmenu.cc.

if(gr  &&   (gr->get_typ() == grund_t::tunnelboden  ||  gr->get_typ() == grund_t::monorailboden)  &&  gr->get_weg_nr(0) == NULL && !gr->get_leitung() ) {
welt->access(pos.get_2d())->boden_entfernen(gr);
delete gr; // current_stop_mark is deleted here
assert( !welt->lookup(pos));
}

Deleting a ground causes the grund_t destructor to run. The following destructor code is not directly responsible for the crash.

grund_t::~grund_t()
{
destroy_win((ptrdiff_t)this);

// remove text from table
set_text(NULL);

if(flags&is_halt_flag) {
get_halt()->rem_grund(this);
}
}

Now the problem is that in order to track objects attached to a ground an objlist_t member is used. As such when grund_t is explicitly destroyed by the above code it implicitly calls the destructor of objlist_t afterwards. This destructor explicitly deletes all objects it contains resulting in the marker being deleted.

objlist_t::~objlist_t()
{
if(  capacity == 1  ) {
delete obj.one; // current_stop_mark could be deleted here.
}
else {
for(  uint8 i=0;  i<top;  i++  ) {
delete obj.some[i]; // current_stop_mark could be deleted here.
}
}

if(capacity>1) {
dl_free(obj.some, capacity);
}
obj.some = NULL;
capacity = top = 0;
}

The result is that current_stop_mark is now a dangling pointer, code tries to dereference the dangling pointer and bad things happen usually resulting in a crash.
Quote from: Ters on July 05, 2019, 06:13:26 AMThere are several places in Simutrans were references are held to things that might be deleted. That's what quickstone_tpl.h is there for.
This seems likely the required solution. I will look into it.

Dwachs

Thanks! I did not look into the destructor of oblist. A simple fix would be to delete only non-zeiger_t objects in this destructor and set the not-on-map flag for zeiger_t's.

This should fix this bug. Also it should not leak memory as all(?) these zeiger-s are managed by other objects.
Parsley, sage, rosemary, and maggikraut.

DrSuperGood

Quote from: Dwachs on July 05, 2019, 10:03:53 AMThanks! I did not look into the destructor of oblist. A simple fix would be to delete only non-zeiger_t objects in this destructor and set the not-on-map flag for zeiger_t's.This should fix this bug. Also it should not leak memory as all(?) these zeiger-s are managed by other objects.
Yes it would fix the crash. However it is kind of an inelegant solution as it is coupling zieger to objlist and adding yet another exceptional logic case.

Dwachs

This coupling is already there, objlist is full of extra logic for special kind of objects.
Parsley, sage, rosemary, and maggikraut.

prissi

Otherwise one could enforce calling "cleanup(NULL)" for each obj before deletion. Then those could remove themselves. However, I am not sure if the objlist is in any state to allow this during deletion.

DrSuperGood

Quote from: Dwachs on July 05, 2019, 12:55:54 PMThis coupling is already there, objlist is full of extra logic for special kind of objects.
I decided to go with a variety of your approach except with the logic coming from a virtual method of obj_t rather than coupling specific types. This potentially allows one to change other object types to be exempt from such auto deletion if the need arises in the future without having to modify the object list class.

Should be fixed in R8777. This fix might need to be ported to Extended as it appears to suffer from the same crash.