
SOLVED: [120.1.3] Crash on tool selection during multiplayer reload.

Started by DrSuperGood, August 17, 2016, 02:49:44 AM



DrSuperGood

I was running an MSVC-built optimized build and playing on a multiplayer server. The game reloaded because someone joined, and I decided to click and drag some road. Simutrans then crashed, but I managed to pull the stack trace using the debug symbols. I have run into this crash many times before, but have never been able to debug it until now.


Simutrans 120-1-3 rmk.exe!abort() Line 77 C++
[External Code]
> Simutrans 120-1-3 rmk.exe!two_click_tool_t::cleanup(bool delete_start_marker) Line 1084 C++
Simutrans 120-1-3 rmk.exe!two_click_tool_t::init(player_t * __formal) Line 931 C++
Simutrans 120-1-3 rmk.exe!tool_build_way_t::init(player_t * player) Line 2215 C++
Simutrans 120-1-3 rmk.exe!karte_t::local_set_tool(tool_t * tool_in, player_t * player) Line 3287 C++
Simutrans 120-1-3 rmk.exe!karte_t::set_tool(tool_t * tool_in, player_t * player) Line 3272 C++
Simutrans 120-1-3 rmk.exe!tool_selector_t::infowin_event(const event_t * ev) Line 137 C++
Simutrans 120-1-3 rmk.exe!check_pos_win(event_t * ev) Line 1430 C++
Simutrans 120-1-3 rmk.exe!interaction_t::process_event(event_t & ev) Line 380 C++
Simutrans 120-1-3 rmk.exe!interaction_t::check_events() Line 453 C++
Simutrans 120-1-3 rmk.exe!karte_t::interactive(unsigned int quit_month) Line 6972 C++
Simutrans 120-1-3 rmk.exe!simu_main(int argc, char * * argv) Line 1314 C++
Simutrans 120-1-3 rmk.exe!sysmain(int argc, char * * argv) Line 805 C++
Simutrans 120-1-3 rmk.exe!WinMain(HINSTANCE__ * hInstance, HINSTANCE__ * __formal, char * __formal, int __formal) Line 968 C++
[External Code]
[Frames below may be incorrect and/or missing, no symbols loaded for kernel32.dll]


The following values were extracted at two_click_tool_t::cleanup.

delete_start_marker true bool
- start_marker 0x0c9c2f28 {pos={x=653 y=673 z=2 '\x2' } xoff=0 '\0' yoff=0 '\0' ...} zeiger_t * {obj_t}
+ [obj_t] {pos={x=653 y=673 z=2 '\x2' } xoff=0 '\0' yoff=0 '\0' ...} obj_t
+ obj_no_info_t {...} obj_no_info_t
+ area {x=0 y=0 } koord
+ offset {x=0 y=0 } koord
bild 1501 unsigned int
after_bild 4294967295 unsigned int
- this 0x03e34d18 {besch=0x03f1cef8 {max_weight=100 styp=0 '\0' draw_as_obj=0 '\0' ...} } two_click_tool_t * {tool_build_way_t}
+ [tool_build_way_t] {besch=0x03f1cef8 {max_weight=100 styp=0 '\0' draw_as_obj=0 '\0' ...} } tool_build_way_t
+ tool_t {icon=5657 id=4110 default_param=0x04ad02b4 "asphalt_road" ...} tool_t
first_click_var true bool
+ start {x=-1 y=-1 z=-1 'ÿ' } koord3d
+ start_marker 0x0c9c2f28 {pos={x=653 y=673 z=2 '\x2' } xoff=0 '\0' yoff=0 '\0' ...} zeiger_t * {obj_t}
+ marked {head=0x00000000 <NULL> tail=0x00000000 <NULL> node_count=0 } slist_tpl<zeiger_t *>


The crash occurs here:

// delete marker.
if(  start_marker!=NULL  &&  delete_start_marker) {
    start_marker->mark_image_dirty( start_marker->get_image(), 0 ); // <--------- CRASH
    delete start_marker;
    start_marker = NULL;
}

I am unfamiliar with the code, and the crash is hard to reproduce. As a very rough guess it might be trying to get an invalid image or draw it in an invalid way?

Ters

I notice that abort is at the top of the stack. abort is called as a suicide option when the program catches itself doing something it should not, rather than the OS catching it doing something bad. The only such case I can see that applies here is that the code attempts to call a pure virtual function, which would have to be start_marker->get_image(). This would mean that start_marker->~zeiger_t() has been called, or that start_marker->zeiger_t() hasn't. Since I don't see any (relevant) constructors or destructors on the stack, which could indicate a slightly different scenario, this leaves me with the theory that the object pointed to by start_marker has already been deleted, and that start_marker is therefore a dangling pointer.
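The reasoning about ~zeiger_t() can be illustrated with a small, well-defined sketch. It does not touch a deleted object (that would be undefined behaviour); it only shows that once the derived destructor has finished, virtual calls resolve to the base class. The names base_t, derived_t and name_seen_in_base_destructor are hypothetical stand-ins for obj_t and zeiger_t, not Simutrans code. If the base version were pure virtual, as get_image() is said to be in obj_t, the same call would typically end in abort.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for obj_t. Its name() is NOT pure here, so the
// call from the destructor stays well defined and can be observed.
struct base_t {
    std::string* seen;  // where the destructor records what it observed
    explicit base_t(std::string* s) : seen(s) {}
    virtual const char* name() const { return "base"; }
    // By the time this body runs, the derived part is already destroyed,
    // so the virtual call below resolves to base_t::name().
    virtual ~base_t() { *seen = name(); }
};

// Hypothetical stand-in for zeiger_t.
struct derived_t : base_t {
    explicit derived_t(std::string* s) : base_t(s) {}
    const char* name() const override { return "derived"; }
};

// Creates and destroys a derived_t, then reports which override the
// base destructor ended up calling.
std::string name_seen_in_base_destructor() {
    std::string seen;
    {
        derived_t d(&seen);
        // While the object is fully alive, the override is used.
        assert(std::string(d.name()) == "derived");
    }  // d is destroyed here: the derived part first, then ~base_t()
    return seen;
}
```

The function returns "base", which is the well-defined part of the theory: after the zeiger_t part is gone, only the obj_t behaviour is left.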

TurfIt

I imagine it would be helpful if ::init actually init'd start_marker....  I presume you were in the middle of constructing when rudely interrupted by the joiner - from what I see, in that case ::cleanup will never be called, hence start_marker is left dangling since it's only ever init during pakset loading.

Dwachs

If this is indeed the case, then it would suffice to change line 4626 of simworld.cc to

tool->cleanup( true );

Maybe there are more places in the code, where the parameter of cleanup needs to be set to 'true'.

Parsley, sage, rosemary, and maggikraut.

DrSuperGood

Quote
I imagine it would be helpful if ::init actually init'd start_marker....  I presume you were in the middle of constructing when rudely interrupted by the joiner - from what I see, in that case ::cleanup will never be called, hence start_marker is left dangling since it's only ever init during pakset loading.
I forget whether I was in the middle of building or not when I was interrupted, but I probably was. After the reload completed I went on to place some road. The crash occurred the instant the server resumed the game (join complete), and the road I placed between reload and resume was not built.

Ters

I can only get the posted dumps to make sense and be reproducible if start_marker points to an obj_t. A dangling pointer to a zeiger_t doesn't seem to create the same result. (Actually, I have been using a small purpose-written test case, not actual Simutrans code.) However, I can't see how that can happen, unless someone has been messing with threads when they should not.

Dwachs

Quote from: TurfIt on August 17, 2016, 06:20:59 AM
I presume you were in the middle of constructing when rudely interrupted by the joiner - from what I see, in that case ::cleanup will never be called, hence start_marker is left dangling since it's only ever init during pakset loading.
This is pretty much spot on: cleanup is called, but with delete_start_marker==false in karte_t::save. On the next activation of the same tool (after save & reload due to joining), the crash happens.

Please check with r7859. This bug seems to have been in the code since these two-click tools work with network mode (>4 years ago).
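The save-and-reload sequence described in this post can be modelled with a minimal sketch. Everything in it is a hypothetical stand-in (world_t, two_click_sketch_t, stale_after_reload), not the real Simutrans classes, and it assumes that a reload destroys every object on the map, including the start marker. Markers are referred to by integer id rather than by pointer, so the stale reference can be detected safely instead of crashing.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-ins for illustration; not the real Simutrans classes.
struct marker_t { int image; };

const int NO_MARKER = -1;

// Owns all map objects. ASSUMPTION: reload() destroys every one of them.
struct world_t {
    std::map<int, marker_t> objects;
    int next_id = 0;

    int add_marker() {
        int id = next_id++;
        objects[id] = marker_t{1501};
        return id;
    }
    void remove(int id) { objects.erase(id); }
    void reload() { objects.clear(); }
};

// Models only the start-marker bookkeeping of a two-click tool.
struct two_click_sketch_t {
    int start_marker = NO_MARKER;

    void first_click(world_t& world) {
        assert(start_marker == NO_MARKER);  // a second first click is not modelled
        start_marker = world.add_marker();
    }

    // Same condition as the code in the first post: the marker is only
    // released, and the reference reset, when delete_start_marker is true.
    void cleanup(world_t& world, bool delete_start_marker) {
        if (start_marker != NO_MARKER && delete_start_marker) {
            world.remove(start_marker);
            start_marker = NO_MARKER;
        }
    }

    // True if the tool still refers to a marker the world no longer has.
    bool marker_is_stale(const world_t& world) const {
        return start_marker != NO_MARKER && world.objects.count(start_marker) == 0;
    }
};

// Runs: first click, cleanup during save, reload. Reports whether the
// tool is left holding a stale marker reference afterwards.
bool stale_after_reload(bool delete_marker_on_save) {
    world_t world;
    two_click_sketch_t tool;
    tool.first_click(world);
    tool.cleanup(world, delete_marker_on_save);
    world.reload();
    return tool.marker_is_stale(world);
}
```

In this model stale_after_reload(false) is true and stale_after_reload(true) is false, which matches the suggestion to pass true to cleanup before the world is torn down.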

DrSuperGood

Quote
This bug seems to have been in the code since these two-click tools work with network mode (>4 years ago).
I ran into it a lot on James's old Experimental server, because the server was less stable (more re-joins) and slower (longer exposure to tools in a certain stage).

I can confirm this has been here for a long time. I believe I made a thread about it in the past; however, that might have been another issue, as I stupidly forgot to retain the symbol file for the build.

Quote
Please check with r7859.
That is very hard to do, since one normally only encounters it occasionally on a multiplayer server.

prissi

I think a release may be in order soon, as this is quite a critical bug affecting multiplayer, especially since with Steam more players are looking for a network game.