News:

Simutrans Chat Room
Where cool people of Simutrans can meet up.

Crash when starting a scenario while another is running

Started by ceeac, September 21, 2022, 05:11:57 PM

Previous topic - Next topic

0 Members and 1 Guest are viewing this topic.

ceeac

I believe the crash happens when a scenario is loaded during a suspended script function call. (?)
To reproduce, load the automated tests scenario without closing the game until the crash happens.

Backtrace:
=================================================================
==52739==ERROR: AddressSanitizer: heap-use-after-free on address 0x613000111bd8 at pc 0x55750147c408 bp 0x7ffff1e95850 sp 0x7ffff1e95848
READ of size 8 at 0x613000111bd8 thread T0
    #0 0x55750147c407 in sq_pushstring /home/ceeac/Projects/code/simutrans/src/squirrel/squirrel/sqapi.cc:237:40
    #1 0x5575011a1e94 in script_api::param<char const*>::push(SQVM*, char const* const&) /home/ceeac/Projects/code/simutrans/src/simutrans/script/api_param.cc:228:4
    #2 0x557500ebe7df in script_api::param<script_api::call_tool_work>::push(SQVM*, script_api::call_tool_work) /home/ceeac/Projects/code/simutrans/src/simutrans/script/api/api_command.cc:264:9
    #3 0x557500ebdbba in command_work(SQVM*) /home/ceeac/Projects/code/simutrans/src/simutrans/script/api/api_command.cc:168:10
    #4 0x55750153fcf3 in SQVM::CallNative(SQNativeClosure*, long long, long long, SQObjectPtr&, int, bool&, bool&) /home/ceeac/Projects/code/simutrans/src/squirrel/squirrel/sqvm.cc:1226:18
    #5 0x5575015369e0 in SQVM::Execute(SQObjectPtr&, long long, long long, SQObjectPtr&, unsigned long long, SQVM::ExecutionType, unsigned long long) /home/ceeac/Projects/code/simutrans/src/squirrel/squirrel/sqvm.cc:788:7
    #6 0x55750148b611 in sq_wakeupvm /home/ceeac/Projects/code/simutrans/src/squirrel/squirrel/sqapi.cc:1242:9
    #7 0x557501465524 in sq_resumevm(SQVM*, unsigned long long, long long) /home/ceeac/Projects/code/simutrans/src/squirrel/sq_extensions.cc:71:17
    #8 0x5575011b152a in script_vm_t::intern_resume_call(SQVM*) /home/ceeac/Projects/code/simutrans/src/simutrans/script/script.cc:377:7
    #9 0x5575011b0901 in script_vm_t::intern_finish_call(SQVM*, script_vm_t::call_type_t, int, bool) /home/ceeac/Projects/code/simutrans/src/simutrans/script/script.cc:278:3
    #10 0x557500a330c5 in char const* script_vm_t::call_function<int, unsigned int>(script_vm_t::call_type_t, char const*, int&, unsigned int const&) /home/ceeac/Projects/code/simutrans/src/simutrans/dataobj/../script/script.h:114:9
    #11 0x557500a330c5 in scenario_t::step() /home/ceeac/Projects/code/simutrans/src/simutrans/dataobj/scenario.cc:592:30
    #12 0x557501431338 in karte_t::step() /home/ceeac/Projects/code/simutrans/src/simutrans/world/simworld.cc:3322:19
    #13 0x55750144d42e in karte_t::interactive(unsigned int) /home/ceeac/Projects/code/simutrans/src/simutrans/world/simworld.cc:6231:6
    #14 0x557501280b2e in simu_main(int, char**) /home/ceeac/Projects/code/simutrans/src/simutrans/simmain.cc:1677:9
    #15 0x55750128e713 in sysmain(int, char**) /home/ceeac/Projects/code/simutrans/src/simutrans/sys/simsys.cc:1156:9
    #16 0x7f9a6b6f3d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #17 0x7f9a6b6f3e3f in __libc_start_main csu/../csu/libc-start.c:392:3
    #18 0x55750083b654 in _start (/media/ceeac/Projects/code/simutrans/build/simutrans/simutrans+0x220654) (BuildId: 3d8603cb2c1f97b5973dd7c809667e6ba0c4adfd)

0x613000111bd8 is located 280 bytes inside of 376-byte region [0x613000111ac0,0x613000111c38)
freed by thread T0 here:
    #0 0x5575008be1f2 in free (/media/ceeac/Projects/code/simutrans/build/simutrans/simutrans+0x2a31f2) (BuildId: 3d8603cb2c1f97b5973dd7c809667e6ba0c4adfd)
    #1 0x55750151a914 in SQObjectPtr::Null() /home/ceeac/Projects/code/simutrans/src/squirrel/squirrel/sqobject.h:293:3
    #2 0x55750151a914 in SQTable::_ClearNodes() /home/ceeac/Projects/code/simutrans/src/squirrel/squirrel/sqtable.cc:207:92
    #3 0x55750151a914 in SQTable::Finalize() /home/ceeac/Projects/code/simutrans/src/squirrel/squirrel/sqtable.cc:212:2

previously allocated by thread T0 here:
    #0 0x5575008be49e in malloc (/media/ceeac/Projects/code/simutrans/build/simutrans/simutrans+0x2a349e) (BuildId: 3d8603cb2c1f97b5973dd7c809667e6ba0c4adfd)
    #1 0x55750147ae1e in sq_newthread /home/ceeac/Projects/code/simutrans/src/squirrel/squirrel/sqapi.cc:66:13
    #2 0x5575011afb27 in script_vm_t::script_vm_t(char const*, char const*) /home/ceeac/Projects/code/simutrans/src/simutrans/script/script.cc:123:11

SUMMARY: AddressSanitizer: heap-use-after-free /home/ceeac/Projects/code/simutrans/src/squirrel/squirrel/sqapi.cc:237:40 in sq_pushstring
Shadow bytes around the buggy address:
  0x0c268001a320: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c268001a330: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c268001a340: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa
  0x0c268001a350: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
  0x0c268001a360: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
=>0x0c268001a370: fd fd fd fd fd fd fd fd fd fd fd[fd]fd fd fd fd
  0x0c268001a380: fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa fa
  0x0c268001a390: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c268001a3a0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c268001a3b0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa
  0x0c268001a3c0: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==52739==ABORTING

Dwachs

Each scenario instance has its own virtual machine. It should not happen that a script continues to run when its vm was destroyed.

At which parts of the tests did this happen? I got this error once or twice but without stacktrace, I thought it was my faulty valgrind instrumentation of freeobj.

Could it be that the loading of the new scenario happened while the api_command called welt->call_work_api? The error happens when the command wants to push the error message to return it to the script.
Parsley, sage, rosemary, and maggikraut.

Dwachs

This bug might be due to this change (r10618)

https://github.com/aburch/simutrans/commit/b72967711ffb5bf6fa4080e70a3876730c313390

Now sync_step does check_events, which might load a savegame etc. But sync_step is called from INT_CHECK's scattered throughout the code. But this code assumes that the current world is not destroyed (reloaded).
Parsley, sage, rosemary, and maggikraut.

prissi

The easiest thing would be to make the action in scenario_frame_t::item_action(const char *fullpath) a not game_state_safe tool, which will be then processed as before in the next snyc_step.

Done in r10755

Dwachs

I think we also need to do this for load/save game and possibly other actions.
Parsley, sage, rosemary, and maggikraut.

Dwachs

prissi, what about the following idea: such dangerous calls come from action_triggered. Why not save the dangerous calls (like the defered_moves), and perform the calls to action_triggered outside of sync-step? In that way, we do not have to implement new tools for all these things like load/save/who-knows-what-else.

And if such a call to action_triggered is saved then the event processing is stopped. So that no further gui actions can change the state of the window and the result of the call.

Parsley, sage, rosemary, and maggikraut.

prissi

Doing action_triggered outside of sync_step would just give the sluggish old GUI again, because this means any click to enter a letter or change a tab would take one second on a weak device like an Android tablet, if a route search is running or a new month is starting.

Queuing the tools is much more sensible, since most actions do not need to wait that long, and there is already a proven queueing mechanism, as well as a selector of safe and unsafe tools.

And how to differentiate between potentially dangerous tools and safe tools: Both could be triggered by an action_triggered from a numberinput callback?

But maybe I misunderstood your idea.

Dwachs

Forgot one half of the explanation. The idea was to replace add_listener calls by add_dangerous_listener calls. So in savegame_frame.cc the save button is
savebutton.add_dangerous_listener(this)
And in call_listeners (gui_action_creator.h) calls to dangerous listeners are queued (instead of calling directly), queue could be in simwin. These queued calls are then performed outside sync-step.

If the queue of such calls is non-empty then event-handling in sync-step is stopped (only for window-related events?), so that the content of the gui does not change after the dangerous click.

Only a small number of add_dangerous_listener have to be placed. Most of the gui actions are harmless. Calls to tools are already treated correctly (set_tool_api, call_work_api). So this is only for some selected actions that kill objects (world, script, scenario).

The advantage would be that parameters do not need to be transferred from the gui to the tool and then processed. If the change can be made to savegame_frame.cc then for all derived dialogs we do not need extra tools.
Parsley, sage, rosemary, and maggikraut.

prissi

Loading a new font or translation would not be dangerous though, so the naming is a little misleading. And one would need another queue to process also for modal dialogues. And changes would be needed for the child dialoges, if only to convert the name of the called function.

Are there really that many new tools needed? One for loading a game, saving a game, loading a scenario. Adding such simple_tools is done quickly. I think I can do that tomorrow.

And that would decouple the interface from the tools properly.

Dwachs

I do not know about modal dialogues. These run outside of sync-step? I think they are not a problem.

My idea was to solve this problem once and for all. So that we do not need to create additional tools when this problem resurfaces. You are right, there will not be too many of these calls in the code (I bet less than 10).
Parsley, sage, rosemary, and maggikraut.

prissi

I think the GUI code should always call tools. That way there is a much clearer distinction and should make debugging easier since everything run through tools. And if one tool was found later to be unsafe (or safe) as well, one just changes a flag of the tool instead digging through GUI code.

These are one of the last map changing events left that do not call tools.

Dwachs

Parsley, sage, rosemary, and maggikraut.