
Segfaults involving karte_t::random_callers

Started by ACarlotti, January 11, 2019, 01:35:50 AM


ACarlotti

I am sometimes getting segfaults involving karte_t::random_callers. I'm not totally sure how the segfaults are triggered, given that the local variables don't appear to be consistent with causing a segfault. However, I note that the use of random_callers appears to be thread-unsafe, since it is a global vector that can be modified by simrand calls from several threads running in parallel.
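For illustration, here is a minimal sketch of how a shared debug history could be made safe to append to from several worker threads, by holding a `std::mutex` across the whole append. The names `CallHistory`, `record` and `fill_from_threads` are hypothetical, and this uses standard C++ containers rather than Simutrans' `vector_tpl`. It copies each caller string into a `std::string`, on the assumption that the original text buffers may not outlive the call.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a global debug history such as random_callers.
class CallHistory {
public:
    // The lock covers the capacity check, any reallocation and the write,
    // so no other thread can observe (or write into) a half-grown buffer.
    void record(const char* caller) {
        assert(caller != nullptr);
        std::lock_guard<std::mutex> lock(mutex_);
        entries_.emplace_back(caller); // copies the text
    }

    // Returns a copy taken under the lock.
    std::vector<std::string> snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return entries_;
    }

private:
    mutable std::mutex mutex_;
    std::vector<std::string> entries_;
};

// Appends `per_thread` entries from each of `threads` workers and returns the result.
std::vector<std::string> fill_from_threads(int threads, int per_thread) {
    CallHistory history;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&history, per_thread] {
            for (int i = 0; i < per_thread; ++i) history.record("pick_any_weighted");
        });
    }
    for (auto& w : workers) w.join();
    return history.snapshot();
}
```

With the lock in place, every entry is present and non-null however the threads interleave. The order of entries from different threads still depends on scheduling, so a history like this would not be reproducible between runs. The single mutex adds contention, which is probably acceptable for code that only exists in debug builds.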

The first segfault backtrace is:

#0  0x00007ffff7880886 in __strcspn_sse42 () from /usr/lib/libc.so.6
#1  0x000055555592bc11 in message_t::get_coord_from_text (text=0x0) at simmesg.cc:228
#2  0x000055555592b88e in message_t::add_message (this=0x5555591ea040, text=0x0, pos=..., what_flags=1, color=240, image=4294967295) at simmesg.cc:171
#3  0x000055555598ed38 in karte_t::network_disconnect (this=0x555558e895f0) at simworld.cc:10777
#4  0x000055555598c4bc in karte_t::process_network_commands (this=0x555558e895f0, ms_difference=0x7fffffffb82c) at simworld.cc:10220
#5  0x000055555598d93d in karte_t::interactive (this=0x555558e895f0, quit_month=2147483647) at simworld.cc:10479
#6  0x000055555591f6a2 in simu_main (argc=5, argv=0x7fffffffe9b8) at simmain.cc:1382
#7  0x0000555555932cd8 in sysmain (argc=5, argv=0x7fffffffe9b8) at simsys.cc:825
#8  0x0000555555a04997 in main (argc=5, argv=0x7fffffffe9b8) at simsys_s2.cc:792

This appears to be due to one of the entries in random_callers being null, presumably because two threads were simultaneously updating the count and writing to the vector.
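One interleaving that would produce exactly this kind of null entry can be replayed deterministically on a single thread. The sketch below models an unsynchronised append as separate steps (check capacity, claim a slot, grow, write). `UnsafeLog`, `grow` and `replay_race` are hypothetical names, and the real `vector_tpl` may differ in detail; this only illustrates one possible mechanism, not a confirmed diagnosis.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical, simplified model of an unsynchronised append-only log.
// Not vector_tpl: it only mimics "check capacity, grow, write, bump count".
struct UnsafeLog {
    std::shared_ptr<std::vector<const char*>> data =
        std::make_shared<std::vector<const char*>>(4, nullptr);
    std::size_t count = 0;

    // Doubles the buffer and copies the first `count` slots into it.
    void grow() {
        assert(count <= data->size());
        auto bigger = std::make_shared<std::vector<const char*>>(data->size() * 2, nullptr);
        for (std::size_t i = 0; i < count; ++i) (*bigger)[i] = (*data)[i];
        data = bigger;
    }
};

// Replays, step by step, one bad interleaving of two concurrent appends.
std::vector<const char*> replay_race() {
    UnsafeLog log;
    const char* initial[] = {"a", "b", "c"};
    for (const char* s : initial) (*log.data)[log.count++] = s;

    // Thread A: the buffer is not full, so it claims slot 3 and keeps a
    // pointer to the current buffer, but is preempted before writing.
    auto a_buffer = log.data;
    std::size_t a_slot = log.count++;

    // Thread B: the buffer now looks full, so it grows it. Slot 3 is copied
    // while still empty. B then appends its own entry to the new buffer.
    if (log.count == log.data->size()) log.grow();
    (*log.data)[log.count++] = "from B";

    // Thread A resumes and writes into the old buffer. In real code that buffer
    // would already have been freed (a use-after-free); the shared_ptr here
    // only keeps the model itself memory-safe.
    (*a_buffer)[a_slot] = "from A";

    return *log.data; // slot 3 is null, with valid entries on both sides
}
```

In the returned buffer, slot 3 is null while slots 2 and 4 hold valid strings. A later reader that passes slot 3 on as message text would then receive a null pointer.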

And the second is:

#0  0x000055555563925d in vector_tpl<char const*>::resize (this=0x555555b509f0 <karte_t::random_callers>, new_size=16777216) at dataobj/../tpl/vector_tpl.h:67
#1  0x00005555556384b6 in vector_tpl<char const*>::append (this=0x555555b509f0 <karte_t::random_callers>, elem=@0x7ffff6b9e058: 0x7ffef87fc510 "template<typename T, template<typename> class U> T const& pick_any_weighted(U<T> const& container) (-1227479262); call: (8375127); seed: (988281416). rand 16385, max 36016")
    at dataobj/../tpl/vector_tpl.h:107
#2  0x0000555555986920 in simrand (max=36016, caller=0x555555a1a8e8 "template<typename T, template<typename> class U> T const& pick_any_weighted(U<T> const& container)") at utils/simrandom.cc:135
#3  0x000055555588ee88 in pick_any_weighted<gebaeude_t*, weighted_vector_tpl> (container=...) at utils/simrandom.h:64
#4  0x0000555555966e18 in karte_t::find_destination (this=0x555556401a60, trip=karte_t::visiting_trip, g_class=0 '\000') at simworld.cc:7426
#5  0x0000555555964502 in karte_t::generate_passengers_or_mail (this=0x555556401a60, wtyp=0x555567c4bb30) at simworld.cc:6446
#6  0x000055555595095d in step_passengers_and_mail_threaded (args=0x555555cab6e0) at simworld.cc:1798
#7  0x00007ffff7f94a9d in start_thread () from /usr/lib/libpthread.so.0
#8  0x00007ffff796ab23 in clone () from /usr/lib/libc.so.6
(gdb) print size
$1 = 16777216
(gdb) print count
$2 = 8388692
(gdb) print i
$3 = 8083966
(gdb) print new_data[i]
$4 = 0x0
(gdb) print new_data[i-1]
$5 = 0x7fff13e5e490 "template<typename T, template<typename> class U> T const& pick_any_weighted(U<T> const& container) (-1173986656); call: (8070975); seed: (1976237181). rand 141320, max 149877"
(gdb) print data[i+1]
$6 = 0x7fff13e5e550 "template<typename T, template<typename> class U> T const& pick_any_weighted(U<T> const& container) (-833230789); call: (8070977); seed: (1976237181). rand 118571, max 149877"
(gdb) print data[i]
$7 = 0x7ffefdff3300 "template<typename T, template<typename> class U> T const& pick_any_weighted(U<T> const& container) (1571676265); call: (8070976); seed: (3952148711). rand 123959, max 149877"
(gdb) info threads
  Id   Target Id                                          Frame
  1    Thread 0x7ffff786d140 (LWP 5201) "simutrans-exten" 0x00007ffff7f9bc7e in pthread_barrier_wait () from /usr/lib/libpthread.so.0
  19   Thread 0x7fff71e84700 (LWP 5239) "simutrans-exten" 0x00007ffff7f9bc7e in pthread_barrier_wait () from /usr/lib/libpthread.so.0
  20   Thread 0x7fff71683700 (LWP 5240) "simutrans-exten" 0x00007ffff7f9bc7e in pthread_barrier_wait () from /usr/lib/libpthread.so.0
  21   Thread 0x7fff70e82700 (LWP 5241) "simutrans-exten" 0x00007ffff7f9bc7e in pthread_barrier_wait () from /usr/lib/libpthread.so.0
  56   Thread 0x7fffd4ff9700 (LWP 5435) "simutrans-exten" 0x00007ffff7f9bc7e in pthread_barrier_wait () from /usr/lib/libpthread.so.0
  57   Thread 0x7fffd7fff700 (LWP 5436) "simutrans-exten" 0x00007ffff7f9bc7e in pthread_barrier_wait () from /usr/lib/libpthread.so.0
* 58   Thread 0x7ffff6b9f700 (LWP 5437) "simutrans-exten" 0x000055555563925d in vector_tpl<char const*>::resize (this=0x555555b509f0 <karte_t::random_callers>, new_size=16777216) at dataobj/../tpl/vector_tpl.h:67
  59   Thread 0x7fffd77fe700 (LWP 5438) "simutrans-exten" 0x00007ffff7f9bc7e in pthread_barrier_wait () from /usr/lib/libpthread.so.0
  60   Thread 0x7ffff73a0700 (LWP 5439) "simutrans-exten" 0x00007ffff7f9bc7e in pthread_barrier_wait () from /usr/lib/libpthread.so.0
  61   Thread 0x7ffff639e700 (LWP 5440) "simutrans-exten" 0x00007ffff7f9bc7e in pthread_barrier_wait () from /usr/lib/libpthread.so.0
  62   Thread 0x7ffff5b9d700 (LWP 5441) "simutrans-exten" 0x00007ffff7f9bc7e in pthread_barrier_wait () from /usr/lib/libpthread.so.0
  63   Thread 0x7ffff539c700 (LWP 5442) "simutrans-exten" 0x00007ffff7f9bc7e in pthread_barrier_wait () from /usr/lib/libpthread.so.0
  64   Thread 0x7ffff4b9b700 (LWP 5443) "simutrans-exten" 0x00007ffff7f9bc7e in pthread_barrier_wait () from /usr/lib/libpthread.so.0
  65   Thread 0x7fffd6ffd700 (LWP 5444) "simutrans-exten" 0x00007ffff7f9bc7e in pthread_barrier_wait () from /usr/lib/libpthread.so.0
  66   Thread 0x7fffd67fc700 (LWP 5445) "simutrans-exten" 0x00007ffff7f9bc7e in pthread_barrier_wait () from /usr/lib/libpthread.so.0
  67   Thread 0x7fffd5ffb700 (LWP 5446) "simutrans-exten" 0x00007ffff7f9bc7e in pthread_barrier_wait () from /usr/lib/libpthread.so.0
  68   Thread 0x7fffd57fa700 (LWP 5447) "simutrans-exten" 0x00007ffff7f9bc7e in pthread_barrier_wait () from /usr/lib/libpthread.so.0
  69   Thread 0x7fffbffff700 (LWP 5448) "simutrans-exten" 0x00007ffff7f9bc7e in pthread_barrier_wait () from /usr/lib/libpthread.so.0
  70   Thread 0x7fffbf7fe700 (LWP 5449) "simutrans-exten" 0x00007ffff7f9bc7e in pthread_barrier_wait () from /usr/lib/libpthread.so.0
  71   Thread 0x7fffbeffd700 (LWP 5450) "simutrans-exten" 0x00007ffff7f9bc7e in pthread_barrier_wait () from /usr/lib/libpthread.so.0

This is presumably caused by trying to write to random_callers while it is being resized. However, I'm slightly confused about exactly how this is being triggered, because the state once the program has stopped doesn't seem consistent with it having just triggered a segfault.


I'm also confused about the purpose of random_callers. It looks like it was added back in 2011 for debugging purposes, with the intent that it be used to output a list of calls to simrand following desyncs. However, I'm not sure whether it still does that, what precisely it does now, or whether it is even necessary, given that the simrand calls can already be output separately as they are made.

Perhaps random_callers should be removed from the code? Or does it still have some value that makes it worth fixing?

jamespetts

Thank you for looking into this. The random_callers code is only compiled when DEBUG_SIMRAND_CALLS is defined: it is debugging code, as you rightly infer, aimed at recording the order in which the RNG was called. It was written before multi-threading and was probably never modified to be thread safe. I have not used it for some time; it may be better to remove it if it is now known to have suffered bit-rot and causes crashes with multi-threading.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.
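The compile-time guard described above can be sketched as follows. Only the macro name DEBUG_SIMRAND_CALLS comes from the thread; `note_random_call`, `noted_random_calls` and the `simrand_debug` namespace are hypothetical. When the macro is not defined, the hooks collapse to empty inline stubs, so call sites can stay in place at no cost.

```cpp
#include <cassert>
#include <cstddef>

#ifdef DEBUG_SIMRAND_CALLS
#include <mutex>
#include <string>
#include <vector>

namespace simrand_debug {
inline std::mutex& history_mutex() { static std::mutex m; return m; }
inline std::vector<std::string>& callers() { static std::vector<std::string> v; return v; }
}

// Debug builds: record each caller under a lock so worker threads cannot race.
inline void note_random_call(const char* caller) {
    assert(caller != nullptr);
    std::lock_guard<std::mutex> guard(simrand_debug::history_mutex());
    simrand_debug::callers().emplace_back(caller);
}

inline std::size_t noted_random_calls() {
    std::lock_guard<std::mutex> guard(simrand_debug::history_mutex());
    return simrand_debug::callers().size();
}
#else
// Normal builds: empty stubs that the compiler can remove entirely.
inline void note_random_call(const char*) {}
inline std::size_t noted_random_calls() { return 0; }
#endif
```

The drawback of this pattern is that the guarded branch is rarely compiled, so it can bit-rot unnoticed, for example by never being updated when multi-threading is introduced.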

ACarlotti

I now realise that what the code did was write a history of simrand calls to the in-game message window whenever the number of clients changed. I think this is less useful than writing to a log anyway.
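As a rough sketch of the log-based alternative, each RNG call can be written out as a single line at the moment it happens, instead of being kept in an in-memory history. `log_random_call` is a hypothetical helper, not an existing Simutrans function, and the line format simply mimics the "rand …, max …" text seen in the backtrace above.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: writes one line per RNG call as it happens.
// The line is built first and then written under a lock, so output from
// different threads cannot interleave mid-line. One mutex serves all streams,
// which is coarse but simple.
inline void log_random_call(std::ostream& out, const char* caller,
                            std::uint32_t max, std::uint32_t result) {
    assert(caller != nullptr);
    std::ostringstream line;
    line << caller << " rand " << result << ", max " << max << '\n';
    static std::mutex out_mutex;
    std::lock_guard<std::mutex> guard(out_mutex);
    out << line.str();
}
```

Passing a file stream or `std::cerr` works the same way. Flushing after each line would help the log survive a crash, at some cost in speed.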

I've pushed a commit to remove this, and some other redundant debugging code that was specific to Newton Abbot station in some past game.

jamespetts
