The International Simutrans Forum

 

Author Topic: Data race in objlist_t::remove when running headless server  (Read 718 times)

0 Members and 1 Guest are viewing this topic.

Offline ceeac

  • Devotee
  • *
  • Posts: 268
This bug is probably related to the desyncs related to convoys reported in the desync report thread.
The issue is that convoi_t::threaded_step and sync_step are not synchronized correctly which causes convoi_t::threaded_step to run during sync_step.

Code: [Select]
==================
WARNING: ThreadSanitizer: data race (pid=35805)
  Write of size 1 at 0x7bb0203a7659 by main thread:
    #0 objlist_t::remove(obj_t const*) /home/ceeac/Projects/code/simutrans-ex/dataobj/objlist.cc:506:7 (simutrans-extended+0x585e86)
    #1 grund_t::obj_remove(obj_t const*) /home/ceeac/Projects/code/simutrans-ex/vehicle/../boden/../boden/grund.h:597:54 (simutrans-extended+0xbac420)
    #2 vehicle_base_t::leave_tile() /home/ceeac/Projects/code/simutrans-ex/vehicle/simvehicle.cc:280:52 (simutrans-extended+0xbac420)
    #3 vehicle_t::leave_tile() /home/ceeac/Projects/code/simutrans-ex/vehicle/simvehicle.cc:1649:18 (simutrans-extended+0xbdf62a)
    #4 rail_vehicle_t::leave_tile() /home/ceeac/Projects/code/simutrans-ex/vehicle/simvehicle.cc:7735:13 (simutrans-extended+0xbdf62a)
    #5 vehicle_t::hop(grund_t*) /home/ceeac/Projects/code/simutrans-ex/vehicle/simvehicle.cc:1675:2 (simutrans-extended+0xbb4823)
    #6 vehicle_base_t::do_drive(unsigned int) /home/ceeac/Projects/code/simutrans-ex/vehicle/simvehicle.cc:375:4 (simutrans-extended+0xbacfb5)
    #7 convoi_t::sync_step(unsigned int) /home/ceeac/Projects/code/simutrans-ex/simconvoi.cc:1335:18 (simutrans-extended+0xa071c3)
    #8 karte_t::sync_list_t::sync_step(unsigned int) /home/ceeac/Projects/code/simutrans-ex/simworld.cc:4879:14 (simutrans-extended+0xb5cc68)
    #9 karte_t::sync_step(unsigned int, bool, bool) /home/ceeac/Projects/code/simutrans-ex/simworld.cc:4954:8 (simutrans-extended+0xb5cc68)
    #10 karte_t::interactive(unsigned int) /home/ceeac/Projects/code/simutrans-ex/simworld.cc:11395:6 (simutrans-extended+0xb7ece6)
    #11 simu_main(int, char**) /home/ceeac/Projects/code/simutrans-ex/simmain.cc:1606:9 (simutrans-extended+0xabfa91)
    #12 sysmain(int, char**) /home/ceeac/Projects/code/simutrans-ex/sys/simsys.cc:1097:9 (simutrans-extended+0xb89ff5)
    #13 main /home/ceeac/Projects/code/simutrans-ex/sys/simsys_posix.cc:195:9 (simutrans-extended+0xc41bc6)

  Previous read of size 1 at 0x7bb0203a7659 by thread T8:
    #0 objlist_t::bei(unsigned char) const /home/ceeac/Projects/code/simutrans-ex/boden/../dataobj/objlist.h:64:13 (simutrans-extended+0x55bd9c)
    #1 grund_t::obj_bei(unsigned char) const /home/ceeac/Projects/code/simutrans-ex/boden/grund.h:600:50 (simutrans-extended+0x55bd9c)
    #2 grund_t::get_weg_nr(int) const /home/ceeac/Projects/code/simutrans-ex/boden/grund.h:636:87 (simutrans-extended+0x55bd9c)
    #3 grund_t::get_weg(waytype_t) const /home/ceeac/Projects/code/simutrans-ex/boden/grund.h:643:24 (simutrans-extended+0x55bd9c)
    #4 grund_t::get_weg_ribi_unmasked(waytype_t) const /home/ceeac/Projects/code/simutrans-ex/boden/grund.cc:1820:15 (simutrans-extended+0x55bd9c)
    #5 grund_t::get_neighbour(grund_t*&, waytype_t, unsigned char) const /home/ceeac/Projects/code/simutrans-ex/boden/grund.cc:2772:39 (simutrans-extended+0x55b627)
    #6 route_t::intern_calc_route(karte_t*, koord3d, koord3d, test_driver_t*, int, long long, unsigned int, unsigned int, bool, int, koord3d, unsigned char, route_t::find_route_flags) /home/ceeac/Projects/code/simutrans-ex/dataobj/route.cc:902:19 (simutrans-extended+0x5b8589)
    #7 route_t::calc_route(karte_t*, koord3d, koord3d, test_driver_t*, int, unsigned int, bool, int, long long, unsigned int, koord3d, unsigned char, route_t::find_route_flags) /home/ceeac/Projects/code/simutrans-ex/dataobj/route.cc:1305:22 (simutrans-extended+0x5baa58)
    #8 rail_vehicle_t::calc_route(koord3d, koord3d, int, bool, route_t*) /home/ceeac/Projects/code/simutrans-ex/vehicle/simvehicle.cc:4720:37 (simutrans-extended+0xbd42de)
    #9 convoi_t::calc_route(koord3d, koord3d, int) /home/ceeac/Projects/code/simutrans-ex/simconvoi.cc:940:45 (simutrans-extended+0xa04c98)
    #10 convoi_t::drive_to() /home/ceeac/Projects/code/simutrans-ex/simconvoi.cc:1481:36 (simutrans-extended+0xa08b26)
    #11 convoi_t::threaded_step() /home/ceeac/Projects/code/simutrans-ex/simconvoi.cc:1742:3 (simutrans-extended+0xa0abda)
    #12 step_individual_convoy_threaded(void*) /home/ceeac/Projects/code/simutrans-ex/simworld.cc:1947:10 (simutrans-extended+0xb494d8)

  Location is heap block of size 32744 at 0x7bb0203a0000 allocated by main thread:
    #0 malloc <null> (simutrans-extended+0x439874)
    #1 xmalloc(unsigned long) /home/ceeac/Projects/code/simutrans-ex/simmem.cc:15:18 (simutrans-extended+0xac0768)
    #2 freelist_t::gimme_node(unsigned long) /home/ceeac/Projects/code/simutrans-ex/dataobj/freelist.cc:114:20 (simutrans-extended+0x5a7f64)
    #3 grund_t::operator new(unsigned long) /home/ceeac/Projects/code/simutrans-ex/boden/grund.cc:161:9 (simutrans-extended+0x550645)
    #4 planquadrat_t::rdwr(loadsave_t*, koord) /home/ceeac/Projects/code/simutrans-ex/simplan.cc:275:39 (simutrans-extended+0xad7bb1)
    #5 karte_t::load(loadsave_t*) /home/ceeac/Projects/code/simutrans-ex/simworld.cc:9660:33 (simutrans-extended+0xb7378d)
    #6 karte_t::load(char const*) /home/ceeac/Projects/code/simutrans-ex/simworld.cc:9147:3 (simutrans-extended+0xb7100f)
    #7 simu_main(int, char**) /home/ceeac/Projects/code/simutrans-ex/simmain.cc:1462:32 (simutrans-extended+0xabe924)
    #8 sysmain(int, char**) /home/ceeac/Projects/code/simutrans-ex/sys/simsys.cc:1097:9 (simutrans-extended+0xb89ff5)
    #9 main /home/ceeac/Projects/code/simutrans-ex/sys/simsys_posix.cc:195:9 (simutrans-extended+0xc41bc6)

  Thread T8 (tid=35866, running) created by main thread at:
    #0 pthread_create <null> (simutrans-extended+0x43b13b)
    #1 karte_t::init_threads() /home/ceeac/Projects/code/simutrans-ex/simworld.cc:2224:8 (simutrans-extended+0xb419cd)
    #2 karte_t::load(loadsave_t*) /home/ceeac/Projects/code/simutrans-ex/simworld.cc:10138:2 (simutrans-extended+0xb778b8)
    #3 karte_t::load(char const*) /home/ceeac/Projects/code/simutrans-ex/simworld.cc:9147:3 (simutrans-extended+0xb7100f)
    #4 simu_main(int, char**) /home/ceeac/Projects/code/simutrans-ex/simmain.cc:1462:32 (simutrans-extended+0xabe924)
    #5 sysmain(int, char**) /home/ceeac/Projects/code/simutrans-ex/sys/simsys.cc:1097:9 (simutrans-extended+0xb89ff5)
    #6 main /home/ceeac/Projects/code/simutrans-ex/sys/simsys_posix.cc:195:9 (simutrans-extended+0xc41bc6)

SUMMARY: ThreadSanitizer: data race /home/ceeac/Projects/code/simutrans-ex/dataobj/objlist.cc:506:7 in objlist_t::remove(obj_t const*)
==================

To reproduce, compile with clang/gcc without graphics and with Thread Sanitizer enabled (-fsanitize=thread), then load a save file from the command line.

Offline jamespetts

  • Simutrans-Extended project coordinator
  • Administrator
  • *
  • Posts: 20918
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: Data race in objlist_t::remove when running headless server
« Reply #1 on: July 03, 2021, 03:08:50 PM »
Interesting - thank you for this report. Do I understand that you mean that this data race can only be reproduced with the command line server build?

Offline prissi

  • Developer
  • Administrator
  • *
  • Posts: 10828
  • Languages: De,EN,JP
Re: Data race in objlist_t::remove when running headless server
« Reply #2 on: July 04, 2021, 11:41:40 AM »
It means it is a debug build which checks for this. Normal builds do not check it and just crash or desync. It cause quite slowdown and overhead:
https://clang.llvm.org/docs/ThreadSanitizer.html
So it may not work with the bridgewater save on most computers.

Offline ceeac

  • Devotee
  • *
  • Posts: 268
Re: Data race in objlist_t::remove when running headless server
« Reply #3 on: July 05, 2021, 05:54:17 PM »
Do I understand that you mean that this data race can only be reproduced with the command line server build?
The bug can be reproduced on any version, however with a non-graphical build no warnings originating from the graphics code are emitted which makes it much easier to look for issues potentially affecting client-server synchronization.

Offline Rollmaterial

  • Devotee
  • *
  • Posts: 636
  • OS: Windows 10
  • Languages: EN, FR, DE, FI, SE
Re: Data race in objlist_t::remove when running headless server
« Reply #4 on: July 08, 2021, 10:24:58 PM »
Does this cause corruption when removing tunnels? Asking to avoid potential duplicate bug reports.

Offline Matthew

  • Devotee
  • *
  • Posts: 614
    • Japan Railway Journal
  • Languages: EN, some ZH, DE & SQ
Re: Data race in objlist_t::remove when running headless server
« Reply #5 on: July 09, 2021, 01:02:27 PM »
Does this cause corruption when removing tunnels? Asking to avoid potential duplicate bug reports.

I seems unlikely, though this kind of thing is way out of my depth.

I tried building tunnels in the latest client's single-player mode and nothing strange happened, though that does not test for out-of-sync errors in network games.

Offline Rollmaterial

  • Devotee
  • *
  • Posts: 636
  • OS: Windows 10
  • Languages: EN, FR, DE, FI, SE
Re: Data race in objlist_t::remove when running headless server
« Reply #6 on: July 09, 2021, 02:52:26 PM »
What I'm describing doesn't seem to occur when removing the entire tunnel by using the general remove tool on a portal. It only occurs with the way removal tool.

Offline Matthew

  • Devotee
  • *
  • Posts: 614
    • Japan Railway Journal
  • Languages: EN, some ZH, DE & SQ
Re: Data race in objlist_t::remove when running headless server
« Reply #7 on: July 09, 2021, 03:57:10 PM »
What I'm describing doesn't seem to occur when removing the entire tunnel by using the general remove tool on a portal. It only occurs with the way removal tool.

I'd suggest you open a separate bug report with steps to reproduce. This bug report is about sync_step which I think (I mioght be wrong) handles routine movement of convoys, not way construction/destruction. I suggest you also skim-read these two bug reports. Also, please note whether the affected tunnels have Public right of way (PROW), because that could be important, since the PROW code was recently changed.

Offline Ranran

  • Devotee
  • *
  • Posts: 1660
  • 今日は兎汁よー
  • Languages: ja
Re: Data race in objlist_t::remove when running headless server
« Reply #8 on: July 09, 2021, 04:06:34 PM »
Does this cause corruption when removing tunnels? Asking to avoid potential duplicate bug reports.
This seems to be another bug. I haven't looked into it because I don't have time, but I'm guessing it's a merge retake error.

https://github.com/Ranran-the-JuicyPork/simutrans-extended/commit/c7fcb7b183fac385acedcc564267ac47368d8fcf
I don't know why the result of cherry picking this commit again was as this, but I'm guessing it was the cause.

Offline Matthew

  • Devotee
  • *
  • Posts: 614
    • Japan Railway Journal
  • Languages: EN, some ZH, DE & SQ
Re: Data race in objlist_t::remove when running headless server
« Reply #9 on: July 09, 2021, 07:10:53 PM »
This seems to be another bug. I haven't looked into it because I don't have time, but I'm guessing it's a merge retake error.

https://github.com/Ranran-the-JuicyPork/simutrans-extended/commit/c7fcb7b183fac385acedcc564267ac47368d8fcf
I don't know why the result of cherry picking this commit again was as this, but I'm guessing it was the cause.


That commit was announced in this post.