New (and rather serious) 11.x bugs

Started by jamespetts, June 27, 2013, 10:19:38 PM


jamespetts

I had hoped to be able to release a release candidate this evening, but I have ended up encountering some rather difficult bugs that have prevented this. So far:

(1) compiling the Linux server version without the "DEBUG = 3" flag in config.default means that no client can connect without almost instantly being desynchronised (I think that this might have been present in previous versions);
(2) compiling an optimised release build in Windows results in all vehicles being static, and reporting a maximum speed of 0km/h (tests show that this does not occur when the optimise flag is enabled in the Linux server build, and it does not occur in the Windows non-optimised debug build);
(3) I get the following crash on the Linux server build; this text filled the screen of my terminal client rather than being written to any log, despite logging being enabled:


*** Error in `/usr/share/games/simutrans-experimental/simutrans/simutrans-experimental': free(): invalid pointer: 0x000000003a3ba658 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x80a46)[0x7fc420877a46]
/usr/share/games/simutrans-experimental/simutrans/simutrans-experimental[0x4b0302]
/usr/share/games/simutrans-experimental/simutrans/simutrans-experimental[0x6e2b2c]
/usr/share/games/simutrans-experimental/simutrans/simutrans-experimental[0x6f9eb1]
/usr/share/games/simutrans-experimental/simutrans/simutrans-experimental[0x6f922e]
/usr/share/games/simutrans-experimental/simutrans/simutrans-experimental[0x489d91]
/usr/share/games/simutrans-experimental/simutrans/simutrans-experimental[0x7018a7]
/usr/share/games/simutrans-experimental/simutrans/simutrans-experimental[0x6a4aeb]
/usr/share/games/simutrans-experimental/simutrans/simutrans-experimental[0x6b478d]
/usr/share/games/simutrans-experimental/simutrans/simutrans-experimental[0x7572da]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fc420818ea5]
/usr/share/games/simutrans-experimental/simutrans/simutrans-experimental[0x404349]
======= Memory map: ========
00400000-0085e000 r-xp 00000000 fd:03 400054                             /usr/share/games/simutrans-experimental/simutrans/simutrans-experimental
00a5e000-00a5f000 r--p 0045e000 fd:03 400054                             /usr/share/games/simutrans-experimental/simutrans/simutrans-experimental
00a5f000-00a63000 rw-p 0045f000 fd:03 400054                             /usr/share/games/simutrans-experimental/simutrans/simutrans-experimental
00a63000-00ba3000 rw-p 00000000 00:00 0
01bfa000-3a588000 rw-p 00000000 00:00 0                                  [heap]
7fc415d6e000-7fc417c34000 rw-p 00000000 00:00 0
7fc417c5d000-7fc419b4b000 rw-p 00000000 00:00 0
7fc419ff6000-7fc4207f7000 rw-p 00000000 00:00 0
7fc4207f7000-7fc4209b5000 r-xp 00000000 fd:03 2359583                    /lib/x86_64-linux-gnu/libc-2.17.so
7fc4209b5000-7fc420bb4000 ---p 001be000 fd:03 2359583                    /lib/x86_64-linux-gnu/libc-2.17.so
7fc420bb4000-7fc420bb8000 r--p 001bd000 fd:03 2359583                    /lib/x86_64-linux-gnu/libc-2.17.so
7fc420bb8000-7fc420bba000 rw-p 001c1000 fd:03 2359583                    /lib/x86_64-linux-gnu/libc-2.17.so
7fc420bba000-7fc420bbf000 rw-p 00000000 00:00 0
7fc420bbf000-7fc420bd3000 r-xp 00000000 fd:03 2359459                    /lib/x86_64-linux-gnu/libgcc_s.so.1
7fc420bd3000-7fc420dd3000 ---p 00014000 fd:03 2359459                    /lib/x86_64-linux-gnu/libgcc_s.so.1
7fc420dd3000-7fc420dd4000 r--p 00014000 fd:03 2359459                    /lib/x86_64-linux-gnu/libgcc_s.so.1
7fc420dd4000-7fc420dd5000 rw-p 00015000 fd:03 2359459                    /lib/x86_64-linux-gnu/libgcc_s.so.1
7fc420dd5000-7fc420ed8000 r-xp 00000000 fd:03 2359592                    /lib/x86_64-linux-gnu/libm-2.17.so
7fc420ed8000-7fc4210d8000 ---p 00103000 fd:03 2359592                    /lib/x86_64-linux-gnu/libm-2.17.so
7fc4210d8000-7fc4210d9000 r--p 00103000 fd:03 2359592                    /lib/x86_64-linux-gnu/libm-2.17.so
7fc4210d9000-7fc4210da000 rw-p 00104000 fd:03 2359592                    /lib/x86_64-linux-gnu/libm-2.17.so
7fc4210da000-7fc4211bf000 r-xp 00000000 fd:03 3016801                    /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.17
7fc4211bf000-7fc4213be000 ---p 000e5000 fd:03 3016801                    /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.17
7fc4213be000-7fc4213c6000 r--p 000e4000 fd:03 3016801                    /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.17
7fc4213c6000-7fc4213c8000 rw-p 000ec000 fd:03 3016801                    /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.17
7fc4213c8000-7fc4213dd000 rw-p 00000000 00:00 0
7fc4213dd000-7fc4213ec000 r-xp 00000000 fd:03 2359393                    /lib/x86_64-linux-gnu/libbz2.so.1.0.4
7fc4213ec000-7fc4215eb000 ---p 0000f000 fd:03 2359393                    /lib/x86_64-linux-gnu/libbz2.so.1.0.4
7fc4215eb000-7fc4215ec000 r--p 0000e000 fd:03 2359393                    /lib/x86_64-linux-gnu/libbz2.so.1.0.4
7fc4215ec000-7fc4215ed000 rw-p 0000f000 fd:03 2359393                    /lib/x86_64-linux-gnu/libbz2.so.1.0.4
7fc4215ed000-7fc421603000 r-xp 00000000 fd:03 2359350                    /lib/x86_64-linux-gnu/libz.so.1.2.7
7fc421603000-7fc421802000 ---p 00016000 fd:03 2359350                    /lib/x86_64-linux-gnu/libz.so.1.2.7
7fc421802000-7fc421803000 r--p 00015000 fd:03 2359350                    /lib/x86_64-linux-gnu/libz.so.1.2.7
7fc421803000-7fc421804000 rw-p 00016000 fd:03 2359350                    /lib/x86_64-linux-gnu/libz.so.1.2.7
7fc421804000-7fc421827000 r-xp 00000000 fd:03 2359560                    /lib/x86_64-linux-gnu/ld-2.17.so
7fc42182d000-7fc42184e000 rw-p 00000000 00:00 0
7fc42184e000-7fc421a14000 r--p 00000000 fd:03 3014692                    /usr/lib/locale/locale-archive
7fc421a14000-7fc421a1a000 rw-p 00000000 00:00 0
7fc421a22000-7fc421a26000 rw-p 00000000 00:00 0
7fc421a26000-7fc421a27000 r--p 00022000 fd:03 2359560                    /lib/x86_64-linux-gnu/ld-2.17.so
7fc421a27000-7fc421a29000 rw-p 00023000 fd:03 2359560                    /lib/x86_64-linux-gnu/ld-2.17.so
7fff20da4000-7fff20dfc000 rw-p 00000000 00:00 0                          [stack]
7fff20dfe000-7fff20e00000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]


(4) I get an access violation in line 1644 of output.c (a system file) on the Windows debug build, following repeated assertion failures. This line is ultimately triggered by line 722 of simworld (being


delete sync_list.back();


), which is called when a map is unloaded in preparation for loading another map, which happens every time that somebody joins a network game. This issue does not appear on my dingliste-cleanup branch.

I really do not know where to start with any of these, as they are all in entirely unfamiliar areas of the code, and I never quite understood how to go about debugging an issue that appears only on non-debug builds.

Edit: The above crashes seemed to occur specifically when I tried to connect the Windows release build.

Edit 2: I am wondering whether this is related to conditional compilation around "DEBUG", but I cannot find any code that appears to be suspicious that is connected with any such preprocessor directive.

Edit 3: Dr. Memory produced the following possibly relevant error on the Windows debug build:


Error #1: UNADDRESSABLE ACCESS: reading 0x0cae55b8-0x0cae55bc 4 byte(s)
# 0 quickstone_tpl<simline_t>::is_bound                [c:\users\james\documents\development\simutrans\simutrans-experimental-sources\tpl\quickstone_tpl.h:211]
# 1 convoi_t::set_schedule                             [c:\users\james\documents\development\simutrans\simutrans-experimental-sources\simconvoi.cc:2363]
# 2 depot_t::convoi_arrived                            [c:\users\james\documents\development\simutrans\simutrans-experimental-sources\simdepot.cc:167]
# 3 convoi_t::enter_depot                              [c:\users\james\documents\development\simutrans\simutrans-experimental-sources\simconvoi.cc:1912]
# 4 karte_t::load                                      [c:\users\james\documents\development\simutrans\simutrans-experimental-sources\simworld.cc:5760]
# 5 karte_t::load                                      [c:\users\james\documents\development\simutrans\simutrans-experimental-sources\simworld.cc:5268]
# 6 loadsave_frame_t::action                           [c:\users\james\documents\development\simutrans\simutrans-experimental-sources\gui\loadsave_frame.cc:77]
# 7 savegame_frame_t::action_triggered                 [c:\users\james\documents\development\simutrans\simutrans-experimental-sources\gui\savegame_frame.cc:375]
# 8 gui_action_creator_t::call_listeners               [c:\users\james\documents\development\simutrans\simutrans-experimental-sources\gui\components\gui_action_creator.h:36]
# 9 gui_table_t::infowin_event                         [c:\users\james\documents\development\simutrans\simutrans-experimental-sources\gui\components\gui_table.cc:146]
#10 gui_scrollpane_t::infowin_event                    [c:\users\james\documents\development\simutrans\simutrans-experimental-sources\gui\components\gui_scrollpane.cc:112]
#11 gui_container_t::infowin_event                     [c:\users\james\documents\development\simutrans\simutrans-experimental-sources\gui\gui_container.cc:183]
Note: @0:03:02.669 in thread 6936
Note: next higher malloc: 0x0cae6488-0x0cae755c
Note: instruction: cmp    (%edx,%ecx,4) $0x00000000


Edit 4: I have tried compiling an optimised Windows release build with DEBUG=1 defined (generally, it does not have DEBUG defined at all), and this works without difficulties. The issue is certainly something in the preprocessor directives with DEBUG.

Edit 5: I think that I might have fixed this in my latest commit on the 11.x branch. I am not quite sure how this bug got in in the first place, however.

Edit 6: Not fixed after all - when the Linux server version is compiled with no DEBUG defined, it still desyncs almost as soon as a client connects.

Edit 7: Compiling with DEBUG=1 does not solve this issue.

neroden

Grrrr.  Invalid free errors.

This indicates code which is making invalid assumptions.

The various compilers are making optimizations which they are allowed to make under the C++ standard, but the code is violating the assumptions on which those optimizations rely.  The classic way for this to happen is for functions which are declared const not to behave as const, but there are a lot of other similar ways to "lie to the compiler".

There are a lot of possibilities.  For instance, quickstone_tpl violates memory management rules by reusing old handles.  (Ouch ouch...)  That's fairly unlikely. 
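To illustrate why reusing old handles is hazardous, here is a minimal sketch. It is a toy table, not the real quickstone_tpl: the class name `toy_handle_table` and its methods are hypothetical, and the slot-reuse policy (first free slot wins) is an assumption made only for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy handle table (hypothetical; NOT the real quickstone_tpl).
// A handle is just an index into a slot array, and freed slots are reused.
template <typename T>
class toy_handle_table {
    std::vector<T*> slots;
public:
    // Store obj in the first free slot and return that slot's index as the handle.
    uint16_t bind(T *obj) {
        for (std::size_t i = 0; i < slots.size(); ++i) {
            if (slots[i] == nullptr) {
                slots[i] = obj;
                return static_cast<uint16_t>(i);
            }
        }
        slots.push_back(obj);
        return static_cast<uint16_t>(slots.size() - 1);
    }

    // Release the slot; any copies of the handle held elsewhere are now stale.
    void unbind(uint16_t h) {
        if (h < slots.size()) {
            slots[h] = nullptr;
        }
    }

    bool is_bound(uint16_t h) const {
        return h < slots.size() && slots[h] != nullptr;
    }

    T *get(uint16_t h) const {
        return is_bound(h) ? slots[h] : nullptr;
    }
};
```

In this toy, a handle kept after `unbind()` reports as bound again once the slot is reused, and `get()` then returns a different object from the one the holder expects. Whether anything like this is behind the crashes above is not established here.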

There were all kinds of things wrong with dingliste_t, and there are still problems with it, but I haven't been able to generate the errors *reliably* after rewriting it (so I managed to eliminate some of them, I guess).  I've still been getting rare null pointer errors (now, after rewriting dingliste_t, they show up as assertions).  I have tried to track down the source of the errors with little success...

At this point, I strongly suggest rebasing the release candidate on the 112.private-car-merge branch, or even on my ncn-devel branch.  Both seem to be compiling better. We've done too much internal cleanup to trust the 11.x branch.

The behavior of the DEBUG flag is kind of complicated.  On Linux:

-- At DEBUG==0, NDEBUG is activated, which disables all assertions.
-- At DEBUG==1, assertions are active and DEBUG is defined (tested by some #ifdefs)
-- At DEBUG==2, "-fno-inline" is passed to the compiler, so nothing is inlined.
-- At DEBUG==3, "-O0" is activated and no optimization at all is done.
-----
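The list above can be restated as a small lookup, purely as a reading aid. This is a sketch under assumptions: the function name `flags_for_debug_level` is hypothetical, the levels are assumed to be cumulative (each level keeps the effects of the lower ones), and "-DDEBUG" is an assumed spelling of "DEBUG is defined". It is not the project's actual build logic.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns the compiler flags implied by a DEBUG level,
// following the description in the post. Assumes the levels are cumulative.
std::vector<std::string> flags_for_debug_level(int debug) {
    std::vector<std::string> flags;
    if (debug <= 0) {
        flags.push_back("-DNDEBUG");  // assertions disabled
        return flags;
    }
    flags.push_back("-DDEBUG");       // assumed spelling; assertions stay active
    if (debug >= 2) {
        flags.push_back("-fno-inline");  // nothing is inlined
    }
    if (debug >= 3) {
        flags.push_back("-O0");          // no optimization at all
    }
    return flags;
}
```

Under these assumptions, DEBUG=3 yields all three of "-DDEBUG", "-fno-inline" and "-O0", which is why a bug that vanishes at DEBUG=3 could be sensitive to assertions, inlining, or optimization; testing DEBUG=2 separates the last of these from the first two.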

OK, first step: try to solve the problem which occurs on the Linux server when DEBUG=1 but not when DEBUG=3.  Test with DEBUG=2.

jamespetts

Thank you for your input on this. The server with DEBUG=3 set seems to work (as is currently running), but this is obviously less efficient than without DEBUG defined at all.

Incidentally, what (if anything) is the difference between not having a DEBUG=x line in the config.default file (for example, by commenting it out) and having DEBUG=0? Also, you write that the DEBUG level affects optimisations - what is the effect of having DEBUG=3 when OPTIMISE=1 is set in config.default? Likewise, what if one sets DEBUG=1 (or > 1) if NDEBUG is also defined?

Incidentally, the release candidate has already been released based on the 11.x branch, which, with the server compiled with DEBUG=3, is working fine (although I still cannot get the Windows command line server to work - but I could not get this to work on 112.x-private-car-merge, either). Indeed, the unfortunate behaviour with DEBUG undefined has not been confirmed to be exclusive to the dingliste-cleanup branch and its derivatives.

Edit: Perhaps I spoke (wrote) too soon: there has been a crash/revert reported on the Bridgewater-Brunel server, although I do not yet have sufficient information to be able to reproduce it or know whether it is reproducible.

prissi

Simutrans Standard had an assert which called a function that changed the game state (unfortunately I forgot which). This is most likely the reason for the desync.
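A minimal sketch of how an assert with a side effect can make two builds diverge. Everything here is hypothetical: `toy_world_t`, its random generator, and the two `CHECK_*` macros, which merely stand in for what `assert` expands to without and with NDEBUG. It does not reproduce the assert that was actually at fault.

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical stand-ins for assert():
// CHECK_DEBUG evaluates its argument (like assert without NDEBUG);
// CHECK_RELEASE discards it unevaluated (like assert with NDEBUG).
#define CHECK_DEBUG(expr)   ((expr) ? (void)0 : std::abort())
#define CHECK_RELEASE(expr) ((void)0)

struct toy_world_t {
    uint32_t rng_state = 12345;

    // Advances the generator, so every call changes the game state.
    uint32_t next_random() {
        rng_state = rng_state * 1664525u + 1013904223u;
        return rng_state;
    }
};

// One simulation step as a build with assertions enabled would run it.
uint32_t step_debug_build(toy_world_t &w) {
    CHECK_DEBUG(w.next_random() != 0);   // the side effect happens here
    return w.next_random();
}

// The same source line as a build with assertions disabled would run it.
uint32_t step_release_build(toy_world_t &w) {
    CHECK_RELEASE(w.next_random() != 0); // the call is compiled out
    return w.next_random();
}
```

Starting from identical worlds, the two functions leave `rng_state` at different values after a single step, because one of them advanced the generator twice and the other once. In a network game, a server and client built with different assertion settings would disagree in the same way.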

jamespetts

Thank you for that. You wrote that it "had" such an assert - has it been removed in Standard now?

prissi

The last time I tested (which is quite long ago) it was OK for both debug and optimized builds. But the server always runs a debug build, so I can get more error output from it in case it crashes.

jamespetts

Ahh, I see - so this is untested in Standard? Interesting.

prissi

No, it was solved in Standard. But it was a persistent bug at the beginning.

jamespetts