The International Simutrans Forum

 

Author Topic: [10.5, 10.6 bug] double free error  (Read 10198 times)

0 Members and 1 Guest are viewing this topic.

Offline jk271

  • *
  • Posts: 292
  • Languages: CZ, EN, DE
[10.5, 10.6 bug] double free error
« on: January 17, 2012, 11:23:02 PM »

I have discovered something, that can be double free error in simutrans experimental. It happens when I am closing game.


My computer: 32-bit
My OS: GNU/Linux Debian, 32-bit
simutrans-experimental version:
- - - - - - 10.5, official linux release;
- - - - - - and latest compiled myself (git revision 40508784... CHANGE: Some simu...)
- - - - - - (bug appears in both versions)
pakset: pak128.britain-exp-0.8.3


How to reproduce this bug:
1) Open simutrans experimental
2) select pak Britain 128 experimental
3) click on "quit"
4) double free occurs


When I tried this with pak64.experimental or pak64.german experimental, it worked well. Only with pak128.Britain exp it crashed.


There is copy of text from my command line. Backtrace in gdb can be seen in the bottom.


Show banner ...
World destroyed.
World destroyed.
[Thread 0xb73ffb70 (LWP 29561) exited]
*** glibc detected *** /home/pokus/simutrans/simutrans-exp-2012-01-09-bb998c1: double free or corruption (!prev): 0x084b0f90 ***
======= Backtrace: =========
/lib/i686/cmov/libc.so.6(+0x6b281)[0xb7cd3281]
/lib/i686/cmov/libc.so.6(+0x6cad8)[0xb7cd4ad8]
/lib/i686/cmov/libc.so.6(cfree+0x6d)[0xb7cd7bbd]
/usr/lib/libstdc++.so.6(_ZdlPv+0x21)[0xb7ead701]
/usr/lib/libstdc++.so.6(_ZdaPv+0x1d)[0xb7ead75d]
/home/pokus/simutrans/simutrans-exp-2012-01-09-bb998c1[0x809e28d]
/lib/i686/cmov/libc.so.6(+0x2f2bf)[0xb7c972bf]
/lib/i686/cmov/libc.so.6(+0x2f32f)[0xb7c9732f]
/lib/i686/cmov/libc.so.6(__libc_start_main+0xee)[0xb7c7ec7e]
/home/pokus/simutrans/simutrans-exp-2012-01-09-bb998c1(_ZNSt8ios_base4InitD1Ev+0x61)[0x804b551]
======= Memory map: ========
08048000-08374000 r-xp 00000000 fe:01 73588      /home/pokus/simutrans/simutrans-exp-2012-01-09-bb998c1
08374000-08376000 rw-p 0032c000 fe:01 73588      /home/pokus/simutrans/simutrans-exp-2012-01-09-bb998c1
08376000-0dcfb000 rw-p 00000000 00:00 0          [heap]
b57e3000-b6af6000 rw-p 00000000 00:00 0
b6bff000-b6c00000 ---p 00000000 00:00 0
b6c00000-b7400000 rwxp 00000000 00:00 0
b7500000-b7521000 rw-p 00000000 00:00 0
b7521000-b7600000 ---p 00000000 00:00 0
b76d2000-b76dc000 r-xp 00000000 fe:00 816779     /lib/i686/cmov/libnss_files-2.11.2.so
b76dc000-b76dd000 r--p 00009000 fe:00 816779     /lib/i686/cmov/libnss_files-2.11.2.so
b76dd000-b76de000 rw-p 0000a000 fe:00 816779     /lib/i686/cmov/libnss_files-2.11.2.so
b76de000-b76f1000 r-xp 00000000 fe:00 816780     /lib/i686/cmov/libnsl-2.11.2.so
b76f1000-b76f2000 r--p 00012000 fe:00 816780     /lib/i686/cmov/libnsl-2.11.2.so
b76f2000-b76f3000 rw-p 00013000 fe:00 816780     /lib/i686/cmov/libnsl-2.11.2.so
b76f3000-b76f5000 rw-p 00000000 00:00 0
b76f5000-b76fd000 r-xp 00000000 fe:00 179603     /usr/lib/libXcursor.so.1.0.2
b76fd000-b76fe000 rw-p 00007000 fe:00 179603     /usr/lib/libXcursor.so.1.0.2
b7721000-b7921000 r--p 00000000 fe:00 292626     /usr/lib/locale/locale-archive
b7928000-b7930000 r-xp 00000000 fe:00 179361     /usr/lib/libXrender.so.1.3.0
b7930000-b7931000 rw-p 00007000 fe:00 179361     /usr/lib/libXrender.so.1.3.0
b7940000-b7a59000 r-xp 00000000 fe:00 178821     /usr/lib/libX11.so.6.3.0
b7a59000-b7a5d000 rw-p 00118000 fe:00 178821     /usr/lib/libX11.so.6.3.0
b7a62000-b7a6a000 r-xp 00000000 fe:00 816188     /lib/i686/cmov/libnss_nis-2.11.2.so
b7a6a000-b7a6b000 r--p 00008000 fe:00 816188     /lib/i686/cmov/libnss_nis-2.11.2.so
b7a6b000-b7a6c000 rw-p 00009000 fe:00 816188     /lib/i686/cmov/libnss_nis-2.11.2.so
b7a6c000-b7a72000 r-xp 00000000 fe:00 813142     /lib/i686/cmov/libnss_compat-2.11.2.so
b7a72000-b7a73000 r--p 00006000 fe:00 813142     /lib/i686/cmov/libnss_compat-2.11.2.so
b7a73000-b7a74000 rw-p 00007000 fe:00 813142     /lib/i686/cmov/libnss_compat-2.11.2.so
b7a74000-b7a7b000 r--s 00000000 fe:00 183941     /usr/lib/gconv/gconv-modules.cache
b7a7b000-b7a7f000 r-xp 00000000 fe:00 178828     /usr/lib/libXfixes.so.3.1.0
b7a7f000-b7a80000 rw-p 00003000 fe:00 178828     /usr/lib/libXfixes.so.3.1.0
b7a80000-b7a82000 rw-p 00000000 00:00 0
b7a82000-b7a84000 r-xp 00000000 fe:00 812916     /lib/libx86.so.1
b7a84000-b7a85000 rw-p 00001000 fe:00 812916     /lib/libx86.so.1
b7a85000-b7ad6000 r-xp 00000000 fe:00 186426     /usr/lib/libvga.so.1.4.3
b7ad6000-b7add000 rw-p 00050000 fe:00 186426     /usr/lib/libvga.so.1.4.3
b7add000-b7ae6000 rw-p 00000000 00:00 0
b7ae6000-b7af9000 r-xp 00000000 fe:00 179836     /usr/lib/libdirect-1.2.so.9.0.1
b7af9000-b7afa000 rw-p 00013000 fe:00 179836     /usr/lib/libdirect-1.2.so.9.0.1
b7afa000-b7afb000 rw-p 00000000 00:00 0
b7afb000-b7b03000 r-xp 00000000 fe:00 179834     /usr/lib/libfusion-1.2.so.9.0.1
b7b03000-b7b04000 rw-p 00007000 fe:00 179834     /usr/lib/libfusion-1.2.so.9.0.1
b7b04000-b7b76000 r-xp 00000000 fe:00 179837     /usr/lib/libdirectfb-1.2.so.9.0.1
b7b76000-b7b79000 rw-p 00071000 fe:00 179837     /usr/lib/libdirectfb-1.2.so.9.0.1
b7b79000-b7b7b000 r-xp 00000000 fe:00 816755     /lib/i686/cmov/libdl-2.11.2.so
b7b7b000-b7b7c000 r--p 00001000 fe:00 816755     /lib/i686/cmov/libdl-2.11.2.so
b7b7c000-b7b7d000 rw-p 00002000 fe:00 816755     /lib/i686/cmov/libdl-2.11.2.so
b7b7d000-b7b84000 r-xp 00000000 fe:00 816200     /lib/i686/cmov/librt-2.11.2.so
b7b84000-b7b85000 r--p 00006000 fe:00 816200     /lib/i686/cmov/librt-2.11.2.so
b7b85000-b7b86000 rw-p 00007000 fe:00 816200     /lib/i686/cmov/librt-2.11.2.so
b7b86000-b7c4a000 r-xp 00000000 fe:00 1416103    /usr/lib/libasound.so.2.0.0
b7c4a000-b7c4e000 rw-p 000c4000 fe:00 1416103    /usr/lib/libasound.so.2.0.0
b7c4e000-b7c63000 r-xp 00000000 fe:00 816196     /lib/i686/cmov/libpthread-2.11.2.so
b7c63000-b7c64000 r--p 00014000 fe:00 816196     /lib/i686/cmov/libpthread-2.11.2.so
b7c64000-b7c65000 rw-p 00015000 fe:00 816196     /lib/i686/cmov/libpthread-2.11.2.so
b7c65000-b7c68000 rw-p 00000000 00:00 0
b7c68000-b7da8000 r-xp 00000000 fe:00 816781     /lib/i686/cmov/libc-2.11.2.so
b7da8000-b7daa000 r--p 0013f000 fe:00 816781     /lib/i686/cmov/libc-2.11.2.so
b7daa000-b7dab000 rw-p 00141000 fe:00 816781     /lib/i686/cmov/libc-2.11.2.so
b7dab000-b7dae000 rw-p 00000000 00:00 0
b7dae000-b7dcb000 r-xp 00000000 fe:00 812987     /lib/libgcc_s.so.1
b7dcb000-b7dcc000 rw-p 0001c000 fe:00 812987     /lib/libgcc_s.so.1
b7dcc000-b7df0000 r-xp 00000000 fe:00 813230     /lib/i686/cmov/libm-2.11.2.so
b7df0000-b7df1000 r--p 00023000 fe:00 813230     /lib/i686/cmov/libm-2.11.2.so
b7df1000-b7df2000 rw-p 00024000 fe:00 813230     /lib/i686/cmov/libm-2.11.2.so
b7df2000-b7edb000 r-xp 00000000 fe:00 183730     /usr/lib/libstdc++.so.6.0.13
b7edb000-b7edf000 r--p 000e9000 fe:00 183730     /usr/lib/libstdc++.so.6.0.13
b7edf000-b7ee0000 rw-p 000ed000 fe:00 183730     /usr/lib/libstdc++.so.6.0.13
b7ee0000-b7ee7000 rw-p 00000000 00:00 0
b7ee7000-b7f4d000 r-xp 00000000 fe:00 178965     /usr/lib/libSDL-1.2.so.0.11.3
b7f4d000-b7f4f000 rw-p 00065000 fe:00 178965     /usr/lib/libSDL-1.2.so.0.11.3
b7f4f000-b7f98000 rw-p 00000000 00:00 0
b7f98000-b7fa8000 r-xp 00000000 fe:00 814855     /lib/libbz2.so.1.0.4
b7fa8000-b7fa9000 rw-p 00010000 fe:00 814855     /lib/libbz2.so.1.0.4
b7fa9000-b7fbc000 r-xp 00000000 fe:00 183161     /usr/lib/libz.so.1.2.3.4
b7fbc000-b7fbd000 rw-p 00013000 fe:00 183161     /usr/lib/libz.so.1.2.3.4
b7fbe000-b7fc2000 r-xp 00000000 fe:00 184882     /usr/lib/libXdmcp.so.6.0.0
b7fc2000-b7fc3000 rw-p 00003000 fe:00 184882     /usr/lib/libXdmcp.so.6.0.0
b7fc3000-b7fc5000 r-xp 00000000 fe:00 183537     /usr/lib/libXau.so.6.0.0
b7fc5000-b7fc6000 rw-p 00001000 fe:00 183537     /usr/lib/libXau.so.6.0.0
b7fc6000-b7fde000 r-xp 00000000 fe:00 178823     /usr/lib/libxcb.so.1.1.0
b7fde000-b7fdf000 rw-p 00017000 fe:00 178823     /usr/lib/libxcb.so.1.1.0
b7fdf000-b7fe2000 rw-p 00000000 00:00 0
Program received signal SIGABRT, Aborted.
0xb7fe2424 in __kernel_vsyscall ()
(gdb) bt
#0  0xb7fe2424 in __kernel_vsyscall ()
#1  0xb7c92751 in *__GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#2  0xb7c95b82 in *__GI_abort () at abort.c:92
#3  0xb7cc918d in __libc_message (do_abort=2, fmt=0xb7d8d738 "*** glibc detected *** %s: %s: 0x%s ***\n")
    at ../sysdeps/unix/sysv/linux/libc_fatal.c:189
#4  0xb7cd3281 in malloc_printerr (action=<value optimized out>, str=0x6 <Address 0x6 out of bounds>, ptr=0x84b0f90)
    at malloc.c:6267
#5  0xb7cd4ad8 in _int_free (av=<value optimized out>, p=<value optimized out>) at malloc.c:4795
#6  0xb7cd7bbd in *__GI___libc_free (mem=0x84b0f90) at malloc.c:3739
#7  0xb7ead701 in operator delete(void*) () from /usr/lib/libstdc++.so.6
#8  0xb7ead75d in operator delete[](void*) () from /usr/lib/libstdc++.so.6
#9  0x0809e28d in settings_t::~settings_t() ()
#10 0xb7c972bf in __run_exit_handlers (status=0, listp=0xb7daa304, run_list_atexit=true) at exit.c:78
#11 0xb7c9732f in *__GI_exit (status=0) at exit.c:100
#12 0xb7c7ec7e in __libc_start_main (main=0x8313f10 <main>, argc=1, ubp_av=0xbffff514, init=0x8315110 <__libc_csu_init>,
    fini=0x8315100 <__libc_csu_fini>, rtld_fini=0xb7ff1040 <_dl_fini>, stack_end=0xbffff50c) at libc-start.c:260
#13 0x0804b551 in _start ()
(gdb)


Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Devotee
  • *
  • Posts: 18753
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: [10.5, 10.6 bug] double free error
« Reply #1 on: January 17, 2012, 11:31:56 PM »
Thank you for the report. This has been reported before, and is associated with the liveries. Since it is non-critical (it only occurs on exit), and I have found it hard to trace and fix, I have not prioritised finding a fix for it. If you or anyone else can suggest a fix, however, I should be most grateful.

Offline jk271

  • *
  • Posts: 292
  • Languages: CZ, EN, DE
Re: [10.5, 10.6 bug] double free error
« Reply #2 on: January 17, 2012, 11:43:53 PM »
I have not known it was posted before. Sorry for double posting.

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Devotee
  • *
  • Posts: 18753
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: [10.5, 10.6 bug] double free error
« Reply #3 on: January 17, 2012, 11:52:51 PM »
Don't worry - easy to miss. If you or anyone has any ideas for fixing this, they would be most welcome!

Offline sdog

  • Devotee
  • *
  • Posts: 2039
[10.x] game doesn't close properly
« Reply #4 on: February 05, 2012, 02:54:04 AM »

Code: [Select]
Program received signal SIGABRT, Aborted.
0x00007ffff6c133a5 in raise () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007ffff6c133a5 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff6c16b0b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007ffff6c4b113 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007ffff6c55a96 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x00007ffff6c59d7c in free () from /lib/x86_64-linux-gnu/libc.so.6
#5  0x000000000045344d in settings_t::~settings_t() ()
#6  0x00007ffff6c18821 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#7  0x00007ffff6c188a5 in exit () from /lib/x86_64-linux-gnu/libc.so.6
#8  0x00007ffff6bfe314 in __libc_start_main ()
   from /lib/x86_64-linux-gnu/libc.so.6
#9  0x000000000040938d in _start ()
compiled and tested at 64 ubuntu 11.10, gcc 4.6.1 from 10.x branch

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Devotee
  • *
  • Posts: 18753
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: [10.5, 10.6 bug] double free error
« Reply #5 on: February 05, 2012, 12:54:05 PM »
Topics merged - see above for answer.

Offline ras52

  • *
  • Posts: 83
Re: [10.5, 10.6 bug] double free error
« Reply #6 on: February 10, 2012, 02:44:59 PM »
The bug is actually a very serious memory corruption problem that's present on both Standard and Experimental, and probably has been for years.  It's pure chance that it is currently only affecting the livery code as the bug is nowhere near that.  The problem is in the vector_tpl template.  That class has a hand-written destructor and a hand-written copy constructor, but uses the compiler-generated copy assignment operator.  It is almost always a mistake to implement two of these without the third, and this is no exception.

I have fixed this in my local copy of the vector template and have verified that the double free bug goes away.  Almost all of the other templates in the tpl/ directory suffer from similar problems which I have not investigated in detail.

However, I really do think that it is a mistake to even have templates for vector, list, etc. in the code.  Such classes are hard to get right, and even harder to get right while being efficient.  There is a perfectly good ones in the C++ standard library and it does almost everything that we want.   The standard library one will typically be more efficient -- certainly it won't be less efficient -- and you can bet good money on it not having bugs of this nature in it.  Let's not reinvent the wheel. 

I would be happy to work on a patch to remove the vector_tpl template and replace it with the standard vector class.

Edit: Patch to fix vector_tpl.h attached.
« Last Edit: February 10, 2012, 02:50:37 PM by ras52 »

Offline Dwachs

  • DevTeam, Coder/patcher
  • Administrator
  • *
  • Posts: 4607
  • Languages: EN, DE, AT
Re: [10.5, 10.6 bug] double free error
« Reply #7 on: February 10, 2012, 02:53:54 PM »
The bug is actually a very serious memory corruption problem that's present on both Standard and Experimental, and probably has been for years.
Hm, I regularly run valgrind on simutrans, which never reported any double frees ???

Why not fixing the vector_tpl by adding the missing third directly?

Offline ras52

  • *
  • Posts: 83
Re: [10.5, 10.6 bug] double free error
« Reply #8 on: February 10, 2012, 03:00:18 PM »
Hm, I regularly run valgrind on simutrans, which never reported any double frees ???

It's more likely to result in memory leaks than in double destructions, though both can happen.

Why not fixing the vector_tpl by adding the missing third directly?

That's what the patch I attached does.  But it doesn't change the fact that I think it's a mistake for it to even be present; nor does it change the fact that I have just verified that there are similar memory management problems with the list, minivec, prioqueue, sparse templates, and probably most of the others.  (Interestingly the slist one is okay.)  Nor does it change the fact that it's very hard to get these sorts of class right and there could well still be bugs left that I haven't spotted.  And nor does it change the fact that there is a much better implementation in the standard library.
« Last Edit: February 10, 2012, 11:15:43 PM by ras52 »

Offline prissi

  • Developer
  • Administrator
  • *
  • Posts: 9584
  • Languages: De,EN,JP
Re: [10.5, 10.6 bug] double free error
« Reply #9 on: February 10, 2012, 10:27:26 PM »
The list in std are inefficient and bloated, as they are (at least to my knowledge) have two pointer per element. Together with the custom memory allocation, slist_tpl is much much faster.

Actually, why is slist_tpl save? It will be copied with the default constructor, or? Especially socket_info_t seems wrong.

quickstone_t is also save.

Offline ras52

  • *
  • Posts: 83
Re: [10.5, 10.6 bug] double free error
« Reply #10 on: February 10, 2012, 11:05:34 PM »
The list in std are inefficient and bloated, as they are (at least to my knowledge) have two pointer per element. Together with the custom memory allocation, slist_tpl is much much faster.

In what way is it faster?  I've just profiled slist_t in three different ways and found it slower than std::list by 20% in the first test, and by a factor of 3.3 in the second.  I can't tell you how fast it was in the third test because it hung, probably due to a similar rmemory allocation issue to the one I've already identified in vector_tpl.

Actually, why is slist_tpl save? It will be copied with the default constructor, or? Especially socket_info_t seems wrong.

quickstone_t is also save.

I'm sorry, but I don't understand what you're saying here.  You're quite right: slist_tpl has the same problem too.  I was mistaken in my earlier post.
« Last Edit: February 10, 2012, 11:15:09 PM by ras52 »

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Devotee
  • *
  • Posts: 18753
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: [10.5, 10.6 bug] double free error
« Reply #11 on: February 10, 2012, 11:27:03 PM »
This is an interesting discussion. Richard - thank you very much for spotting this issue and fixing it. I am staying with my parents this week-end, so shall not have time to apply your patch until next week, but it is much appreciated.

As to the more general issue. your profile results are very interesting indeed. Does the same hold for the other classes, do we think?

(Incidentally, in fairness to Hajo who wrote this all in the first place, when he started work on the project, it was 1997, and, if my memory serves me correctly, the STD library was not then well established).

Offline ras52

  • *
  • Posts: 83
Re: [10.5, 10.6 bug] double free error
« Reply #12 on: February 10, 2012, 11:55:35 PM »
This is an interesting discussion. Richard - thank you very much for spotting this issue and fixing it. I am staying with my parents this week-end, so shall not have time to apply your patch until next week, but it is much appreciated.

No hurry.  It's only currently exhibiting during shutdown, so it's hardly urgent.

As to the more general issue. your profile results are very interesting indeed. Does the same hold for the other classes, do we think?

We should be cautious how we interpret profiling results.  They're useful in that they give you numbers that you can compare, but they just measure one specific combination of operations.  Another test written in a different way, testing a different sequence of operations might give a rather different result.  In real world code, things are never as clear cut and there's a danger that you end up optimising the code for your test suite while accidentally pessimising the cases in actual use. 

That said, I believe that a modern C++ standard library (e.g. one from a recent GCC or MSVC) will be at least as efficient as our code and in many case more efficient.  After all, a vast amount of development effort has gone into their libraries, and we can't hope to compete with that except perhaps in a few special cases where we have very particular and well-understood requirements.  I've just tried the vector class with a simple test doing

  for (int i=0; i<1000000; ++i) {
    VECTOR<sint32> vec;
    for (int j=0; j<100; ++j)
      vec.push_back(42);
    VECTOR<sint32> vec2( vec );
  }

I find it is consistently 10-15% faster when VECTOR was #defined to be std::vector than when it was our vector_tpl class, and unlike the list class, the vector class is quite a simple one without much scope for improvements over what we have.  (Actually, the std::vector class is smart enough to know when it can do a memcpy and when it must copy the array element by element, which our class doesn't do: this may explain some of that difference.)
 
(Incidentally, in fairness to Hajo who wrote this all in the first place, when he started work on the project, it was 1997, and, if my memory serves me correctly, the STD library was not then well established).

Quite so.  I certainly can't criticise the original decision not to use it.  I'm simply arguing that the original reasons no longer hold.  In 1997, MSVC 4.2 was the newest Microsoft compiler, and my overriding memory of it was that its STL barely worked; the official GCC wasn't in a much better position as I recall, though there was a fork called EGCS that did have a half-decent STL. Nor had the C++ standard been published.

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Devotee
  • *
  • Posts: 18753
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: [10.5, 10.6 bug] double free error
« Reply #13 on: February 11, 2012, 12:03:15 AM »
A fascinating mini-history of the C++ STL! Do the names of the relevant methods in the STL match those in the Simutrans classes? If so, it would be quite easy to substitute one for the other. If not, I'd be wary of introducing changes here that make Experimental more different architecturally than necessary than Standard.

Another thing to do, especially if the method names are the same (or perhaps can be aliased somehow...?), would be to profile a large Simutrans-Experimental game (perhaps the one from the current server?) with the Simutrans templates and the STL templates and see which runs the faster - that would overcome, perhaps, the issue with simple test profiling.

Offline prissi

  • Developer
  • Administrator
  • *
  • Posts: 9584
  • Languages: De,EN,JP
Re: [10.5, 10.6 bug] double free error
« Reply #14 on: February 11, 2012, 12:51:27 AM »
STL was not really existent until 2001 and is still not fully supported on some platforms.

The last time I did this profiling I was allocating lots of objects. slist_tpl has a much faster allocation that std:: due to its use of memory pools. However, your mileage may vary strong based on implementation and compiler.

Nevertheless the copying may case real errors, as it is done with packets in socket_info_t, where real issues may evolve.

Offline ras52

  • *
  • Posts: 83
Re: [10.5, 10.6 bug] double free error
« Reply #15 on: February 11, 2012, 02:03:20 AM »
STL was not really existent until 2001 and is still not fully supported on some platforms.

I'm sorry, but that's simply not true.  The STL was first implemented by HP an by 1994 is was considered mature enough to be accepted for inclusion in the draft C++ standard.  If you don't believe me, you can read Stepanov's and Lee's 1994 paper proposing its inclusion.  The release notes for EGCS 1.0.0 make it clear that it had STL support.  Debian 'hamm' adopted EGCS in place of the offical GNU GCC branch in 1997, so had a largely working STL at that point.  My recollection is that RedHat (the biggest Linux distro in the late 90s) did similarly; certainly RedHat 'cartman' (1999) had EGCS with a working STL.  By the time Microsoft released MSVC 5.0 in late '97 or early '98, the STL was usable, and indeed I still support one piece of code on that compiler.  Borland had a working STL in the late '90s, and I would be surprised if Intel didn't too.  The sort of functionality we would be using -- vectors, lists, and so on -- has been working adequately well since about '98.

As to it still not being fully supported on some platforms, which platforms are they?  Various old posts in these forums to the contrary say we target gcc 2.95, but I assume that's no longer true as I've just tried it and both Exp and Std are some way from being able to compile with it.  (It's possible that the BeOS or Haiku use a patched gcc 2.95 that does compile the code, but unless we suppose they also patched gcc 2.95 to break the STL, the gcc 2.95 STL is good enough for our purposes.)  What other platforms are we targeting that don't support the STL?  I can't think of any.  Modern mobile SDKs tend to have good C++ support.

he install meg lots of objects. slist_tpl has a much faster allocation that std:: due to its use of memory pools. However, your mileage may vary strong based on implementation and compiler.

If that's a concern, let's use a pool allocator with std::list.  It's explicitly designed to work with different user-supplied allocators.  But I'm not convinced it's necessary.

But efficiency isn't the primary reason for wanting to replace our templates.  It's that they're hard to get right and are not currently right.  I haven't even mentioned exception safety because we rarely use exceptions (though there are examples), but our templates have major problems with exception safety too.  Even if we manage to get them right, someone will add some new functionality to them and again break them.  Of course, these problems probably exist in some other parts of our code too, but the templates is one of the few bits where a better alternative exists.  Imagine for a moment that we had a home-grown libbz in the code, instead of using the system library.  Wouldn't we want to rip that out and replace it with the system libbz?

Offline Ters

  • Coder/patcher
  • Devotee
  • *
  • Posts: 5555
  • Languages: EN, NO
Re: [10.5, 10.6 bug] double free error
« Reply #16 on: February 11, 2012, 09:15:16 AM »
I've seen dozens of posts on programming sites pointing out that MSVC didn't support STL (or pretty much C++98) properly before MSVC 7 (aka 2002). Many have flat out refused to help people who use MSVC 6 due to this. So I don't think prissi was saying that STL had not been invented before 2001, but that one could not expect code using STL to compile and work properly before then with the at the time most popular compilers.

Offline ras52

  • *
  • Posts: 83
Re: [10.5, 10.6 bug] double free error
« Reply #17 on: February 11, 2012, 12:24:03 PM »
I've seen dozens of posts on programming sites pointing out that MSVC didn't support STL (or pretty much C++98) properly before MSVC 7 (aka 2002). Many have flat out refused to help people who use MSVC 6 due to this. So I don't think prissi was saying that STL had not been invented before 2001, but that one could not expect code using STL to compile and work properly before then with the at the time most popular compilers.

I think you're thinking of MSVC 7.1 (in 2003) not 7.0 which was a something of a disappointment from a C++ point of view.  The problem with the earlier MSVCs wasn't that the STL didn't work, it was that the compiler had incomplete support for templates.  Simple things worked; complex things wouldn't compile, though if they compiled they would work fine.  That applied whether you're using your own templates or the ones in the STL.  In many ways it was easier to use the STL with MSVC 6.0 than write your own, because the STL had been modified so that it would work as best it could given the available compiler support.  As I've already mentioned, I still support a 50k line program I wrote about ten years ago and still needs to compile on compilers back to MSVC 5.  It uses the STL extensively without significant difficulty.  That said, I'm not surprised you can find lots of instances of people refusing to help people compile on MSVC 6.  In most circumstances I would too.  The compiler is thirteen years old.  I dare say if I came here moaning that simutrans didn't compile on gcc 2.95 (which it doesn't), I'd get a similar reaction unless I could demonstrate a pressing requirement for it to compile with that.

Offline Dwachs

  • DevTeam, Coder/patcher
  • Administrator
  • *
  • Posts: 4607
  • Languages: EN, DE, AT
Re: [10.5, 10.6 bug] double free error
« Reply #18 on: February 11, 2012, 03:51:37 PM »
Could you check back with r5273 ? I tried to declare all the missing constructors / assignment operators as private.

Offline ras52

  • *
  • Posts: 83
Re: [10.5, 10.6 bug] double free error
« Reply #19 on: February 11, 2012, 07:05:11 PM »
Could you check back with r5273 ? I tried to declare all the missing constructors / assignment operators as private.

That looks exactly right to me.  Thanks for fixing it.

r5255 on the other hand looks problematic.  In vector_tpl it uses the assignment operator I wrote, but declares it private.  I assume it is simply a mistake that it has been made private as there is little point in having an implemented private assignment operator (as opposed to the unimplemented ones in r5273 which do serve a useful purpose).  As the class has a public copy constructor, it should have a public copy assignment operator.  And making it private will prevent Experimental from compiling.

While I'm looking through the templates, array2d_tpl<T> and binary_heap_tpl<T> breaks as soon as T is a non-POD type, so would the two implementations of HOT_queue_tpl<T> though they're never used.  There is no documentation to say that T must be POD, far less code to prevent it from working compiling with non-POD T.  The calls to memcpy should be changed to std::copy.  More efficient would be to avoid default constructing -- i.e. use operator new instead of the new operator, and then use std::uninitialised_copy -- but that increases code  complexity quite a bit.  The minivec_tpl template has lots of places where the capacity will silently overflow resulting in buffer overruns when trying to an element to a minivec_tpl with 255 elements.  In any case, I fail to see what advantage minivec_tpl  has over vector_tpl.  It saves 4 bytes of storage per vector (yes, per vector, not per element).  In my book that doesn't justify a whole new class, especially when it's hard to get it right.  The pop function on prioqueue returns a reference to an object that has been destroyed.   I got bored looking for mistakes in the template headers at that point.

I'm sorry to have to go through the headers pointing out quite so many mistakes, but I cannot see how else to demonstrate quite how easy it is to get these sorts of templates wrong.  Writing them is hard and it is not necessary.
« Last Edit: February 11, 2012, 07:21:00 PM by ras52 »

Offline Ashley

  • Coder/Patcher
  • Devotee
  • *
  • Posts: 1288
    • entropy.me.uk
Re: [10.5, 10.6 bug] double free error
« Reply #20 on: February 12, 2012, 08:15:16 PM »
Wow, that's a lot of bugs in one place. While cross-platform compatibility is definitely an issue I think the arguments in favour of moving toward using the STL are pretty compelling in this case.


Edit: This probably also explains the random crashes I was seeing while working on the Quartz port for Mac too...
« Last Edit: February 12, 2012, 08:43:41 PM by Timothy »

Offline isidoro

  • Devotee
  • *
  • Posts: 1129
Re: [10.5, 10.6 bug] double free error
« Reply #21 on: February 12, 2012, 08:37:14 PM »
And if compatibility is essential, I would say (I don't know if it is feasible/difficult) that a conditional compile that detects STL and use its functions or the old ones if STL is absent...


Offline Ters

  • Coder/patcher
  • Devotee
  • *
  • Posts: 5555
  • Languages: EN, NO
Re: [10.5, 10.6 bug] double free error
« Reply #22 on: February 12, 2012, 09:05:36 PM »
I don't think it's easy to just switch between them.

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Devotee
  • *
  • Posts: 18753
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: [10.5, 10.6 bug] double free error
« Reply #23 on: February 12, 2012, 09:45:04 PM »
Richard,

I have had a go at implementing this - however, I still get a crash in the destructor of livery_scheme_t when closing the game (albeit a different crash now, if that's an advance...).

Offline ras52

  • *
  • Posts: 83
Re: [10.5, 10.6 bug] double free error
« Reply #24 on: February 12, 2012, 10:24:15 PM »
James -- Interesting.  It got rid of the crash for me.  What compiler are you using?  Can you try starting the game and closing it immediately without creating a new game (i.e. just leaving the demo game).   Did it crash then?

Ters -- you're right: it wouldn't be easy to support both our templates and the STL.  In fact, trying to do that would probably be worse than using either one by itself.

Isidoro -- people keep talking about compilers that don't have an STL, but so far as I'm aware every major compiler in the last 12 years or so has had one.  If there is a compiler that we care about that doesn't have a working STL, please can someone tell me what it is.  So far every compiler that's been mentioned in this context has been obsolete for the best part of a decade and the code doesn't compile on it anyway.

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Devotee
  • *
  • Posts: 18753
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: [10.5, 10.6 bug] double free error
« Reply #25 on: February 12, 2012, 10:29:50 PM »
Richard,

I am using MSVC++. I manually applied the patch to my 10.x branch (and have pushed it, so you can see if I made any errors); if you were using -devel, that might conceivably have made the difference, although I do not immediately see how.

Offline ras52

  • *
  • Posts: 83
Re: [10.5, 10.6 bug] double free error
« Reply #26 on: February 12, 2012, 11:11:17 PM »
OK, I can duplicate this on 10.x (though not on the devel branch, oddly).  Will try to investigate tomorrow when I'm a bit more awake.

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Devotee
  • *
  • Posts: 18753
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: [10.5, 10.6 bug] double free error
« Reply #27 on: February 12, 2012, 11:20:27 PM »
Thank you - most kind!

Offline isidoro

  • Devotee
  • *
  • Posts: 1129
Re: [10.5, 10.6 bug] double free error
« Reply #28 on: February 12, 2012, 11:57:32 PM »
Isidoro -- people keep talking about compilers that don't have an STL, but so far as I'm aware every major compiler in the last 12 years or so has had one.
[...]

I know that the dev team is proud that ST compiles and can be run in a great variety of platforms/OSes.  The problem may be that those new compilers may not be available for all of them: Amiga, Haiku, ...

And about keeping both, I see no problem while the interface is clear.  I recall that even some 8086 assembler was in the code when I visited it, which was, of course, to be turned off for other platforms...

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Devotee
  • *
  • Posts: 18753
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: [10.5, 10.6 bug] double free error
« Reply #29 on: February 13, 2012, 12:11:25 AM »
Hmm, but isn't std::string used in some places now?  If the reason to use non-standard templates was the absence of the STL, that reason no longer holds, as Simutrans won't compile without the STL anyway.

Offline ras52

  • *
  • Posts: 83
Re: [10.5, 10.6 bug] double free error
« Reply #30 on: February 13, 2012, 01:00:41 AM »
James-- I think I see the problem.  Whose job is it to delete the livery_scheme_t pointers held in the settings_t class?  The class gets copied, but the copy constructor of the settings_t::livery_schemes vector just does a shallow copy -- copying the the pointers rather than cloning what they point to.  So we end up with two settings_t objects, each holding the same livery_scheme_t objects.  Both instances of the settings_t class try to delete the pointers in the livery_schemes vector, but as they both point to the same objects, we get a double destruction.  As is the nature of such bugs, sometimes they manifest and sometimes they seem to work and remain hidden.  I can't see any good reason why it should work on -devel but not on 10.x, but slightly a different memory layout is probably enough to make the difference.

I haven't yet looked in detail to see whether there's genuine reason for wanting to copy the settings_t object.  If there isn't, the easiest solution may well be to make the class non-copyable by adding private copy constructors and copy assignment operators as Dwachs did to various of our templates yesterday.  If the need to make a copy is genuine, then we should implement a copy constructor and copy assignment operator that does a deep copy of the objects in the livery_schemes vector.  I'm happy to look at making a patch for that tomorrow if no-one beats me to it. 

Isisdor-- A quick Google search suggests that both Haiku and Amiga have modern versions of GCC available.  If you know otherwise, I would be interested to know more.  So far everything that anyone has said to suggest we need to support compilers without the STL has been entirely anecdotal.  Surely if there is a need to support such a compiler, someone can give a concrete example?  I have to say, though, I'm sceptical that there will be such an example.  Maybe there was once, but not now.

You say that you see no problem in keeping both implementations (our current templates and the STL ones) and having them interchangeable, but I don't think that is feasible.  There are currently some differences in the interface between, say, our vector_tpl and std::vector.  For example, what we call get_size() is called capacity() on the STL vector.  Sure we can update the interface to ours to match that of the STL (and if we don't propose to get rid of our templates, then I propose we do that).  But what of the many functions on the STL containers that do not exist on ours?  Are we to forbid their use?  We might try to, but I can't see it working for exactly the same reason that we might like to think that Simutrans currently compiles with gcc 2.95, but presumably because no-one has tried it for ages, it no longer does. 

Offline isidoro

  • Devotee
  • *
  • Posts: 1129
Re: [10.5, 10.6 bug] double free error
« Reply #31 on: February 13, 2012, 03:01:28 PM »
There is no technical problem in keeping both implementations, as long as the interface is kept.  And maybe that's the way to go if you don't want to change every single invocation to the present template classes in all the code.  Some of the consequences of that would be breaking patches, branches, forks, etc.

To put it in another words: you can keep the vector_tpl interface, compel to use it, and implement it with STL or the present implementation depending on the system.  There is no big room for a vector interface to be very different from others, isn't it?

And about the topic of getting rid of templates at all, I am speaking based on what I've seen in the forum in the past.  Prissi advocated against changing to STL some time ago, if I recall correctly.  Please wait till he comes back to hear about his reasons.  He is now skiing...



Offline ras52

  • *
  • Posts: 83
Re: [10.5, 10.6 bug] double free error
« Reply #32 on: February 14, 2012, 03:35:10 PM »
I think I've sorted out the problems with memory management and the  vector of livery schemes.  It seems that the settings_t class is being copied intentionally, e.g. in the enlarge_map_frame_t constructor.  It's possible we could work around the need for a copy constructor, but it seemed easier to just make the settings_t class copyable.  Implementing a copy-constructor directly would be a maintenance nightmare, as it would need a list of members which would need keeping up to date.  The normal modern way of handling this problem is to use a vector of some sort of smart pointer class, e.g. std::vector<std::shared_ptr<livery_scheme_t>>, but if we're not willing to use std::vector which has no portability issues, the chances of us being willing to use std::shared_ptr which is a new addition seem tiny. 

So instead I've implemented wrapper around the vector_tpl that deals with memory management of a vector of pointers.  The patch for the 10.x branch of Experimental is here.  Standard has no need for such a patch as it has no livery schemes.

... And I'm now seeing an intermittent segfault coming out of grund_t::neuen_weg_bauen.  (I've checked that I don't have any of my patches for diagonal maintenance applied, so that cannot be the problem.)  Will keep looking.

Offline Spike

  • *
  • Posts: 1361
  • First Simutrans Developer and Graphics Artist
Re: [10.5, 10.6 bug] double free error
« Reply #33 on: February 14, 2012, 03:42:46 PM »
We might try to, but I can't see it working for exactly the same reason that we might like to think that Simutrans currently compiles with gcc 2.95, but presumably because no-one has tried it for ages, it no longer does. 

It doesn't. I've tried when I came back to development, because 2.92.2 was the newest C/C++ compiler that I had installed. So I had to install something newer ...

Offline jamespetts gb

  • Simutrans-Extended project coordinator
  • Devotee
  • *
  • Posts: 18753
  • Cake baker
    • Bridgewater-Brunel
  • Languages: EN
Re: [10.5, 10.6 bug] double free error
« Reply #34 on: February 14, 2012, 04:22:41 PM »
Richard,

thank you very much for this.

I wonder whether it might yet be the case that Standard will be converted to STD templates (in which case Experimental, and possibly Iron Bite (is that right, HAns?), will follow suit). Prissi is on holiday this week, so there might be some delay in it being considered fully. Hans - what are your views in switching to STD templates?

One thing that I do wonder is whether it might be worth writting wrapper headers, which simply #define the names of methods in the current templates to the names of similar methods in the STD templates - or is that not possible because they take a diffrent number or type of argument? That would allow us to switch to STD templates on Experimental without necessarily there being a change on Standard (and also then use conditional compiles to switch between the old Simutrans templates and the STD types, which could be used in profiling and for debugging).