r7638 crash when loading game

Started by captain crunch, November 03, 2015, 12:21:50 PM

captain crunch

This saved game crashes in r7638.
gdb backtrace:

[New Thread 0x7fffec8dd700 (LWP 9626)]
FATAL ERROR: sim_new_handler() - OUT OF MEMORY
Aborting program execution ...

For help with this error or to file a bug report please see the Simutrans forum at
http://forum.simutrans.com

Program received signal SIGABRT, Aborted.
0x00007ffff6710107 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007ffff6710107 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007ffff67114e8 in __GI_abort () at abort.c:89
#2  0x0000000000714c61 in log_t::fatal (this=0xb16c20, who=0x77091e "sim_new_handler()",
    format=0x770910 "OUT OF MEMORY") at utils/log.cc:346
#3  0x00000000006b71b7 in sim_new_handler () at simmain.cc:365
#4  0x00007ffff6ffa2fc in operator new(unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007ffff6ffa399 in operator new[](unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00000000006758fe in vector_tpl<stadt_t::factory_entry_t>::resize (this=0x1cd0f78,
    new_size=4278976512) at boden/wege/../../dataobj/../tpl/vector_tpl.h:51
#7  0x00000000006692d5 in stadt_t::factory_set_t::rdwr (this=0x1cd0f78, file=0x7fffffffc020)
    at simcity.cc:918
#8  0x000000000066adbd in stadt_t::rdwr (this=0x1cd04b0, file=0x7fffffffc020) at simcity.cc:1286
#9  0x000000000066a203 in stadt_t::stadt_t (this=0x1cd04b0, file=0x7fffffffc020) at simcity.cc:1115
#10 0x0000000000706ecc in karte_t::load (this=0x121d5b0, file=0x7fffffffc020) at simworld.cc:5682
#11 0x000000000070584b in karte_t::load (this=0x121d5b0, filename=0x196db08 "save/bib.sve")
    at simworld.cc:5358
#12 0x00000000006b9918 in simu_main (argc=5, argv=0x7fffffffe1d8) at simmain.cc:1174
#13 0x00000000006c9eca in sysmain (argc=5, argv=0x7fffffffe1d8) at simsys.cc:804
#14 0x0000000000734cb2 in main (argc=5, argv=0x7fffffffe1d8) at simsys_s.cc:714


--- Edit: split from the other thread as this is an independent bug report / Dwachs

Dwachs

Is this savegame from auto-save at exit of the program?
Parsley, sage, rosemary, and maggikraut.

Ters

Quote from: Dwachs on November 03, 2015, 01:15:31 PM
Is this savegame from auto-save at exit of the program?

In the auto-save called bib?

captain crunch

Quote from: Dwachs on November 03, 2015, 01:15:31 PM
Is this savegame from auto-save at exit of the program?
No, it is an ordinary saved game.

DrSuperGood

We need more detail such as the pakset (and version) the save is from, the pakset you are using now, the version last able to run the save and the version you are using now. Also providing the save is a good idea as it can let us run it in debug builds to more accurately diagnose the cause.

I recall a crash that happened when a pakset modified the inputs to a factory. I am unsure whether that was fixed, but it usually resulted in a null pointer dereference.

Since it is throwing out of memory, it could be the result of a corrupted array allocation (interpreting a misaligned element as the list size, causing a nonsense list size allocation). If the map is very big and complex, it could also be the result of 32-bit application limitations, since the newest release does raise the size of some elements.
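
To illustrate how a misaligned read can turn into a nonsense list size, here is a minimal sketch. The byte values are made up for the example and are not taken from the save; `kSample` and `read_u32_le` are hypothetical names, and little-endian encoding is an assumption.

```cpp
#include <cstddef>
#include <cstdint>

// Made-up bytes (assumption, not from the actual save): a 16-bit field
// 0x1234, then a 32-bit count of 10, then two unrelated 0xFF bytes.
constexpr std::uint8_t kSample[] = {0x34, 0x12, 0x0A, 0x00,
                                    0x00, 0x00, 0xFF, 0xFF};

// Hypothetical helper: assemble a little-endian 32-bit value from the
// four bytes starting at the given offset.
constexpr std::uint32_t read_u32_le(const std::uint8_t* p, std::size_t off) {
    return static_cast<std::uint32_t>(p[off]) |
           (static_cast<std::uint32_t>(p[off + 1]) << 8) |
           (static_cast<std::uint32_t>(p[off + 2]) << 16) |
           (static_cast<std::uint32_t>(p[off + 3]) << 24);
}
```

Read at the correct offset 2, the count is 10. Read two bytes late, at offset 4, the same stream yields 0xFFFF0000, which is over four billion elements.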

Ters

Quote from: DrSuperGood on November 03, 2015, 09:21:20 PM
Since it is throwing out of memory, it could be the result of a corrupted array allocation (interpreting a misaligned element as the list size, causing a nonsense list size allocation). If the map is very big and complex, it could also be the result of 32-bit application limitations, since the newest release does raise the size of some elements.

It is clearly a 64-bit build of Simutrans (which isn't as stable as a 32-bit build), so there must be some very large allocations going on. The save game reader (or writer) must indeed have been derailed somehow.

prissi

As it is crashing during the reading of cities, this is almost certainly a game saved either via autosave or from a server using the nightly. Please load that game with the same nightly and save it normally. Then it will load in any other stable.

captain crunch

pak64 120.0.1 r1494
game created in simutrans 120.1 r7582
last saved in 120.1 r7635
compiler is g++ 4.9.2 on x86_64 GNU/Linux

I checked out older revision 7582 but loading the game is what is not working.
This is the last good version of the saved game I could find. Sadly it is already from three weeks ago.

Ters

Quote from: captain crunch on November 03, 2015, 10:39:22 PM
I checked out older revision 7582 but loading the game is what is not working.

It was revision 7635 that you were supposed to check out. (Loading a game by an older version than what you saved it in is generally not supported.) However, that was under the assumption that it was an autosave or server save. You've said that it was not the former.

Quote from: captain crunch on November 03, 2015, 10:39:22 PM
This is the last good version of the saved game I could find.

We're not interested in the one that works. We're interested in the one that doesn't.

captain crunch

Went back to r7635, loading the game results in this:

#0  0x00000000007149b2 in log_t::fatal (this=0xb16c20,
    who=0x77069e "sim_new_handler()", format=0x770690 "OUT OF MEMORY")
    at utils/log.cc:341
#1  0x00000000006b6f2f in sim_new_handler () at simmain.cc:365
#2  0x00007ffff6ffa2fc in operator new(unsigned long) ()
   from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007ffff6ffa399 in operator new[](unsigned long) ()
   from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x0000000000675676 in vector_tpl<stadt_t::factory_entry_t>::resize (
    this=0x1cd1188, new_size=4278976512)
    at boden/wege/../../dataobj/../tpl/vector_tpl.h:51
#5  0x0000000000669289 in stadt_t::factory_set_t::rdwr (this=0x1cd1188,
    file=0x7fffffffc000) at simcity.cc:918
#6  0x000000000066ad71 in stadt_t::rdwr (this=0x1cd06c0, file=0x7fffffffc000)
    at simcity.cc:1286
#7  0x000000000066a1b7 in stadt_t::stadt_t (this=0x1cd06c0,
    file=0x7fffffffc000) at simcity.cc:1115
#8  0x0000000000706c44 in karte_t::load (this=0x121d7d0, file=0x7fffffffc000)
    at simworld.cc:5682
#9  0x00000000007055c3 in karte_t::load (this=0x121d7d0,
    filename=0x196dd18 "save/bib.sve") at simworld.cc:5358
#10 0x00000000006b9690 in simu_main (argc=7, argv=0x7fffffffe1b8)
    at simmain.cc:1174
#11 0x00000000006c9c42 in sysmain (argc=7, argv=0x7fffffffe1b8)
    at simsys.cc:802
#12 0x0000000000734a2a in main (argc=7, argv=0x7fffffffe1b8) at simsys_s.cc:714

DrSuperGood

Quote
0x0000000000675676 in vector_tpl<stadt_t::factory_entry_t>::resize (
    this=0x1cd1188, new_size=4278976512)
It is trying to resize the vector to 4,278,976,512 (0xFF0C0000) elements, which at ~20 bytes per element requires ~80 GiB of memory.
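
The arithmetic can be checked with a short sketch. The 20-byte element size is only the rough estimate used above, not a measured `sizeof`, and the names `kApproxEntryBytes`, `bytes_needed` and `whole_gib` are hypothetical.

```cpp
#include <cstdint>

// Rough per-element size from the estimate above (assumption); the real
// sizeof(stadt_t::factory_entry_t) may differ.
constexpr std::uint64_t kApproxEntryBytes = 20;

// Bytes needed to hold `count` entries of the assumed size.
constexpr std::uint64_t bytes_needed(std::uint64_t count) {
    return count * kApproxEntryBytes;
}

// Whole gibibytes (2^30 bytes), rounded down.
constexpr std::uint64_t whole_gib(std::uint64_t bytes) {
    return bytes >> 30;
}

// The hexadecimal and decimal forms of new_size are the same number.
static_assert(0xFF0C0000u == 4278976512u, "hex and decimal forms agree");
```

With these assumptions, `bytes_needed(4278976512)` is 85,579,530,240 bytes, just under 80 GiB.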


void stadt_t::factory_set_t::rdwr(loadsave_t *file)
{
    if(  file->get_version()>=110005  ) {
        uint32 entry_count = entries.get_count();
        file->rdwr_long(entry_count);
        if(  file->is_loading()  ) {
            entries.resize( entry_count );
            factory_entry_t entry;
            for(  uint32 e=0;  e<entry_count;  ++e  ) {
                entry.rdwr( file );
                total_demand += entry.demand;
                total_remaining += entry.remaining;
                entries.append( entry );
            }
        }
        else {
            for(  uint32 e=0;  e<entry_count;  ++e  ) {
                entries[e].rdwr( file );
            }
        }
        file->rdwr_long( total_generated );
    }
}

As can be seen from the code, the vector is being resized from a value read in from the file. The value, 0xFF0C0000, is rather suspicious and reeks of data misalignment.
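
One defensive option would be to reject an implausible count before allocating, so that a corrupted save produces a clear load error instead of an out-of-memory abort. This is only a sketch, not actual Simutrans code: `kMaxPlausibleEntries` and `is_plausible_entry_count` are hypothetical names, and the bound is an arbitrary assumption.

```cpp
#include <cstdint>

// Hypothetical upper bound on factory entries per city (assumption);
// Simutrans itself defines no such constant.
constexpr std::uint32_t kMaxPlausibleEntries = 1u << 20;

// Returns true if a count read from a save file is small enough to be
// worth allocating for; larger values suggest a corrupted stream.
constexpr bool is_plausible_entry_count(std::uint32_t count) {
    return count <= kMaxPlausibleEntries;
}
```

A loader could call such a check right after reading the count and before the resize. It would not repair the save, but it would fail earlier and with a more useful message.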

As others have mentioned, we need the save which crashes so we can recreate this error in our own tests. This would potentially allow for more detail as to where the data misalignment is occurring.

captain crunch

You can find the link to the saved game that crashes upon loading in the first message of this thread. :-)

Edit:
Looks like it got lost. Uploading again.

Dwachs

Your save game bib-last-good has 10 cities in it, the broken savegame claims to have 16 cities. Did you create new cities?

The savegame seems to be broken: 10 cities can be read successfully, the 11th city contains already bogus data (empty name, strange positions). This seems to be completely unrelated to the other (now fixed) bugs.

Ters

I find the way the number of cities is handled somewhat odd. Simutrans doesn't save the actual number of cities, but the number of cities in the settings. Can't the user change this setting to something different from the actual number of cities in the game?

captain crunch

Quote from: Dwachs on November 04, 2015, 08:14:40 PM
Your save game bib-last-good has 10 cities in it, the broken savegame claims to have 16 cities. Did you create new cities?

No, didn't.

prissi

Then probably a bit flipped on your harddisk, breaking that game. Happens rarely, but happens. Or it flipped in memory and hence when saving it got wrong data.

Ters

Quote from: prissi on November 04, 2015, 11:13:28 PM
Then probably a bit flipped on your harddisk, breaking that game. Happens rarely, but happens. Or it flipped in memory and hence when saving it got wrong data.

The difference between 10 and 16 is not one bit, but three. And if a bit flips on disk, the compression means a single flipped bit should corrupt more than just the number of cities. Dwachs was able to read everything correctly through city 10, and I don't know whether that would be possible if the compressed data were corrupted.
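
The bit count is easy to verify: 10 is 01010 and 16 is 10000 in binary, so their XOR is 11010, which has three bits set. A minimal sketch, with the hypothetical helper names `count_set_bits` and `differing_bits`:

```cpp
#include <cstdint>

// Number of 1 bits in v, counted one bit at a time.
constexpr unsigned count_set_bits(std::uint32_t v) {
    return v == 0 ? 0u : (v & 1u) + count_set_bits(v >> 1);
}

// Number of bit positions in which a and b differ.
constexpr unsigned differing_bits(std::uint32_t a, std::uint32_t b) {
    return count_set_bits(a ^ b);
}
```

A single bit flip would only turn 10 into a value such as 11, 8, 14, 2 or 26, never 16.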

Dwachs

Reading up to city 10 is not much, only settings are loaded before the cities will load. The map and everything else will be loaded later.

In the savegame, not only the number of cities is wrong, but also the data after city 10 is corrupted (ie setting the city count to the right one leads to another segmentation fault). Sadly, the savegame is not repairable.

Ters

Quote from: Dwachs on November 05, 2015, 06:43:27 AM
Reading up to city 10 is not much, only settings are loaded before the cities will load. The map and everything else will be loaded later.

If the compressed data was corrupted already at the number of cities, you probably wouldn't be able to read anything but garbage from even the first city.

Quote from: Dwachs on November 05, 2015, 06:43:27 AM
In the savegame, not only the number of cities is wrong, but also the data after city 10 is corrupted (ie setting the city count to the right one leads to another segmentation fault). Sadly, the savegame is not repairable.

Sounds more like the number of cities is correct, but that the corruption starts after city 10.

Dwachs

I will move this to 'solved bug reports', as there is no solution, and most likely never will be.

DrSuperGood

I think a clearer conclusion is that the cause of the problem is not reproducible. The save was corrupted, but what caused the corruption is unknown.

whoami

@captain crunch: I strongly recommend checking your RAM for faults, because faulty RAM can cause program malfunctions, data corruption and slow degradation of your whole system. Even if the savegame was damaged by something else, checking the RAM is useful, because the silly manufacturers (especially Intel) do not provide systems with ECC capability, even with the cheap RAM of today.
One program for this task is Memtest86+ (needs to be run without OS), see http://www.memtest.org/

Ters

Unfortunately, memtest takes forever. And even when I had trouble with some RAM, it never found anything. Maybe because I didn't give it forever, but most likely because it was due to overheating (poor ventilation for the rearmost slots), and memtest only stressed the RAM, not the entire system.

whoami

Hmm, so do you have a favourite testing program? Some Linux distributions (see boot menu) and even Windows (mdsched) deliver a testing program with them.

Vladki

Memtest takes forever? Oh, you must have tons of ram and ancient cpu then. One or two hours should be enough on most computers. Just run it while in work/school/pub...

whoami

Memtest takes some time where you cannot use the PC, that's right. Also: I got no error from Memtest (in several runs) on one PC where the BIOS occasionally finds a module to be bad. I still have a PC (workstation) with ECC, but it's too old for a current Windows version. With it, I found out that defective memory in the graphics card can still crash your computer, even if everything else has ECC.

Ters

Quote from: Vladki on November 09, 2015, 07:45:27 AM
Memtest takes forever? Oh, you must have tons of ram and ancient cpu then. One or two hours should be enough on most computers. Just run it while in work/school/pub...

I've been told it should at least run overnight. It needs several passes to trigger some errors. (Only if you constantly have crashes can one pass be enough, but that doesn't appear to be the case here.) Last I tried memtest, it was on 32 GB RAM and a 2 GHz i7 CPU. It wasn't fast.

Quote from: whoami on November 09, 2015, 07:18:25 AM
Hmm, so do you have a favourite testing program? Some Linux distributions (see boot menu) and even Windows (mdsched) deliver a testing program with them.

In my cases, the computer would blue screen from time to time (case 1) or the computer would completely lock up for 32 seconds when visiting certain web pages (case 2). In the first case, I unplugged half the RAM, then used the computer normally for some days. Then I took out the other half and put the first one in, and used that for a few days. (In this case, neither half caused problems. Only when all slots were filled did it crash. memtest found nothing.) In the other case, with a different computer, I just replaced all the RAM with the left-over RAM from case 1. (Made no difference. Eventually the web page that most often caused the problem changed design, so the problem was less bothersome. Then I upgraded to Windows 10, and it disappeared completely.)