News:

Simutrans Wiki Manual
The official on-line manual for Simutrans. Read and contribute.

Stack overflow in cbuffer_t

Started by jamespetts, January 14, 2013, 10:18:39 PM

Previous topic - Next topic

0 Members and 1 Guest are viewing this topic.

jamespetts

Hmm - looking to try to fix an issue in Experimental has sent me delving into GUI/string code that is unchanged from Standard, and I should be grateful for any indications on what is supposed to happen so that I can try to fix the problem.

What appears to occur in the crash reported in the thread linked above is that the station in question, "Nutish railport", has a very large number of passengers waiting there. A single packet of passengers has a "menge" of 68,173. In freight_list_sorter_t::add_ware_heading, there is a line,


buf.printf(" %u", sum);


When that line is called on the very large ware packet, a stack overflow is triggered in output.c (Simutrans specific code is left at void cbuffer_t::printf, where the line:


int    const count = my_vsnprintf(buf + size, n, fmt, ap );


is hit, calling


return _vsprintf_p(buf, n, fmt, ap);


triggering what I assume to be non-Simutrans specific code _vsprintf_p).

Given the limited range of parameters passed to _vsprintf_p and the fact that, being a non-Simutrans method, it will not be accessing any Simutrans heap variables, the issue can probably be narrowed down to an excessive value of n, but I shall give all the other values for completeness:

buf: 0x286a0040 (which, according to MSVC++, points to a value of 0)
n: 4194304 (which is the exact size of 22 bits)
fmt: a pointer to %u
ap: a pointer to "M  "

May I ask - what is the significance of "n", which seems to relate to the capacity of the cubffer_t object, and how is it set? Am I right in inferring that the greatness of this value is likely to be the cause of the stack overflow, or might it be something in that first "buf" pointer?

Any tips on debugging this would be much appreciated.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

isidoro

buffer seems to be the place where the output of the function is written.  The function can't know how big is that chunk of memory.  That's the purpose of n: a safeguard.

If the value of n is bigger than the actual size of the memory chunk allocated for buffer and the output of the function is very long, there can be memory corruption.

The size of the memory allocated for buffer has to be traced back to where it was allocated.

jamespetts

Hmm. Do you think that this sort of corruption is causing a stack overflow - would that not be more likely to cause an access violation, with a stack overflow being caused by excessive size?

In any event, how might one determine the actual memory size of buf?
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

Ters

Stack overflow is caused by too deep a call stack (usually runaway recursion), too many local variables, or a combination of both, not by writing out of bounds. Unless someone/something is using a different definition somewhere.

isidoro

QuoteI've seen things you people wouldn't believe. Attack ships on fire off the shoulder of Orion. I watched C-beams glitter in the dark near the Tannhauser gate.

I've seen things you people wouldn't believe when memory is corrupted...  Even one of those buffers can be allocated in the stack and the content of the latter be overwritten...

@James:  I think there are programs (esp. for linux) that check for bad allocation/deallocation... Electric fence is one that comes to my mind now.

jamespetts

#5
Hmm - this is getting odder. I have found a new variant of this crash, occuring at this line:


freight_list_sorter_t::sort_freight(warray, buf, (freight_list_sorter_t::sort_mode_t)sortierung, NULL, "waiting", welt);


This is called from void halt_info_t::zeichnen(koord pos, koord gr), which in turn is called from void display_win(int win) in simwin.cc, which in turn is called from void win_display_flush(double konto) also in simwin.cc, which is called by void intr_refresh_display(bool dirty) in simintr.cc, which in turn is called by void karte_t::sync_step(long delta_t, bool sync, bool display ). There is clearly no recursion of any sort here. The only thing that could possibly cause this stack overflow crash is the fact that warray had a size of 82362 and a count of 42944.

The difficulty that I am having in fixing this crash is that it is occurring in Standard code (no doubt indirectly caused by Experimental changes causing less merging of packets because origins are stored), with which I am less familiar. Any assistance in finding a way around this would be much appreciated.

Edit: I can reproduce this with very similar, but not identical, values for size and count with the latest release candidate.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

Ters

If warray is allocated on the stack, it might be the culprit. I could perhaps have gone with allocating a 4096 byte array on the stack, put close to 100000 seems a bit excessive. Windows has a 1MB stack (can be configured in exe), so it should be fine. Maybe other OSes are more conservative about default stack size.

Sort functions are also possibly recursive in some way.

prissi

Does freight list sorter calls std::qsort at some point? This function will at least allocate 1.5 times the same amount (possibly on the stack) for the actual sorting.

Also ..._printf... libary functions may use a stack internally. It may even be their local stack, not the program stack (although of this I am not sure).

You (or MSVC) might also accidentally put some meaningless value (like 40000) for the stack size into the project file. Maybe increase the stack size to 4 MB. I had to do this for very early GCC compiles, especially with SDL builds.

jamespetts

#8
Thank you very much both of you for the suggestions. How does one go about altering the stack size, may I ask?

Edit: I discovered how to increase the stack size, but this did not assist: the stack overflow still occurred in the same conditions.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

Ters

Just increasing the stack size to avoid an overflow sounds a bit like voodoo debugging. Without knowing the reason why so much stack is needed, it might just reduce the likelihood of the bug appearing, or cause the program to hang for longer times before being killed by the OS.

jamespetts

Hmm - is the problem not related to the fact that in the call:


freight_list_sorter_t::sort_freight(warray, buf, (freight_list_sorter_t::sort_mode_t)sortierung, NULL, "waiting", welt);


"warray" is passed by value, not by reference/pointer? "warray" is a vector, and if the size of the vector is very large (as it is in the case that causes the problem), might the copying necessary to copy such a large array not cause a stack overflow?

Might I ask - is there a reason that this is passed by copying not by pointer here, or could I change it to pass it by pointer relatively easily?
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

Ters

warray is passed by reference. I also see now that it is a vector, so its contents is not on the stack.

However, sort_freight() tries to allocate an array on the stack of equals size to warray, containing ware_t objects (at least in standard). That's probably where it fails. But before I offer a local solution: how often is this method called? Sorting almost 100000 items is a bit expensive, and will have an impact if this is done many times per frame.

jamespetts

Thank you for that helpful insight. I have just checked frequency of calling the line


freight_list_sorter_t::sort_freight(warray, buf, (freight_list_sorter_t::sort_mode_t)sortierung, NULL, "waiting", welt);


: it is called a number of times whenever:

(1) a stop information dialogue is opened;
(2) the sort method in the stop information dialogue is changed; or
(3) the number of passengers/goods waiting at the stop is changed.

The number of times that it is called seems to vary considerably from 1 to about 12 on a small map (the Pak128.Britain demo.sve). This is a GUI exclusive function, and is only called when the stop dialogue is open, and only for that particular stop. However, it does seem to be called a number of times over whenever the waiting goods/passengers list needs updating, which can be fairly frequent for a large stop. This does not seem to be a highly performance critical method, but it cannot be highly intensive computationally, I think.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

TurfIt

sort_freight is only called when necessary - controlled by resort_freight_info flag.

I'd start by getting rid of the ALLOCA for wlist... it's the only 'big' thing I see Simutrans is putting on the stack. Beyond that, you're at the mercy of your compiler/libraries implentations of std::sort and _vsprintf_p and their stack usage as mentioned by prissi.

jamespetts

Thank you for the suggestion. I am trying to work out how to make this work. A vector would seem to be the obvious solution, but I am struggling (largely thorough inexperience with std::sort) to make this line compile with a vector (or alter it so that it does compile/work):

std::sort(wlist, wlist + pos, compare_ware);
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

TurfIt

simplest to just replace the alloca with a malloc/free

jamespetts

#16
Ahh, I see. I have done this:


ware_t* wlist = (ware_t*) malloc(warray->get_count() * sizeof(ware_t));


and at the end of the method:


free(wlist);


Does this seem sane?

Edit: Testing this, heap allocation here seems to cause enormous slowdown: in debug mode, the (large) game becomes unresponsive and pausing it almost invariably shows that it is in the process of sorting freight.

In principle, this could be threaded, could it not...?
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

Ters

Heap allocation is relatively slow, hence the use of ALLOCA instead of malloc. Is the slowdown noticable with smaller inputs? Otherwise, it might be slow because it actually does the big sort now, rather than crash.

Quote from: jamespetts on January 16, 2013, 08:45:14 PM
In principle, this could be threaded, could it not...?

In principle, but depending on the way the surrounding code works, it might not work without major work. (Or it would work, if you spawned a thread, waited for it's completion, and then continued. But that would be even slower.)

TurfIt

I can't see heap allocation being the slowdown cause. Even if called every frame, there's just not much work being done. Perhaps sorting is the culprit - just how big is this list? If the mere 42944 count from above, then that's really not much either. Need profiling results to diagnose any slow issues...
How often is sort_freight being called? perhaps Experimental has a bug that's not setting the resort flag properly.
Are you sure sort_freight is the issue? as opposed to then trying to display the sorted list afterwards? The whole get_freight_info dumping the entire formatted list into a huge text buffer and then displaying only part of it with a scrolly is not exactly optimal. Wasn't experimental already having a corrupted halt_info display from overflowing the scrolly?

jamespetts

I have now tested this with an optimised release build - performance is still a little slow, but just about acceptable, I think - certainly better than crashing, in any event. I have tried to get the best of both worlds in the following way:


ware_t* wlist;
const int warray_size = warray->get_count() * sizeof(ware_t);
const int max_stack_size = 838860;
if(warray_size >= max_stack_size)
{
// Too large for the stack - use the heap (much slower)
wlist = (ware_t*) malloc(warray_size);
}
else
{
// Old method - use the stack, but will cause stack overflows if
// warray_size is too large
wlist = (ware_t*) alloca(warray_size);
}


And not forgetting:


if(warray_size >= max_stack_size)
{
free(wlist);
}


at the end of the method. This seems to work.

Thank you everyone for all your help with this - much appreciated!
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

prissi

malloc should not have a big impact, as this is called many times during a frame; for isntance when new cars or pedestrians are generated. I would get rid of the alloc and just purely use malloc. If you remember the size, you could just do the malloc when needed oánd otherwise work with previously allocated array.

Dwachs

It would have been a nice gesture, if the fix applied to experimental would be made available for standard, too.
Parsley, sage, rosemary, and maggikraut.

jamespetts

Quote from: Dwachs on January 18, 2013, 04:18:56 PM
It would have been a nice gesture, if the fix applied to experimental would be made available for standard, too.

I think that all that one has to do is copy and paste the bits of code that I put in the message above. I am not set up with the Standard codebase at present or with any SVN/patch software, so it would take more work for me to create a .patch file than for a Standard developer to copy and paste the code in the message above into the relevant part of Standard.

Edit: Incidentally, you might also want to think about the fix for the loading/saving of games with large numbers of ware packets at a station by replacing the signed 16 bit integers with unsigned 16 bit integers for storing the count of the "warray" object. Standard is very unlikely to generate as many ware packets as Experimental because Standard does not take account of origins when merging, but it never hurts to be safe.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

Ters

I've attached two simple patches. One is thread safe, but calls malloc and free repeatedly, while the other uses static pointers in order to try and reuse the same memory from call to call, but loses thread safety by doing so. I've briefly tested both, and the former seems to cause a slighly faster rise in memory usage than the latter. I don't know if it somehow leaks, if it's because memory becomes more fragmented or if it's just a coincidence. (Though Simutrans is supposed to be predictable random, isn't it?)

A possible improvement for the latter patch is allocating slightly more than needed, in order to decrease the likelihood of the array being too small next time (and the times after that).

BTW, I can't build revision 6289 and later. That commit look a bit related to this.

prissi

Even though threadsafe is not really required, I still prefer this. The routine is called seldom enough.

Thank you very much.

About predictable random: Yes if you load a networkworkgame the random counter will be the same. Otherwise it will be initilized with the time, to avoid such reuccurences upon reloading. Also frame rate and hence times will be slightly different in a reloaded game.

Ters

The thread safe version could also use a vector to make sure that the memory is freed when exiting the scope through any means.