News:

Simutrans Wiki Manual
The official on-line manual for Simutrans. Read and contribute.

Command line server issues

Started by jamespetts, May 26, 2013, 10:50:44 PM

Previous topic - Next topic

0 Members and 1 Guest are viewing this topic.

jamespetts

#70
Hmm - thank you all for your input on this: it is much appreciated. It is most annoying that it is so difficult to solve. Because this occurs reproducably in Experimental and is not known to occur in Standard, and because Neroden's investigations suggest that it occurs in the city car code (which has been changed in some places from Standard), I am inclined to think that this error is likely to be Experimental specific. I am minded to re-test by temporarily disabling end of month city car related code to see whether that has any effect. It is a shame that my previous attempt to fix this based on what I hoped was an informed guess as to what the problem is did not work, but, whether it fixes this or not, I think that I found and fixed a real bug, so that is progress at least. I shall have a go at testing to-morrow when less tired (spent the day testing the comfort of various early 20th century railway carriages at the Bluebell Railway, which was most pleasant, especially since it seemed to show that the values that I had guessed for Pak128.Britain-Ex were more or less correct).

Edit 1: Some further testing has shown simultaneously puzzling and promising results. Taking the original saved game that I used in my tests, which would regularly crash at the end of the month with the command line server build, I fast forwarded it with a Windows graphical client for a few months in an unrelated test, then took that saved game, and fast forwarded it to the end of the month so that I could test for crashes at the end of that month without having to wait for the whole month as with upload-refresh.sve. Bizarrely, I can no longer reproduce this crash with the new saved game: it goes over the month boundary without difficulty, and has now done so on two separate months.

Either this crash is idiosyncratic in the extreme, or there was some sort of corruption of the original saved game file which sorted itself out somehow when run for a few months on the Windows version.

It is difficult to know at this juncture quite what to do with this information.

Edit 2: Testing the same thing now for a longer period on the current dingliste-cleanup branch - I have now run from January to July in game time, with no trouble at the month boundaries at all. Indeed, not only does it no longer crash at the beginning of a new month, it also no longer desyncs (even with two clients connected, and even after regular loading/saving as a result of my script's regular calling of nettool's force-synce command).

The inference that I draw from that is that this commit fixed it, but that the original upload-refresh.sve contained corrupted data which had to be flushed out of the dingliste by running it in a non-affected system before these problems would no longer show on an affected Linux system.

That commit fixed a problem whereby, when private car objects were deleted at the end of a month by a routine called to delete private cars if there were too many of them for the level of traffic generated, then it would delete them by a different method than the standard method of deletion (setting current_list to NULL rather than setting time_to_life [sic] to 0). The trouble was that there was more than one place in the code where the car object would check whether time_to_life was zero, and I had not added the extra check for current_list being NULL in one place. It would not have made sense to check for that there in any event, so I simply reverted to deleting by setting time_to_life as NULL. Additionally, cars which were deleted for reasons other than being excess in number at the end of the month were not removed from current_list, with the result that these cars would be deleted again at the end of the month when their numbers were deemed excessive - quite possibly causing the double free errors that Nathaneal detected with the very helpful test involving disabling freelist. Indeed, it was these tests (diagnosis of double free connected to the city car code) that put me onto this bug in the first place.

This testing having been done, the next question is: should the next (and hopefully final, subject perhaps to considerations relating to the revenue cleanup and testing of that) release candidate be based on the dingliste-cleanup branch or not? My fix applies both to the branch affected by dingliste-cleanup and that not. I have so far only tested with the dingliste-cleanup branch: although I suspect that the results would be the same with 112.x-private-car-merge, that has not been tested. Is it worth doing that test, or is it safe enough to use dingliste-cleanup? Indeed, is there any advantage in doing so?

Edit 3: I wonder whether this has all taught us that either dingliste or freelist should try to detect double free errors, if possible? The trouble with using freelist is that things like Valgrind and Dr. Memory don't spot when double free (and possibly other) errors are occurring with them, so memory management errors can be hard to find. I don't know how either work well enough, however, to be able to know whether doing this is possible (1) at all; or (2) without overly compromising performance. (If it is possible but only at the expense of performance, it might be worth putting it behind conditional compilation code for debugging purposes).

Edit 4
: It got to nearly the end of August, when I manually stopped it, without crashing or desyncing at all.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

neroden

 
QuoteIs it worth doing that test, or is it safe enough to use dingliste-cleanup?
The current version of dinglist-cleanup should be safe enough.  It's basically just a cleanup of the code so that all of the functions run through the cases in the same manner (previously, each function in dingliste_t ran through the cases in a different manner, making it hard to tell whether they were all doing the right thing). 

QuoteIndeed, is there any advantage in doing so?

It should also be slightly faster.
(1) the old version had some very poorly thought-out logic related to "set_capacity"; that might have been optimized out, but it might not have been.
(2) the new version has slightly more carefully thought-out code for expanding the list and for recovering wasted memory, based on the patterns of usage which happen in-game.

----

While you're at it, the current version of mail-revenue is actually recognizing mail revenue and I think there are no bugs in actual revenue...

So you might want to merge it.  There are still several lurking bugs related to the display of items in the goods list, however, which I'd like a chance to fix first.

QuoteEdit 3: I wonder whether this has all taught us that either dingliste or freelist should try to detect double free errors, if possible? The trouble with using freelist is that things like Valgrind and Dr. Memory don't spot when double free (and possibly other) errors are occurring with them, so memory management errors can be hard to find. I don't know how either work well enough, however, to be able to know whether doing this is possible (1) at all;
Well, yeah.  Write freelist_t to check, on "putback", whether it's putting back something which is actually free already.
Quote(2) without overly compromising performance. (If it is possible but only at the expense of performance, it might be worth putting it behind conditional compilation code for debugging purposes).

I'm not sure how much it will compromise performance.  Simutrans actually doesn't free its objects very often, particularly the small objects.  The performance hit would probably mostly be at the end of the month.  Or, I could be wrong, because slist_tpl uses the freelist, and it may be freeing frequently as slist_tpl items go out of scope.

jamespetts

Thank you for that indication - that is helpful. I shall await your mail revenue fixes with interest, in that case.

As to the freelist changes, should discussion about that perhaps go into the Standard development boards?
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

prissi

freelist makes loading 300% faster! So this is really needed (and obviously some of the speed is because of not checking for the right size and some stuff.) But as mentioned earlier, you could set flags to debug freelsite even with valgrind. You could even add another 4 byte left of the pointer to include information about freeed, or size and generate errors. Very trivial to do this, an only a memory penalty.

And again: since you distribute experimetnal as release builts without a lot of other safegouards (asserts, range checks) whose impact is reather small compared to freelsit, I think it would make most sence to keep freelist.

jamespetts

I don't think that we were proposing to get rid of freelist entirely, but to consider adding checks.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

prissi

It will be about 10-15% increase in memory, which might be acceptable.

jamespetts

Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

Dwachs

Quote from: neroden on June 22, 2013, 10:10:49 PM
This does not seem to happen, at least not for fussgaenger_t (which is the first one I checked, out of several).Which does not get called.
On deletion of *what*?  Sure, on deletion of the grund_t object, but that's not happening.
You might want to check the destructor ding_t::~ding_t(), which DOES the cleanup. And this IS called as soon as any such ding object is deleted.

I get the impression that you seem to assume that any code not written by yourself is a-priori full of bugs.
Parsley, sage, rosemary, and maggikraut.

Dwachs

Quote from: neroden on June 22, 2013, 07:39:50 PM
(3) The imagereader.cc crash on line 147 is more understandable.  The command line server relies on a really awful hack when loading the image data, to avoid actually loading the image data, and I think the hack isn't fully implemented.  This is a bug present in standard.  It would be highly desirable to implement this more cleanly (in standard), but it will take quite a lot of work.
Thanks for reporting, seems that for old pak files the reader tries to read (and even write if unlucky) out of bounds...
Parsley, sage, rosemary, and maggikraut.

neroden

Quote from: Dwachs on June 27, 2013, 08:04:12 AM
You might want to check the destructor ding_t::~ding_t(), which DOES the cleanup. And this IS called as soon as any such ding object is deleted.

I get the impression that you seem to assume that any code not written by yourself is a-priori full of bugs.
This is because it's usually true :-) 

(Except when I'm working with people who have *proven* their competence such as the GCC developers.  Or when the code is *self-documenting* or *properly commented*.  This is not to say that my code is free of bugs, obviously.)

Attempting to do this cleanup in the destructor creates race conditions.  I guess as long as simutrans is single-threaded, that doesn't matter, but.... it can also require the *entire map* to be searched.  Yeech.