
Newly compiled way-improvements irregular freezes

Started by Junna, August 28, 2014, 04:09:22 AM



Junna

The game appears to freeze irregularly. However, it is very difficult to track down because it does not reproduce with a debug build, although both the single-threaded large-address-aware build and the x64 release build show it.

http://www.mediafire.com/download/dq8hkmqrpp29h8e/test5.sve

Save game (empty map). If it is reproducible for you, the game should freeze, without any error message, in January or February.

jamespetts

Does this occur when not doing anything, or does it require interaction to make it occur?
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

Junna

When not doing anything; it just seems to occur inexplicably. This is while fast-forwarding; I haven't tried it without.

jamespetts

Has this only recently started occurring (since some of Philip's changes, for example)?

Philip

Quote from: Junna on August 28, 2014, 04:09:22 AM
The game appears to freeze irregularly. However, it is very difficult to track down because it does not reproduce with a debug build, although both the single-threaded large-address-aware build and the x64 release build show it.

http://www.mediafire.com/download/dq8hkmqrpp29h8e/test5.sve

Save game (empty map). If it is reproducible for you, the game should freeze, without any error message, in January or February.

That is odd. I'm investigating. I don't know much about Windows debugging, I'm afraid, but I think it should be possible to follow https://developer.mozilla.org/en-US/docs/How_to_get_a_stacktrace_with_WinDbg and get a stack trace with WinDbg even for a non-debug build: run the simutrans executable from WinDbg's Debug>Run menu entry, then choose Debug>Break when it hangs. The syntax appears to be even more arcane than gdb's, so it might be best to leave WinDbg open at that point, but I think the command to enter would be |* ~* kp.

I'm not sure this is true for the Windows build, but the Linux build uses different settings files for debug and non-debug builds; that might be the reason you cannot reproduce this in a debug build. Can you try backing up your settings files and deleting them (settings-experimental.xml, settings-experimental-debug.xml) to see whether you can still reproduce the freeze like that?

Quote from: jamespetts on August 28, 2014, 09:29:11 AM
Has this only recently started occurring (since some of Philip's changes, for example)?

I'm assuming for now that it is indeed due to my changes. Sorry about that. I think there's only one place that could cause an infinite loop, but I haven't been able to trigger it. Still, you might want to try the attached patch if you have the time. Thanks for reporting this!

Junna

Quote from: Philip on August 28, 2014, 01:37:54 PM
I'm not sure this is true for the Windows build, but the Linux build uses different settings files for debug and non-debug builds; that might be the reason you cannot reproduce this in a debug build. Can you try backing up your settings files and deleting them (settings-experimental.xml, settings-experimental-debug.xml) to see whether you can still reproduce the freeze like that?

The Windows build does too. I tested with the settings files deleted, and it still freezes in the same manner.

I don't know how to use .diff files, unfortunately...

Philip

Quote from: Junna on August 28, 2014, 04:48:25 PM
The Windows build does too. I tested with the settings files deleted, and it still freezes in the same manner.

And, I assume, the debug build still does not freeze? That's really unfortunate.

Quote from: Junna on August 28, 2014, 04:48:25 PM
I don't know how to use .diff files, unfortunately...

Ah, no problem, we can use GitHub instead; does that work for you? It's the test-13909 branch here:
https://github.com/pipcet/simutrans-experimental/compare/jamespetts:way-improvements...test-13909?expand=1

I must admit I don't know the best way to apply .patch/.diff files (both of which are the same format) on Windows; it might be to find patch.exe and run "patch -i <patchfile>" from the command shell. But, again, I'm happy to put things up on GitHub if that works, or to find another solution.

Junna

Hmm, it seems I was mistaken.

It's not that the debug build does not freeze. Neither build freezes when the MSVC debugger is attached to the Simutrans process (debug or otherwise), but both do freeze when it is not, usually at roughly the same point (most frequently around 42-55 minutes into the first month).

GitHub works well, but it appears that your change did not fix the problem; maybe it has its origin in something else. I'm a bit confused as to why attaching the debugger would stop it from freezing...

Philip

Quote from: Junna on August 28, 2014, 06:14:48 PM
Hmm, it seems I was mistaken.

It's not that the debug build does not freeze. Neither build freezes when the MSVC debugger is attached to the Simutrans process (debug or otherwise), but both do freeze when it is not, usually at roughly the same point (most frequently around 42-55 minutes into the first month).

That is curious. I still strongly suspect it's the new growth-related code, because of the delay and lack of reproducibility. If you still have time to do some more testing, it would be great if you could run the version I just pushed to GitHub (again, the test-13909 branch). This patch breaks building new cities, but should work fine for loading the save file and continuing from there.

Quote from: Junna on August 28, 2014, 06:14:48 PM
GitHub works well, but it appears that your change did not fix the problem; maybe it has its origin in something else. I'm a bit confused as to why attaching the debugger would stop it from freezing...

That is very unfortunate. On a Linux box you could attach a debugger to the running/frozen process, but I don't know how that works in Windows.

Junna

Quote from: Philip on August 28, 2014, 06:38:39 PM
That is very unfortunate. On a Linux box you could attach a debugger to the running/frozen process, but I don't know how that works in Windows.

Hmm, I tried to get it to show something...

It still crashes, so it must be something other than the growth code. I'm not sure why it didn't appear before your latest patch...

Non-debug build shows, in disassembly, crash at:

0000000077801658 0F 05                syscall
000000007780165A C3                   ret            <-- arrow here
000000007780165B 0F 1F 44 00 00       nop         dword ptr [rax+rax]

Running the debug compile, the point where it freezes refers to this code:

void fabrik_t::step(long delta_t)
{
   if(!has_calculated_intransit_percentages)   // <-- line marked with the yellow arrow
   {
      // Can only do it here (once after loading) as paths
      // are not available when loading, even in laden_a....
      calc_max_intransit_percentages();
   }

   if(  delta_t==0  ) {

Possibly relevant: before I started experiencing these, James had, I believe, merged a number of things from Standard into the code (involving JIT, I think?), which might explain why the crashes did not appear earlier.

Philip

Quote from: Junna on August 28, 2014, 06:59:55 PM
Non-debug build shows, in disassembly, crash at: [...]

I'm not sure it's valid to check the instruction pointer from the non-debug build against the symbols from the debug build (I can't really see a syscall being made at this address either), though I must say I'm relieved that my code is possibly exculpated.

I think the next step, apart from reviewing the code changes again, is to bisect. Can you let us know the last version you believe was free of this issue?
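Bisecting is a binary search over the revisions between the last build known to be free of the freeze and the first build known to show it, so each test run halves the remaining range; `git bisect` does this bookkeeping for a git history. The following is a minimal sketch of the search itself; `first_bad` and `is_bad` are hypothetical names, with `is_bad` standing for one manual test run of a given build. It assumes that once the freeze appears, every later revision also has it. With an intermittent freeze, a run that happens not to freeze is not proof of a good build, and one wrong verdict sends the search the wrong way, so each test run should last well beyond the point where the freeze normally appears.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: returns the index of the first bad build.
// Preconditions: builds[0] is known good, builds.back() is known bad, and
// every build after the first bad one is also bad.
std::size_t first_bad(const std::vector<std::string>& builds,
                      const std::function<bool(const std::string&)>& is_bad)
{
    assert(builds.size() >= 2);
    std::size_t good = 0;                 // highest index known to be good
    std::size_t bad = builds.size() - 1;  // lowest index known to be bad
    while (bad - good > 1) {
        const std::size_t mid = good + (bad - good) / 2;
        if (is_bad(builds[mid])) {        // one test run of this build
            bad = mid;
        } else {
            good = mid;
        }
    }
    return bad;
}
```

With 8 ordered builds this needs 3 test runs; in general the number of runs grows only with the logarithm of the number of revisions in the range.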

Junna

Quote from: Philip on August 28, 2014, 08:23:01 PM
I think the next step, apart from reviewing the code changes again, is to bisect. Can you let us know the last version you believe was free of this issue?

The way-improvements branch as it was roughly 2-3 weeks ago, I believe; I think around the 17th-18th of August.

Actually, the disassembly I posted was from the debug build, not the non-debug build, in case that wasn't clear. I found out that one has to use "Break All" to get the disassembly displayed after it has frozen, and that was where it had hung, as far as the debugger showed.

http://www.mediafire.com/download/axlh8k9dmei13lh/Pak128.Britain-Ex-0.9.2J.rar

The pak set. Is it possible that it has something to do with it? I made some modifications to industries at some point, though I forget what they were.

Philip

Quote from: Junna on August 28, 2014, 08:44:26 PM
The way-improvements branch as it was roughly 2-3 weeks ago, I believe; I think around the 17th-18th of August.

Actually, the disassembly I posted was from the debug build, not the non-debug build, in case that wasn't clear. I found out that one has to use "Break All" to get the disassembly displayed after it has frozen, and that was where it had hung, as far as the debugger showed.

Is the location consistent, or different each time you run it? If you have it in the debugger now, a stack trace/backtrace would be very helpful.

Thank you again for testing so much. I still can't reproduce it here, even with your pakset, I'm afraid.

ETA: have you tried running a debug build with the "-debug 5" command line option (and probably "-log" to put the output in simu.log, as well)? It would be very interesting to know whether that keeps producing more output or stops.
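Watching whether the log keeps growing is one way to tell a long stall from a true hang without attaching a debugger, which in this case makes the freeze disappear. The same distinction can be drawn from "heartbeats", i.e. timestamps taken each time the game loop makes progress: a gap that closes again is a long stall, while a gap that is still open when you look is a hang (or a stall still in progress). The sketch below assumes such a list of timestamps already exists; Simutrans does not necessarily record one, and `find_stalls` and `Stall` are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical sketch, not Simutrans code.
struct Stall {
    std::chrono::milliseconds start;   // time of the last heartbeat before the gap
    std::chrono::milliseconds length;  // how long the gap lasted (so far)
    bool ongoing;                      // true if no heartbeat has arrived since
};

// `beats` holds the times at which the game loop reported progress, in
// ascending order; `now` is when the check is made. Every gap longer than
// `threshold` is reported.
std::vector<Stall> find_stalls(const std::vector<std::chrono::milliseconds>& beats,
                               std::chrono::milliseconds now,
                               std::chrono::milliseconds threshold)
{
    std::vector<Stall> stalls;
    for (std::size_t i = 1; i < beats.size(); ++i) {
        assert(beats[i] >= beats[i - 1]);
        const auto gap = beats[i] - beats[i - 1];
        if (gap > threshold) {
            stalls.push_back({beats[i - 1], gap, false});  // stall that ended
        }
    }
    // A gap still open at `now`: a hang, or a stall that has not ended yet.
    if (!beats.empty() && now - beats.back() > threshold) {
        stalls.push_back({beats.back(), now - beats.back(), true});
    }
    return stalls;
}
```

In a real build the heartbeats would come from `std::chrono::steady_clock` readings taken in the main loop, and the check would have to run on a separate watchdog thread, since a stuck main thread cannot report on itself.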

Junna

After further testing with a bit more patience, it appears to me that the freeze is not an actual total crash but an intermittent freeze. It occurs irregularly during play but clears after a while, anywhere from a minute to maybe 4 minutes, after which the game resumes. It appears to occur several times in the course of a year, however, which might be why I mistook it for a total hang (the game would appear frozen whenever I checked on it again).

Philip

Quote from: Junna on August 29, 2014, 09:40:44 PM
After further testing with a bit more patience, it appears to me that the freeze is not an actual total crash but an intermittent freeze. It occurs irregularly during play but clears after a while, anywhere from a minute to maybe 4 minutes, after which the game resumes. It appears to occur several times in the course of a year, however, which might be why I mistook it for a total hang (the game would appear frozen whenever I checked on it again).

That's good news, I think! The one thing I know of that sometimes causes such long delays is the factory placement code, which happens to have been modified lately as well: searching the entire map takes 3 minutes here, so that's consistent with the delays you saw. I noticed that the pak set you uploaded is version 0.9.2J, while the savegame is version 0.9.1J; maybe that's a factor and maybe it isn't. As a total stab in the dark, is it possible you defined an industry to have size 0 by x or x by 0 (or a really large in-town industry)?

Anyway, I've disabled all factory construction and pushed that to the test-13909 branch. If that still freezes for minutes it must be something else again...

Thank you for all the testing!
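The multi-minute delay described above comes from searching the whole map for a site when nothing fits. The following is a hypothetical sketch, not the Simutrans factory placement code (`find_site`, `Site` and `tile_free` are invented names): it tries every position of a w-by-h footprint on a W-by-H map, so when no site fits it visits all (W-w+1)(H-h+1) positions and, in the worst case, checks w*h tiles at each. It also rejects a 0-by-x or x-by-0 footprint before scanning; whether such a footprint is involved here was only a guess.

```cpp
#include <cassert>
#include <functional>
#include <optional>

// Hypothetical sketch, not the Simutrans factory placement code.
struct Site { int x, y; };

// Exhaustively searches a map_w x map_h map for a fp_w x fp_h rectangle whose
// tiles are all free. Optionally reports how many tile checks were made.
std::optional<Site> find_site(int map_w, int map_h, int fp_w, int fp_h,
                              const std::function<bool(int, int)>& tile_free,
                              long long* tiles_checked = nullptr)
{
    assert(map_w >= 0 && map_h >= 0);
    long long checked = 0;
    // Reject degenerate (0 by x, x by 0) or oversized footprints before scanning.
    if (fp_w <= 0 || fp_h <= 0 || fp_w > map_w || fp_h > map_h) {
        if (tiles_checked) *tiles_checked = checked;
        return std::nullopt;
    }
    for (int y = 0; y + fp_h <= map_h; ++y) {
        for (int x = 0; x + fp_w <= map_w; ++x) {
            bool fits = true;
            // Stop checking this position at the first occupied tile.
            for (int dy = 0; dy < fp_h && fits; ++dy) {
                for (int dx = 0; dx < fp_w && fits; ++dx) {
                    ++checked;
                    fits = tile_free(x + dx, y + dy);
                }
            }
            if (fits) {
                if (tiles_checked) *tiles_checked = checked;
                return Site{x, y};
            }
        }
    }
    if (tiles_checked) *tiles_checked = checked;
    return std::nullopt;
}
```

On a fully occupied 10x10 map a 2x3 footprint fails after 72 tile checks, one per position; the cost of a failed search grows with the map area, which is why it only becomes noticeable on large maps.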

Junna

Quote from: Philip on August 29, 2014, 11:07:30 PM

Anyway, I've disabled all factory construction and pushed that to the test-13909 branch. If that still freezes for minutes it must be something else again...

It does not appear to happen at all there, at least.

Quote
I noticed that the pak set you uploaded is version 0.9.2J, while the savegame is version 0.9.1J; maybe that's a factor and maybe it isn't. As a total stab in the dark, is it possible you defined an industry to have size 0 by x or x by 0 (or a really large in-town industry)?

The different version number is only because I changed the folder name before RAR-ing it (it's actually 0.9.2). I considered whether there were errors in the sizes, but I can't recall any. However, I did upgrade some of the graphics, and with the new, improved graphics from standard pak.britain some of the industries are larger than in older versions of -ex, so it is possible there is an error somewhere in my haphazard incorporation.

Philip

Quote from: Junna on September 02, 2014, 01:59:37 AM
It does not appear to happen at all there, at least.

That's excellent; that should finally narrow it down. If you can find the time, it would be very interesting to see what running simutrans with "-debug 5 -log simu.log" produces as output around the time of the freeze. I found a minor bug in the factory generation code which might have caused the problem, though it's not obvious to me how; James has already applied that change, so it's possible the bug is gone on the current way-improvements branch.

It's possible the problem is due to your PAKs, but freezing for minutes is not really an acceptable way of dealing with that, so I'd like to fix it in any case. Trying to start a game on a map without water also causes such extreme delays, but at least it should be slightly more obvious what's going on in that case.
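One common way to keep an impossible request (such as a site that needs water on a map without any) from freezing the game for minutes is to cap the work: probe a fixed number of candidate positions and report failure once that budget is spent. The sketch below is hypothetical and not taken from Simutrans (`probe_sites`, `Pos` and `site_ok` are invented names). The trade-off is that a bounded random search can miss a valid site that an exhaustive scan would find.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <random>

// Hypothetical sketch, not Simutrans code.
struct Pos { int x, y; };

// Probes at most `max_attempts` random positions on a map_w x map_h map and
// returns the first one accepted by `site_ok`, or nothing once the budget is
// used up. Optionally reports how many probes were made.
std::optional<Pos> probe_sites(int map_w, int map_h, int max_attempts,
                               const std::function<bool(Pos)>& site_ok,
                               std::mt19937& rng, int* attempts_used = nullptr)
{
    assert(map_w > 0 && map_h > 0 && max_attempts >= 0);
    std::uniform_int_distribution<int> pick_x(0, map_w - 1), pick_y(0, map_h - 1);
    for (int i = 0; i < max_attempts; ++i) {
        const Pos p{pick_x(rng), pick_y(rng)};
        if (site_ok(p)) {
            if (attempts_used) *attempts_used = i + 1;
            return p;
        }
    }
    // Budget exhausted: give up quickly instead of scanning the whole map.
    if (attempts_used) *attempts_used = max_attempts;
    return std::nullopt;
}
```

Seeding `std::mt19937` with a fixed value keeps the probe sequence reproducible between runs, which helps when tracking down a freeze like this one.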


Junna

Quote from: Philip on September 02, 2014, 08:23:38 PM
That's excellent; that should finally narrow it down. If you can find the time, it would be very interesting to see what running simutrans with "-debug 5 -log simu.log" produces as output around the time of the freeze. I found a minor bug in the factory generation code which might have caused the problem, though it's not obvious to me how; James has already applied that change, so it's possible the bug is gone on the current way-improvements branch.

Hmm, the freeze seems to correspond to the following in simu.log:

Message: karte_t::reset_timer():   called, mode=$2
Message: wegbauer_t::route_fuer():   setting way type to 1025, besch=cobblestone_road, bruecke_besch=NULL, tunnel_besch=NULL
[the wegbauer_t::route_fuer() line above appears 26 times in a row in total]
Message: karte_t::reset_timer():   called, mode=$0