News:

Do you need help?
Simutrans Wiki Manual can help you to play and extend Simutrans. In 9 languages.

Multithreaded Simutrans (GDI) flickers often and freezes sometimes in fast forwa

Started by prissi, April 14, 2014, 10:20:29 PM

Previous topic - Next topic

0 Members and 1 Guest are viewing this topic.

prissi

During the development and extensive playtesting I got several freezes of Simutrans during fast forward. Especially after some time in Background and then bringing to front seems to cause a complete hanging. This is not easily reproducible but happens annoyingly often. Any ideas appreciated.

Also during fast forward I see from time to time a flicker and in the taskbar a new Icon apopears for this frame. So the windows taskbar jumps. I think the latter is due to some task running when it should not, and does not always happens.

Compiled with multithreading on MSVC. I will try the single threaded versions.

Ters

If you have trouble attaching the debugger to look at the state of things, you can try to look at the theads and their call stacks using Process Explorer. Since you comple with MSVC, it might even be able to read the symbols file to give you proper source code references (I haven't tried).

TurfIt

I can duplicate a FF/background freeze with MinGW/SDL and multithreaded. Load the yoshi game, hit FF, put into background, and it freezes in < 10sec. No need to bring to foreground. But I see no flicker. And annoyingly it runs with no crash all day long under gdb.

EDIT: and even more annoyingly, after trying to troubleshoot, it doesn't crash at all.

TurfIt

Finally traced my crash to r6592. Specifically:

  * Check if need to play new MIDI
+ * Max Kielland:
+ * Made it possible to get next song
+ * even if we are muted.
  */
void check_midi()
{
- if(midi_get_mute()) {
- return;
- }

Strange that it only crashes when fast forwarding in the background with MIDI muted, but when calling deprecated windows APIs...

Reverting this solves the crash I was experiencing, maybe yours too?
Why would one want it possible to 'get next song even if we are muted?'

Ters

This is shotgun debugging. It is very unlikely that the quoted code has anything to do with the bug beyond a butterfly effect. Is it impossible to reproduce the crash with debugging symbols intact so that you can get readable stacktraces out of it?

prissi


TurfIt

Quote from: Ters on April 21, 2014, 09:11:32 PM
This is shotgun debugging. It is very unlikely that the quoted code has anything to do with the bug beyond a butterfly effect. Is it impossible to reproduce the crash with debugging symbols intact so that you can get readable stacktraces out of it?
B.S. The code is responsible for bad MCI commands being sent. MCIERR_INVALID_DEVICE_NAME is being returned but ignored, and it seems after enough of them, Windows decides Simutrans isn't responding and stop updating the screen. Note: the game is still running, time is passing, the world is being updated, but Windows has stuck it's head in and said not responding. So, there's no stack trace to read as there's no crash. And it remains responding if you run it under a debugger. Unless you want to debug Windows itself, not sending the MCI commands when the MIDI is not playing is the only fix I've found.

This is 100% repeatable: run fast forward, mute the midi, put the window in the background, --> not responding.
Compiler settings don't matter, can be multithreaded or not, DEBUG, or not, SDL or GDI, whatever. But it cannot be run in a debugger.

Whether this is the problem prissi originally had, I don't know.

Ters

Quote from: TurfIt on April 22, 2014, 12:24:03 AM
So, there's no stack trace to read as there's no crash.

There is always a stack to trace.

Quote from: TurfIt on April 22, 2014, 12:24:03 AM
This is 100% repeatable: run fast forward, mute the midi, put the window in the background, --> not responding.
Compiler settings don't matter, can be multithreaded or not, DEBUG, or not, SDL or GDI, whatever. But it cannot be run in a debugger.

I can reproduce this. Simutrans does continue to run, it's just the GUI that's frozen. When attaching the debugger to the non-responsive process, it reverts to being responsive once I continue execution (attachment causes execution to stop as if it hit a breakpoint).

That a program's GUI should become unresponsive just because it sends some invalid MCI commands while in the background doesn't make any sense. There is some link that is missing. As far as I know, Windows labels a window as not responding if it does not process messages within some time limit. I therefore suspect that the root cause here is that in fast forward mode, Simutrans doesn't keep up with message processing. The misuse of the MCI API may trigger this by generating a (possibly small) excess of messages. I haven't been able to confirm this theory yet, since the message queue apparently is a shy little creature. If correct, both errors should be fixed.

Ters

Looking around on the web, it looks that since Windows 7 (or maybe Vista), PeekMessage is no longer enough to prevent Windows from labeling an application as not responding. When in the background, Simutrans gets no messages except perhaps notifications from MCI when playing music. As long at Simutrans gets messages, it calls GetMessage, keeping Windows happy. Once a window is marked as non-responsive, it is replaced by a ghost window, which likely explains why bringing Simutrans back to the front doesn't get the messages flowing (it isn't Simutrans, just a fake). It also explains why Simutrans appears frozen: the rendering happens on the hidden real window. When debugging, this ghost window is disabled, which explains why Simutrans does not freeze when debugged.

What is still a mystery to me is what r6592 has to do with this. I also have a hard time accepting that calling PeekMessage, even with remove, isn't enough to tell Windows that the application is running properly. All mentions I have found of it somewhat tangential.

prissi

Sorry, I write lots of console win32 api program, and those were never considered "unresponsive" despite nil Message processing but still calling Shell functions. Maybe the problem stems from hiding the console window?

When simutrans was flickering, each flickering was acompanied by some other program coming up in the taskbar (windows7) but only for one fram or so. So I could not see what it was.

EDIT: Introducing and processing a timer every 1111 ms seems to fix the hung up problem for me. Still this is extremely hackish.

TurfIt

Extremely hackish, and doesn't fix things for SDL. Not calling the MCI functions seems best to me, I still can't see why one would want to 'get next song even if we are muted?'
Even with the console shown, it still does it.
Also, I went to check if SDL2 was affected, and discovered it doesn't work at all anymore, just flashes a window then exits with SIGTRAP. WTF...

Ters

Quote from: prissi on April 22, 2014, 08:19:26 PM
Sorry, I write lots of console win32 api program, and those were never considered "unresponsive" despite nil Message processing but still calling Shell functions. Maybe the problem stems from hiding the console window?

Console programs are something else, although you can have both a message pump and a console window. Once you do CreateWindow, you have to keep a message pump going. But that's not exactly the problem, because Simutrans does keep it's message pump going. It's just that Windows thinks it isn't.

Quote from: prissi on April 22, 2014, 08:19:26 PM
When simutrans was flickering, each flickering was acompanied by some other program coming up in the taskbar (windows7) but only for one fram or so. So I could not see what it was.

I think this is Windows replacing Simutrans' real window with the ghost window. I have seen similar cases of duplicate taskbar buttons, or even none at all, when other programs go unresposive. As for the ghost window, calling DisableProcessWindowsGhosting prevents Simutrans from locking up. Unfortunately it is only available in Win XP and later, and it is not present in mingw32's headers and import libraries. More importantly, it is a stab at the symptom, and not the underlying issue.

Quote from: TurfIt on April 23, 2014, 12:00:42 AM
Extremely hackish, and doesn't fix things for SDL.

It is interesting that SDL has the same problem. Either SDL's MIDI support is a very thin layer, or this problem is extremely easy to trigger.

Quote from: TurfIt on April 23, 2014, 12:00:42 AM
Not calling the MCI functions seems best to me, I still can't see why one would want to 'get next song even if we are muted?'

I agree that going about calling lots of functions that are bound to fail isn't good, and should be fixed in some way. My point is that with no explained connection between MCI and the triggering of Windows' marking of a window as non-responsive, the problem of Simutrans getting stuck behind a ghost window may resurface at any time. Other developers seems to have run into a similar problem, and I find it unlikely that they also (ab)use MCI.

prissi

You can set up a timer for sdl as well ... maybe this is what the midi actually does, if it plays ...

Ters

I think it is much better to rewrite the MIDI code so that it doesn't issue MCI commands, yet still perform the logic necessary for whatever r6592 intended. Even if it's a occult way of exorcising the lock-up problem, it serves a purpose in itself, unlike starting a timer (which even if less "occult", is just as "magical").

prissi

Unfourtunately there is not a better mid implementation as far as I know. At least Microsoft keep say this: http://msdn.microsoft.com/en-us/library/windows/desktop/dd742875(v=vs.85).aspx

But for mute midi at the start, no calls are needed whatsoever.

Ters

Quote from: prissi on April 23, 2014, 02:49:43 PM
Unfourtunately there is not a better mid implementation as far as I know. At least Microsoft keep say this: http://msdn.microsoft.com/en-us/library/windows/desktop/dd742875(v=vs.85).aspx

But for mute midi at the start, no calls are needed whatsoever.

I was talking about the implementation in w32_midi.cc, or even simsound.cc. The implementation in sdl_midi.cc is somewhat better, but not for dr_midi_pos() possibly. Simutrans knows it isn't playing music, so no need to call the underlying stuff. This check was in simsound.cc earlier, but was removed for some reason. It should be possible to find some middle ground between what was done before r6592 and what is done after.

Update:
Looking at the SDL source code, it uses a different API for playing MIDI music. Simutrans does however not appear to use SDL for playing MIDI, although there is a source file for it.

Update2:
Another random change that makes the problem go away: Install a wndproc hook on Simutrans' main thread. It appears that this dislodges periodic WM_GETICON (!) messages.

prissi


Ters

Quote from: prissi on April 24, 2014, 10:40:08 PM
The WndProc is already bound to the main thread ...

Which is why I bound the WndProc hook to that thread. I didn't want to hook all WndProcs on my system.

Ters

Here is a patch based on r7160 that avoids calls to the underlying MIDI subsystem if music is muted (at least in check_midi), yet should still allow getting next song even if muted (although I haven't actually tested that functionality). I no longer get hangs with GDI or SDL backend, even without other hacks, although I consider that more of a side-effect even if it was the original motivation.