News:

Simutrans Tools
Know our tools that can help you to create add-ons, install and customize Simutrans.

Threading fixes and refactoring

Started by ACarlotti, April 20, 2019, 03:13:49 AM

Previous topic - Next topic

0 Members and 1 Guest are viewing this topic.

ACarlotti

Quote from: jamespetts on April 18, 2019, 10:22:15 AMThank you: this is helpful. If you could write code to force the main thread to pause until threaded_step has finished before the save routine runs, that would be extremely helpful.

I've now written this code and put it on Github.

I've also refactored (and probably made fixes to) the thread synchronisation code for both the convoy threading and the path explorer. In doing so, I've also renamed stop_path_explorer to await_path_explorer, as I feel that better expresses its function (we're not stopping it, we're just waiting for it to stop when it's ready). Hopefully I've made the logic much clearer, compared to what was previously there (which took me a long time to understand the intent of). One particular issue in the previous code (which had been causing deadlocks that you hadn't properly fixed) was that you were checking is_terminating_threads in too many places (with the result that you weren't reaching barriers the right number of times).

Finally, I removed a comment saying that "This [convoy threads] must start after the multi-threaded path explorer starts" because that wasn't consistent with the order the threads were actually being started. However, I then discovered that the convoy threads can call await_path_explorer, which means that the order possibly is critical, and I was wrong to remove the comment. The order that the work in the threads starts was changed from a correct order in commit 7cf526 in December 2016. Do you know if the current order is also valid?

jamespetts

Thank you very much for this - this is most helpful. I have been testing this and it appears to work satisfactorily. The change of name from stop_path_explorer to await_path_explorer does make sense: thank you.

I notice that you have removed a few instances of pthread_exit(NULL); without replacing it - may I ask the reason for this?

I am aware that there had been an occasional thread deadlock, but I had not been able to fix this at least in part because I had not been able to reproduce this reliably enough; thank you for fixing this.

I must confess that I no longer recall in detail how the thread synchronisation code was intended to work; I added it in 2016 and much has happened since. I wonder whether it might have been better documented?

In relation to the comment, "This [convoy threads] must start after the multi-threaded path explorer starts", I cannot remember the reason now, but I suspect that it is for the reason that you give, viz that it can call await_path_explorer (and thus can deadlock if the path explorer has not started when this has been called). In relation to commit 7cf526, I am not aware of any quick way of searching for a commit by the fragment of its hash; are you able to link to the change on Github so that I can look at this more closely?

Thank you again for your work on this; it is much appreciated.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

ACarlotti

Quote from: jamespetts on April 20, 2019, 12:42:08 PMI notice that you have removed a few instances of pthread_exit(NULL); without replacing it - may I ask the reason for this?

"Performing a return from the start function of any thread other than the main thread results in an implicit call to pthread_exit(), using the function's return value as the thread's exit status." (From the man page)
I made the change because I think it looks a bit cleaner, and is consistent with how terminating a thread is handled in some of the drawing code in Standard.

Quote from: jamespetts on April 20, 2019, 12:42:08 PMIn relation to commit 7cf526, I am not aware of any quick way of searching for a commit by the fragment of its hash; are you able to link to the change on Github so that I can look at this more closely?
On Github, it can be found at https://github.com/jamespetts/simutrans-extended/commit/7cf526. (For reference, I'd view it in a terminal by running "git log -p 7cf256".)

If you can't find any reason against it, then I'd move the start of convoy threaded work to just after the path explorer, and add a more detailed comment about why the order matters. I don't think it would cause a deadlock, but it could cause a desync (if the order affects the outcome), and in the worst case a race condition (if await and start are running at the same time) might mean that the path explorer ends up running at exactly the times it shouldn't be running (so I guess something needs to change anyway, although the race condition could be fixed instead by using the existing mutex).

jamespetts

Thank you - that is helpful. I cannot recall the reason for changing the sequence of starting the convoy routing and the path explorer now. It may well be worthwhile in the circumstances to change the order as you suggest given the possibility of a race condition as you have identified. I imagine that testing with the modified order will reveal whether there was any significant problem with ordering it in this way, although I do not now know what that problem, if any, might have been.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

ACarlotti

I've uploaded the reordering now (along with an improved comment). I'm satisfied that these changes shouldn't introduce any new bugs, and should solve some (rare) race conditions or deadlocks.

jamespetts

Quote from: ACarlotti on April 20, 2019, 06:38:17 PM
I've uploaded the reordering now (along with an improved comment). I'm satisfied that these changes shouldn't introduce any new bugs, and should solve some (rare) race conditions or deadlocks.

Splendid, thank you for this.

Testing reveals a possible problem, although I am not completely sure that it is related to this patch. I have pushed all of your changes to the thread-refactoring branch on my Github repository.

Running fast forwarded with this (somewhat old) saved game gives a segmentation fault after a few minutes sometimes (but not consistently - perhaps about 2/3rds of the time). I do not get this on the master branch. However, on the master branch, I do get an index out of range fatal error, which I also get when I do not get a segmentation fault on the thread-refactoring branch. A segmentation fault suggests a more serious or fundamental error than an index out of range error, so this may well be a problem with this code, although I am not sure where it arises as attempting to run gdb made my portable computer that I am currently using run out of memory.

It may be worthwhile you having a look at this and seeing whether there is any problem with these updates before I put this out on the master branch.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

jamespetts

Thank you very much for testing. These errors seem unrelated to your patch, and I will have to investigate these when I get home. However, I infer that you have not been able to reproduce the segmentation fault that I sometimes get?
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

ACarlotti

Quote from: jamespetts on April 20, 2019, 10:01:08 PMHowever, I infer that you have not been able to reproduce the segmentation fault that I sometimes get?
No, I have not. I can't rule out the possibility of a segmentation fault that is somehow specific to Windows or some other property of your system, but that too might be specific to this save.

I suggest giving my previous post its own thread (as a self-contained bug report).

jamespetts

Thank you for confirming. Given that you have not been able to reproduce this, I have pushed your commits to the master branch and split your post above to its own bug report as suggested, which I will look into when I get time once I get back home.

Thank you very much for your work on this: it is much appreciated.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

jamespetts

I have tracked down the cause of this issue to a change made in this patch, in particular, the removal of this line of code:


path_explorer_t::allow_path_explorer_on_this_thread = true;


from void* path_explorer_threaded(void* args). Did you mean not to remove this line, or did you mean to remove the whole mechanism for checking allow_path_explorer_on_this_thread?
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

ACarlotti

Oops - please reinstate that. I basically rewrote the function from scratch, and forgot to include that bit in the new version (it should go at the top again, of course).

jamespetts

Quote from: ACarlotti on April 22, 2019, 11:34:23 PM
Oops - please reinstate that. I basically rewrote the function from scratch, and forgot to include that bit in the new version (it should go at the top again, of course).

Splendid, thank you for the clarification. I had provisionally done this, but wanted to know whether you had intended something else by it. The fixed version will be to-morrow's nightly build.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.