
[Code v3.9] Regarding Revised Path Search Implementation

Started by knightly, May 18, 2009, 04:03:43 PM


knightly

James,

Since you are going to release a new version, here are some points for you to consider :

(1)
As you have already set current_node to NULL at the end of each iteration, it is not necessary to delete current_node at the start of each iteration :

Quote
   while(open_list[category].get_count() > 0)
   {   
      delete current_node;

      current_node = open_list[category].pop();

      if(!current_node->halt.is_bound() || paths[category].get(current_node->halt) != NULL)
      {
         // Only insert into the open list if the
         // item is not already on the closed list,
         // and the halt has not been deleted since
         // being added to the open list.
         continue;
      }

Instead, this statement should be relocated inside the IF body above, right before the continue, for the case where a popped node is found in the closed list.
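
That is, roughly (a sketch showing just the top of the loop; the rest stays unchanged) :

   while(open_list[category].get_count() > 0)
   {
      current_node = open_list[category].pop();

      if(!current_node->halt.is_bound() || paths[category].get(current_node->halt) != NULL)
      {
         // The popped node is already on the closed list (or its halt
         // has been deleted), so recycle it here and move on.
         delete current_node;
         continue;
      }
      ...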


(2)
As I have said before, apart from separate open_lists for different ware categories, you also need a set of max_iterations, iterations and search_complete variables for the different ware types; otherwise the path searches for different ware types will interfere with one another and get messed up.
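
Roughly something like this (a sketch only : MAX_CATEGORIES and the exact types are illustrative, and max_iterations could stay a single value if the limit is the same for every category) :

   // One set of search state per ware category, parallel to open_list[].
   static binary_heap_tpl<path*> open_list[MAX_CATEGORIES];
   static uint32 iterations[MAX_CATEGORIES];
   static uint32 max_iterations[MAX_CATEGORIES];
   static bool search_complete[MAX_CATEGORIES];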


(3)
I have told you before that the loop where you back-trace the next transfer halt is *not* the relaxation phase, yet you have labelled that loop as the relaxation phase with a comment. The actual relaxation phase is here :

Quote
      while(iter.next() && iterations < max_iterations)
      {
         next_halt = iter.get_current_key();
         if(paths[category].get(next_halt) != NULL)
         {
            // Only insert into the open list if the
            // item is not already on the closed list.
            continue;
         }

         current_connexion = iter.get_current_value();

         if(previous_best_line == current_connexion->best_line && previous_best_convoy == current_connexion->best_convoy)
         {
            continue;
         }
         new_node = new path;
         new_node->halt = next_halt;
         new_node->journey_time = current_connexion->journey_time + current_connexion->waiting_time + current_node->journey_time;
         new_node->link = current_node;

         open_list[category].insert(new_node);

      }

I have double-checked in an MIT algorithms textbook, and if you are still unconvinced, please look again at the Wikipedia article on Dijkstra's algorithm, and find the line of pseudo-code with the comment Relax (u,v,a) to its right.
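
For reference, the relaxation step in the textbook form of Dijkstra's algorithm looks roughly like this (a generic sketch, not your data structures) :

   // u is the node just popped from the priority queue; v is a neighbour of u.
   // If going through u gives v a shorter distance, update v -- this update
   // of dist[v] (and of v's predecessor) is the relaxation of edge (u, v).
   if(dist[u] + weight(u, v) < dist[v])
   {
      dist[v] = dist[u] + weight(u, v);
      prev[v] = u;
      queue.insert(v, dist[v]);   // or decrease-key, where the queue supports it
   }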


(4)
Recycled nodes (both path nodes and connexion nodes) are never actually deallocated (I mean deallocation of memory here, not recycling) in ~haltestelle().


(5)
I see that you have the following code for clearing open_list :

Quote
void haltestelle_t::flush_open_list(uint8 category)
{
   while(!open_list[category].empty())
   {
      delete open_list[category].pop();
   }
}

The pop() method of the binary heap is rather slow for this purpose. I would suggest adding an iterator to the binary heap, so that you can simply loop through all the nodes and delete them.
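
Something like this, assuming the heap gains such an iterator (the iterator class and its method names here are made up for illustration) :

void haltestelle_t::flush_open_list(uint8 category)
{
   // Hypothetical iterator over the heap's internal array; heap order does
   // not need to be preserved, since every node is being recycled anyway.
   binary_heap_iterator_tpl<path*> iter(open_list[category]);
   while(iter.next())
   {
      delete iter.get_current();
   }
   open_list[category].clear();   // or however the heap is emptied
}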


(6)
Finally, in the code for overloading operator new, it seems that nodes are still dynamically allocated one by one. The only difference is that they are created in one function call :

Quote
void* haltestelle_t::path::operator new(size_t size)
{

   if(head_path == NULL)
   {
      // Create new nodes if there are none left.
      if(!first_run_path)
      {
         for(uint16 i = 0; i < chunk_quantity; i ++)
         {
            void *p = malloc(size);
            if(p == NULL)
            {
               // Something has gone very wrong.
               dbg->fatal("simhalt.cc", "Error in allocating new memory for paths");
            }
            path* tmp = (path*) p;
            path_pool_push(tmp);
         }
      }
      else
      {
         first_run_path = false;
         //TODO: Refine this number
         for(uint16 i = 0; i < 1024; i ++)
         {
            void *p = malloc(size);
            if(p == NULL)
            {
               // Something has gone very wrong.
               dbg->fatal("simhalt.cc", "Error in allocating new memory for paths");
            }
            path* tmp = (path*) p;
            path_pool_push(tmp);
         }
      }
   }
   return path_pool_pop();
}

My suggestion is (a sketch follows after the list) :
(a) first determine the chunk size and save it in a variable (either 64 or 1024) [currently the code for both cases is largely the same, so the common code can be factored out]
(b) then create an array of the determined chunk size (you may use the new operator in the global scope, e.g. path_node *temp = ::new path_node[chunk_size])
(c) chain the nodes up (node[i]->link = node[i + 1]) and assign the first to the head node (path_head or connexion_head)
(d) store the array pointer in a chunk_node, which consists of a link and a pointer to a path_node/connexion_node array, and manage chunk_nodes in the same way as recycled nodes
(e) when the halt is destroyed, traverse the chunk_nodes and delete[] all the arrays
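
Putting (a) to (e) together, the result would look roughly like this (a sketch only : chunk_node, chunk_list and free_path_chunks are illustrative names, and I have only shown the path variant; connexion would be handled the same way) :

struct chunk_node
{
   chunk_node* link;
   path* nodes;   // the array allocated for this chunk
};

static chunk_node* chunk_list = NULL;   // illustrative; would live alongside head_path

void* haltestelle_t::path::operator new(size_t)
{
   if(head_path == NULL)
   {
      // (a) determine the chunk size once; the rest of the code is shared
      const uint16 chunk_size = first_run_path ? 1024 : chunk_quantity;
      first_run_path = false;

      // (b) allocate the whole chunk with a single global new[]
      path* const block = ::new path[chunk_size];

      // (c) chain the nodes up and hang them off the head pointer
      for(uint16 i = 0; i + 1 < chunk_size; i ++)
      {
         block[i].link = &block[i + 1];
      }
      block[chunk_size - 1].link = NULL;
      head_path = block;

      // (d) remember the array pointer so that it can be freed later
      chunk_node* const chunk = new chunk_node;
      chunk->nodes = block;
      chunk->link = chunk_list;
      chunk_list = chunk;
   }
   return path_pool_pop();
}

// (e) traverse the chunk_nodes and delete[] the arrays when the pool is torn down
// (head_path and the connexion equivalents should be reset to NULL here as well)
void haltestelle_t::free_path_chunks()
{
   while(chunk_list != NULL)
   {
      chunk_node* const next = chunk_list->link;
      delete [] chunk_list->nodes;
      delete chunk_list;
      chunk_list = next;
   }
}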


P.S.  Originally I wanted to write the code for the above points myself, but unfortunately, after many attempts, I still haven't figured out how to work with Git locally. I have forked a branch from the v3.8 code, but I don't know how to merge the new changes from v3.9 locally, so I can't start writing code based on v3.9.


Edit : amended point (6)(c) above.

jamespetts

Knightly,

thank you very much for your continued efforts - they are much appreciated :-) A few responses follow below.

1. Hmm, won't that have the same effect? When the continue statement is executed, the code returns to the top of the while loop, and the node is deleted. The alternative would be to put the delete statement next to all the instances of continue, which would not be efficient code.

2. Oops - forgot about those! I will put that in 3.10.

3. Oops - sorry. Comment will be removed for 3.10 :-)

4. The pool is static, so nodes in the pool should not be deleted in the destructor for individual halt objects.

5. I'll have to consider that. An alternative might be a method in the binary heap itself to delete all objects.

6. From what I understand of how C++ is translated into machine code, creating an array of objects, as in path* my_paths = new path[32], is just shorthand for a loop in which a path object is created, executed 32 times, with the resulting objects placed in my_paths. There is no magical way of creating lots of objects all at once as far as the low-level machine code is concerned.

The problem with scattering path* path_right_now = new path and delete path_right_now throughout the code is that it leads to memory fragmentation. The fragmentation is caused not by the fact that the new statement appears many times in the source code, but by the fact that the new objects are created haphazardly, at different times, so that the operating system has to assign them little bits of memory here and there; in between those times, other parts of the programme, or other programmes, may well have requested memory of their own, so the memory taken by the objects in question ends up spread all over the place.

The trick, therefore, is to put as many calls to new as close together as possible (and hope that the loop is not pre-empted by somebody opening a new tab in Firefox at the same time), so as to allocate one big chunk of memory in one go and then keep it, rather than keep taking memory from the operating system's free store in little bits and giving it back in little bits. So, using an array is not necessary: it would be more work for no better result, since the effect would be the same.

7. As to Git, do you have Git-GUI? What you need to do is go to remote > fetch from > origin. You should then be able to download the changes from the main Experimental branch. After that, you need to merge them: go to merge > local merge. Click "visualise" (bottom left) first to see the changes, so that you can understand where conflicts might arise and resolve them easily by seeing what is intended. Then, keeping the visualise window open, go back to the merge dialogue box and click "merge". The changes should then be merged into the code.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

prissi

Simutrans has quite an effective memory allocation routine of its own: see freelist. It is as fast as many commercial memory routines, and fragmentation is not an issue there (since it keeps separate lists for the different sizes). It is used, for instance, for new baum_t, and speeds up map creation by 30%.

jamespetts

Prissi,

aha, thank you for the tip! How would I interface my path and connexion structs to use it in the pathfinding?

knightly

James,

(1) In calculate_paths(), there are only 3 continue statements. Apart from the one I mentioned above, the other 2 happen without any new node having been created, so no node has to be deleted in those cases. By relocating the delete statement as I suggested, you avoid deleting NULL pointers in many cases.

(4) Sorry that I overlooked that. I must be blind.

(5) I also thought about adding a method to the binary heap for deleting (recycling) those nodes, but the problem is that a binary heap does not necessarily hold pointers -- the binary heap template can be instantiated with plain objects as well.

(6) Many thanks for your detailed explanation of the inner mechanics of dynamic memory allocation. I read an article online which happened to describe an implementation of a freelist, and its author also suggests allocating memory for variables in bulk.

I did not say that there is a magical way to create many objects all at once; each new object still has to have its constructor invoked. But when you call malloc() repeatedly for each new node, the system has to search for an appropriate free location as many times as you invoke malloc(), and such searches may be expensive. If you call new type[size] instead, the search for a memory location is done only once, and the objects are then constructed one by one within that block.

I did not look into the machine code, but I think C++ is intelligent enough to perform the search once and reserve a memory block for the whole array, instead of performing a location search and allocation for each element. This is highly likely, as array elements must always be consecutive in memory: if memory were, as you said, allocated individually for each element, any intervening request for memory could break the sequence. Another piece of evidence is that, if you overload the new operator for arrays (i.e. operator new[]), there is only one size_t parameter, giving the total size of the array, so the memory must be allocated in one piece and then divided among the elements. A final piece of evidence is that, when I tried to delete (not delete[]) a single element in the middle of an array, the program hung, probably because there is no record of a separate memory allocation for each individual element.

I guess allocating an array is roughly like calling malloc(element_size * count) first, then invoking the constructor on each element (this is probably what you observed in the machine code), and finally returning a pointer to the first element. Besides, we also need to consider the overhead of repeatedly calling and returning from the same function -- malloc().

Edit 1 :  I have taken a look at freelist.cc, and it allocates memory in chunks using xmalloc(num_elements * size + sizeof(p)), much as I described above.

Edit 2 :  If you are still unconvinced by what I said above, please take a look at the example at http://pesona.mmu.edu.my/~ypwong/yr1999sem1/tcp1311oop/OverloadingNew&Delete.PDF and focus on the output of the "testing p4" case. You can see that operator new[] is called only once, and memory is allocated once for the whole array, but the constructor is invoked repeatedly, once per array element.
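
Here is a minimal stand-alone test you can compile yourself (the class and its member names are made up, of course); it should print one "operator new[]" line, then four "constructor" lines :

#include <cstdio>
#include <cstdlib>

struct demo_t
{
   int value;

   demo_t()  { std::printf("constructor\n"); }
   ~demo_t() { std::printf("destructor\n"); }

   // Invoked exactly once per new[] expression, with the total size
   // (element size * element count, plus a little bookkeeping overhead).
   void* operator new[](std::size_t total)
   {
      std::printf("operator new[] : %u bytes\n", (unsigned)total);
      return std::malloc(total);
   }

   void operator delete[](void* p)
   {
      std::free(p);
   }
};

int main()
{
   demo_t* arr = new demo_t[4];   // one allocation, four constructor calls
   delete[] arr;                  // four destructor calls, then one deallocation
   return 0;
}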

jamespetts

What I have done for 3.10 is to replace the malloc statement with a call to Simutrans's built-in freelist class, which should help to avoid the issues mentioned. What I am wondering, however, is whether it is worth replacing the memory pool entirely with calls to freelist in the overloaded new and delete operators.
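
Replacing the pool entirely would mean something along these lines (typed from memory, so please treat the freelist method names as approximate rather than authoritative):

void* haltestelle_t::path::operator new(size_t size)
{
   // Hand the allocation straight to the freelist, which pools
   // same-sized nodes itself, instead of calling malloc() directly.
   return freelist_t::gimme_node(size);
}

void haltestelle_t::path::operator delete(void* p, size_t size)
{
   freelist_t::putback_node(size, p);
}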

knightly

It seems to me that freelist's functionality is the same as that of the current path and connexion pools. So you can probably use freelist alone; duplicated pooling efforts will only decrease performance, IMHO.

Edit : If you choose to use freelist only, you can remove the link variable from connexion, saving some memory.

jamespetts

I have tested using just freelist and using my implemented memory pool, and freelist seems faster, so that will be in the next version. Thank you, Prissi, for that advice!

As to the suggestion of moving the delete statement, I tried it, but the game crashed immediately on being started, so I put it back where it was.

knightly

Quote from: jamespetts on May 19, 2009, 06:02:52 PM
As to the suggestion of moving the delete statement, I tried it, but the game crashed immediately on being started, so I put it back where it was.

James, debugging is not trial-and-error. If your code crashes upon loading a save game, that doesn't mean that the crash is caused by my suggestion.

I downloaded your newest code and ran it in debug mode (GDI version). When I loaded Wipi's save game first, fast-forwarded it for a while, and afterwards loaded my old save game which I sent you some time ago, the game encountered an access violation (please see the attached image). I did not modify your code before testing, so this problem has nothing to do with my suggestion. I guess it is the same problem you experienced. It is a weird problem, because it cannot be reproduced in the Release version.

I further tested my own suggestion by compiling a release version, and I experienced *no* crash upon loading various save games, including some from ST STD and some from different versions of ST EXP. So, please don't blame the crash on my suggestion. If you don't believe me, try it yourself.




Now I can see that you have added another delete current_node at the end of calculate_paths() :

Quote
      current_node = NULL;
   }
   
   // If the code has reached here without returning, the search is complete.
   search_complete[category] = true;
   delete current_node;
}

I know you added the delete current_node line because, if the last node popped from open_list is in the closed list when the loop ends, that last node needs to be deleted/recycled. However, if you delete it right away, as I suggested, you don't need to add such code at all.

In fact, even the current_node = NULL line above is superfluous, because current_node will be reset to a newly popped node from open_list in the next iteration. You added that line only because you have a delete current_node at the start of the iteration. In other words, you added this unnecessary code because you placed the delete statement in the wrong location in the first place.

I will say no more on this. The issue is not complicated at all, and I am pretty tired of repeating the same suggestions over and over again. IMHO you are adding unnecessary code and making your code perform unnecessary actions because you are not entirely familiar with the logic flow of the code you wrote. I got the same impression when I spotted the other bugs I have found so far.

jamespetts

Knightly,

the issue with Wipi's save game is well known to me: closing that saved game always causes an access violation crash for me, both with Simutrans-Standard and Simutrans-Experimental. I did not use that game when testing your suggestion: I used a game called Ferndale, which I normally use for debugging. I tried that game with the original code, and it worked fine. I then changed the code as you suggested and, on loading, the game crashed (not during the load routine, but a second or two after loading). I then put the code back to what it was and, on loading, the game worked fine again. It was the exact same game every time, and it was the first saved game to be loaded after the program's start. I had tried the same thing previously, with the same result.

I am intrigued, however, that you have managed to make it work - I was not able to find out exactly why it was crashing (I did try, so that I could fix the problem and use your code instead of the original, but was unsuccessful), as the access violations appeared to be generated in what seemed to be an unrelated part of the code (I cannot remember where exactly now). However, they never occurred with my original code, and always occurred with your suggestion implemented. It might be that there is some interaction between your suggested code and the Ferndale saved game that causes a problem, but, if that is so, there might be similar interactions with many other permutations. Since I have not been able to find the cause of the problem, and since it is better to have slightly inefficient code than unstable code, I have left it as it is for the time being.

(The alternative is that I have not understood your suggestion properly: perhaps if you could make a Git branch with your suggestion in it, I could get your exact suggested code?)

knightly

James,

I still don't know how to use Git branching, so let's forget about it for the time being.

I will simply reproduce the relevant code section here :  (I have commented out the code which I regard as unnecessary)

Quote
   while(open_list[category].get_count() > 0)
   {   
//      delete current_node;
      current_node = open_list[category].pop();

      if(!current_node->halt.is_bound() || paths[category].get(current_node->halt) != NULL)
      {
         // Only insert into the open list if the
         // item is not already on the closed list,
         // and the halt has not been deleted since
         // being added to the open list.
         delete current_node;
         continue;
      }

      current_connexions = current_node->halt->get_connexions(category);
      quickstone_hashtable_iterator_tpl<haltestelle_t, connexion*> iter(*current_connexions);
      linehandle_t previous_best_line;
      convoihandle_t previous_best_convoy;
      if(current_node->link != NULL)
      {
         previous_connexion = current_node->link->halt->get_connexions(category)->get(current_node->halt);
         if(previous_connexion != NULL)
         {
            previous_best_line = previous_connexion->best_line;
            previous_best_convoy = previous_connexion->best_convoy;
         }
      }
      
      // If max_iterations is zero, max_transfers will have been zero, which is a flag
      // to conduct an unlimited depth search.
      while(iter.next() && iterations[category] < max_iterations && max_iterations != 0)
      {
         next_halt = iter.get_current_key();
         if(paths[category].get(next_halt) != NULL)
         {
            // Only insert into the open list if the
            // item is not already on the closed list.
            continue;
         }

         current_connexion = iter.get_current_value();

         if(previous_best_line == current_connexion->best_line && previous_best_convoy == current_connexion->best_convoy)
         {
            //delete current_node;
            continue;
         }
         iterations[category]++;
         new_node = new path;
         new_node->halt = next_halt;
         new_node->journey_time = current_connexion->journey_time + current_connexion->waiting_time + current_node->journey_time;
         new_node->link = current_node;

         open_list[category].insert(new_node);

      }
      
      // Add only if not already contained and it is not the head node.
      if(current_node->link != NULL && paths[category].put(current_node->halt, current_node))
      {
         // Track the path back to get the next transfer from this halt
         path* track_node = current_node;
         for(uint8 depth = 0; depth <= 255; depth ++)
         {
            if(track_node->halt.is_bound())
            {

               if(track_node->link->link == NULL)
               {
                  path* tmp = paths[category].get(current_node->halt);
                  tmp->halt = track_node->halt;
                  break;
               }
            }
            
            track_node = track_node->link;
         }
         if(!paths[category].get(current_node->halt)->halt.is_bound())
         {
            // Remove bad paths (transfer depth too great, so aborted)
            delete paths[category].remove(current_node->halt);
         }
      }
      
      INT_CHECK( "simhalt 1694" );

      if(current_node->halt == goal)
      {
         // Abort the search early if the goal stop is found.
         // Because the open list is stored on the heap, the search
         // can be resumed where it left off if another goal, as yet
         // not found, is searched for, unless the index is stale.

         if(open_list[category].empty())
         {
            search_complete[category] = true;
         }
         return;
      }   
      
      //current_node = NULL;
   }
   
   // If the code has reached here without returning, the search is complete.
   search_complete[category] = true;
   //delete current_node;
}

I don't think you have misunderstood my suggestion. I strongly suggest that you try compiling a Release version with my suggestion applied and see whether you experience any crash.

As for your Ferndale save game, can you upload it for me to test? Which pakset does it use, and from which version of Simutrans Standard was it saved?

Edit : 

Okay, I have found the Ferndale game here: http://forum.simutrans.com/index.php?topic=1012.0 . I have tested it against the GDI release version which is based on my suggestion, and the game ran fine for over a year without a crash. There was no crash even when I reloaded it, or loaded another save game.

I have even tried it in Debug (GDI) mode with my suggestion applied, and the game loaded successfully and ran for over a whole year without a crash.

Therefore, obviously the crashes you encountered have nothing to do with my suggestion.

jamespetts

Knightly,

thank you for posting that code extract. I suspect that the problem I had came from the fact that I did not comment out that very last delete statement, which would cause the same memory to be re-pooled twice in some cases and hence the instability. I will have another go when I get home, and hopefully it will work this time. Thank you very much for your effort :-)

Edit: I have just tested it - it works fine with that extra delete removed. Your suggested code will be in the next version. Sorry for the confusion. Thank you for your help :-)