Here is a patch that changes the data layout of convoi_t to be more cache-friendly for calls to sync_step and step: the most important data is moved to the front of the structure, so that most of it sits in one cache line.
gperftools says that this improves the performance of these two methods.
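To illustrate the idea (this is not the actual patch): a minimal sketch with a made-up struct, assuming a 64-byte cache line. The name convoy_sketch and all of its members are hypothetical and do not reflect the real convoi_t fields. The hot members come first, and static_assert checks at compile time that they stay within the first cache line.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed cache-line size; 64 bytes is common on x86-64 but not universal.
constexpr std::size_t kCacheLine = 64;

// Hypothetical stand-in for a large class; NOT the real convoi_t layout.
// alignas makes every instance start on a cache-line boundary.
struct alignas(kCacheLine) convoy_sketch {
    // Hot: read or written on every (hypothetical) sync_step()/step() call.
    std::int32_t  speed;
    std::int32_t  target_speed;
    std::int32_t  distance_left;
    std::uint16_t state;
    std::uint16_t wait_ticks;
    std::uint8_t  vehicle_count;

    // Cold: touched rarely (UI, saving, statistics).
    char          name[128];
    std::int64_t  finance_history[12];
};

// True if a member at `offset` with `size` bytes ends inside the first cache line.
constexpr bool in_first_line(std::size_t offset, std::size_t size)
{
    return offset + size <= kCacheLine;
}

// The last hot member must still fit into the first cache line.
static_assert(in_first_line(offsetof(convoy_sketch, vehicle_count), sizeof(std::uint8_t)),
              "hot members spill out of the first cache line");
```

If someone later adds a member in front of the hot block and pushes it past the boundary, the build fails instead of silently getting slower.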
Submit?
Yes, please. It will depend on the compiler, but even if it only improved GCC it is worth submitting.
Compilers targeting the same platform have to conform to the same ABI in order to use the same system libraries, so GCC lays out structures differently when compiling for Windows than when compiling for Linux. One difference has to do with bit-fields. So in practice the effect might depend more on the platform than on the compiler. (Even more so for different hardware, but that is a bigger challenge to overcome.)
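A small sketch of the bit-field point, with made-up struct names. As far as I know, the Microsoft layout rules do not pack adjacent bit-fields into one unit when their declared types differ in size, while the layout used on Linux does, so the first struct below can come out larger on Windows targets. The code only reports what the current compiler does and does not assume a particular result.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Hypothetical example structs; not taken from the Simutrans sources.

// Adjacent bit-fields with declared types of different sizes.
struct mixed_flags {
    std::uint8_t  a : 4;
    std::uint32_t b : 4;
};

// The same bits, but both declared with the same underlying type.
struct same_flags {
    std::uint32_t a : 4;
    std::uint32_t b : 4;
};

// True if mixing the declared types costs extra space on this target.
constexpr bool mixed_types_cost_extra()
{
    return sizeof(mixed_flags) > sizeof(same_flags);
}

// Prints the sizes chosen by the compiler/ABI this file is built for.
inline void report_bitfield_layout()
{
    std::printf("mixed_flags: %zu bytes, same_flags: %zu bytes, extra cost: %s\n",
                sizeof(mixed_flags), sizeof(same_flags),
                mixed_types_cost_extra() ? "yes" : "no");
}
```

Calling report_bitfield_layout() on each target shows whether the layout differs there. Keeping neighbouring bit-fields on one underlying type is a simple way to avoid the difference.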
Again, please submit.
It already is, see r9028.
Oops, thanks.