Started by ceeac, May 06, 2022, 09:35:29 AM
// padding to increase the size from 32 to 40 bytes for sync_step performance reasons.
Quote from: ceeac on May 06, 2022, 09:35:29 AMAs a result of the improved cache hit rate, for me sync_step is now about 5% faster for the big pak128 map and more than 25% faster for the yoshi map.
Quote from: ceeac on May 07, 2022, 06:23:28 PMi7 6700k (4 GHz), 16 GiB RAM (3200 MT/s). I used a standard Release build (i.e. CMAKE_BUILD_TYPE=Release) without LTO.
Quote from: ceeac on May 07, 2022, 07:57:25 AMI think this is because objlist_t::grow_capacity allocates 4 pointers (32 bytes total) for 2, 3 or 4 objects on a tile, so that this memory gets mixed with wolke_t allocations in the freelist if wolke_t is also 32 bytes. I think there are not many freelist allocations of 40 bytes besides wolke_t, so that memory is nicely packed together.
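The mixing ceeac describes follows from how a size-class freelist recycles memory. Below is a minimal sketch of that idea; it is an assumption about how Simutrans' freelist behaves, not its actual code, and the function names (`freelist_alloc`, `freelist_free`) are illustrative. Freed chunks are keyed only by size, so a recycled 32-byte `objlist_t` pointer array and a 32-byte `wolke_t` would interleave in the same pool:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

struct node_t { node_t *next; };

// One free list per allocation size. Freed chunks of the same size are
// reused regardless of which type they once held, so 32-byte objlist_t
// pointer arrays and a hypothetical 32-byte wolke_t share one pool.
static node_t *free_heads[64] = {};

void *freelist_alloc(std::size_t size) {
    std::size_t idx = size / 8;          // size classes in 8-byte steps
    if (node_t *n = free_heads[idx]) {   // reuse a freed chunk if available
        free_heads[idx] = n->next;
        return n;
    }
    return std::malloc(size);            // otherwise take fresh memory
}

void freelist_free(void *p, std::size_t size) {
    node_t *n = static_cast<node_t *>(p); // chunk is at least 8 bytes
    n->next = free_heads[size / 8];
    free_heads[size / 8] = n;
}
```

Freeing a 32-byte block and then allocating 32 bytes for a different type hands back the very same address; padding `wolke_t` to 40 bytes moves it to its own list, so clouds stay packed together instead of being scattered between pointer arrays.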
Quote from: TurfIt on May 11, 2022, 02:10:03 AMNewer computer (5950X) sees 16% with yoshi, 4% on 12301. Interestingly the 5950 outperforms the 6700 by 40+% on yoshi, but the 6700 beats it on the big map (by like 0.01%, but still...). When the data fits the big cache, new cpus can fly!
Quote from: DrSuperGood on May 11, 2022, 06:55:18 PMThis will either be due to effective memory latency or some sort of tiny regression with multi threading.
Quote from: prissi on May 12, 2022, 02:33:22 PMMy first attempt with a template using only static members and functions still increased the size of structures due to the zero-size member rule. So one has to use a static freelist_tpl member inside the actual class.
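The pattern prissi lands on can be sketched like this. It is a hedged reconstruction: `freelist_tpl` exists in Simutrans and the names `gimme_node`/`putback_node` echo its freelist, but this body is invented for illustration. Because the `freelist_tpl` member is `static`, it contributes nothing to each object's size, unlike an empty non-static member, which must occupy at least one byte:

```cpp
#include <cstddef>
#include <cstdlib>

// Illustrative per-type freelist; NOT the real Simutrans freelist_tpl.
template <typename T>
class freelist_tpl {
    struct node_t { node_t *next; };
    node_t *head = nullptr;
public:
    void *gimme_node() {
        if (head) { node_t *n = head; head = n->next; return n; }
        // every chunk must be big enough to hold the free-list link
        std::size_t sz = sizeof(T) < sizeof(node_t) ? sizeof(node_t) : sizeof(T);
        return std::malloc(sz);
    }
    void putback_node(void *p) {
        node_t *n = static_cast<node_t *>(p);
        n->next = head;
        head = n;
    }
};

class wolke_t {                          // simplified stand-in class
    static freelist_tpl<wolke_t> fl;     // static: adds no per-object size
    long dummy_payload[4];               // placeholder members
public:
    void *operator new(std::size_t) { return fl.gimme_node(); }
    void operator delete(void *p)   { fl.putback_node(p); }
};

freelist_tpl<wolke_t> wolke_t::fl;

static_assert(sizeof(wolke_t) == 4 * sizeof(long),
              "the static freelist member adds no per-object size");
```

With the class-specific `operator new`/`operator delete` routed through the static member, `new wolke_t` after a `delete` recycles the same block, which is the pooling behavior the thread is tuning.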
Quote from: prissi on May 12, 2022, 02:33:22 PMAlso, regarding TurfIt's idea of reordering the objects in the sync list by memory closeness: I think this is doomed for deterministic network games, since memory proximity may depend on the heap allocation, which is beyond the control of Simutrans on different architectures.
Quote from: Yona-TYT on May 15, 2022, 02:18:05 PMI apologize for asking a silly question. Is it possible that this finds its way to hardware acceleration? Or is it still a distant dream?
Quote from: Ters on July 05, 2022, 03:50:20 PMA bit late answer, as I haven't been reading this forum for a while:

The main obstacle for GPU rendering in Simutrans is that there is no clear separation between dynamic objects and static objects. Having the CPU prepare the draw calls and then send them to the GPU simply takes more time than the CPU just drawing everything itself and sending the result, especially when zoomed out. Changing textures is also expensive, and cramming all the images into a texture atlas that would fit on low-end hardware was difficult even for pak64 with the old limit of 64k images. Maybe the minimum maximum texture size has gone up; I haven't checked that lately either.

Running the simulation on the GPU doesn't seem like a great idea either. GPUs are made for crunching numbers, particularly doing the same operations on many numbers in parallel. Simutrans' logic contains a lot of ifs, and so isn't as suited for GPUs. (The first GPUs didn't even support ifs, I think, but that was quite some time ago now.)