Core performance improvements

Started by mkrnic, November 24, 2020, 10:41:21 PM


mkrnic

After playing Simutrans on and off for more than 10 years, and planning to contribute in some way but having no real idea how, I've finally reached that point.

I believe I've read most of the topics relevant to my plan, mainly Hw acceleration/simutrans 3D (2010) and OpenGL (and others whose names I've now forgotten).
What I find most reassuring is how supportive the community is about these big plans for game improvement by us "newcomers".

Most topics end with the same conclusion (which is reasonable, since mostly the same people discuss them :)): the need to decouple game logic from rendering (and framerate). That is actually what prompted me to finally move in this direction. I have a pretty beefy PC which struggles with Simutrans, and while I'm more or less OK with the simulation being slow, I'd at least like to be able to scroll the map smoothly.

Like most developers, I'm motivated primarily by the desire to add new and shiny features rather than by maintaining legacy code. However, I believe this would be a big step in the right direction, as it would (hopefully) allow new features to be added more easily, including my own.

I've spent the last few weeks studying the code, and I'm fully aware that the chance of succeeding with this idea is slim, but that is what makes it even more interesting.

After phase 1 is done, I'd like to leverage hardware acceleration, probably using Vulkan.

I'm not sure if there is actually a question in this (of course I'm open to suggestions), but if I tell someone other than myself, then I "have" to pursue that goal. So, one big thank you to all developers so far, and I hope to join you.

Once the work described above is finished, I'll share more megalomaniacal ideas :D.


jamespetts

Welcome! That is an ambitious project indeed. What I anticipate that you may find is that an unknown number of parts of the code depend in some undocumented and unpredictable way on an unknown number of aspects of the current graphical implementation such that it is difficult to make this change without rewriting quite fundamentally a surprisingly large proportion of the whole codebase.

However, if you are really up for the great challenge of this project, then offloading the graphical work onto a GPU would definitely represent a huge improvement in the responsiveness and performance of the game and would be most worthwhile. I certainly approve of doing this using the open Vulkan API.

One important thing is to make sure that the resulting work is fully compatible with all existing paksets, as the resources to re-code all paksets in true 3d do not exist; however, simple 2d acceleration would certainly be an improvement on what we already have.

Very best wishes with this. I am afraid that there is only very limited technical assistance that I can give to this project as I have never really looked into, modified or understood the graphics code, but I will offer such information as I may have in relation to a specific issue if that would be helpful. I have no doubt that there are a large number of willing testers at your disposal on this forum, me included, if and when you have a working test build for this.

If you find that this project is not realistically achievable, do not feel disheartened, as you have picked perhaps one of the most difficult projects in one of the most difficult to maintain legacy codebases that one could find. If you do succeed in producing a reliable, maintainable graphically accelerated version of Simutrans, you will no doubt go down in Simutrans history as a legendary contributor.
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

prissi

There is a branch in the svn below trunk (svn://servers.simutrans.org/simutrans/branches/simutrans3d), which contains an almost fully working OpenGL 3D implementation that used a mix of 2D and 3D models. It could display something, but the lack of 3D models and incompatibilities led to multithreading the display code instead.

The last revision was r6127, so Nov 2012 ...

EDIT: Due to a change in handling the delete constructor, gcc cannot compile this any more.

mkrnic

Thank you for the support and for the svn link. It might prove useful at a later stage, but right now, I'll try just to separate rendering logic and ignore GPU completely.

Right now I'm just wondering whether there is any technical reason to keep the option of using only a single thread. The way I imagine implementing this, at least two threads are necessary.

I'm sure I'll have more questions along the way.

jamespetts

Quote from: Mak on November 25, 2020, 02:10:22 PM
Thank you for the support and for the svn link. It might prove useful at a later stage, but right now, I'll try just to separate rendering logic and ignore GPU completely.

Right now I'm just wondering is there any technical reason to keep option for using only single thread. How I imagined implementing this, at least two threads are necessary.

I'm sure I'll have more questions along the way.

Having an option for a single thread can be very useful when debugging threading-specific problems. However, provided that the multi-threaded simulation code (as opposed to the graphics code) can still be compiled as single-threaded to diagnose losses of synchronisation with the server caused by multi-threading, it may be worth losing this option if this project can really be achieved.

mkrnic

Quote from: jamespetts on November 25, 2020, 02:14:49 PM
Having an option for a single thread can be very useful when debugging threading specific problems, although providing that the multi-threaded simulation code (as opposed to graphics code) can be compiled as single threaded to diagnose losses of synchronisation with the server caused by multi-threading, it may be worth losing this if this project can really be achieved.

OK, maybe the semantics of "single threaded" can change a bit to mean single threaded within each of the two main threads (render and game), i.e., those threads won't spawn any new threads.

jamespetts

Quote from: Mak on November 25, 2020, 02:25:33 PM
Ok, maybe the semantics for single threaded can change a bit into single threaded within any of the main two threads (render and game), i.e, those threads won't spawn any new threads.

Yes, that could work. Initially, I think, the retention of single threading was intended to support platforms where multi-threaded builds could not be compiled or run, but all significant platforms now support multi-threading, so I do not think that there is any longer any fundamental requirement to retain fully single threaded operation.

ceeac

The two-threaded approach is inflexible; IMO a better approach would be to think in terms of "tasks" (an abstract work item to be processed) and "task graphs". Any thread can process any queued task. For example, if the render task depends on the update task, it does not matter if you have 1 or multiple threads, the tasks are always processed in the correct order. This can also be applied to sub-tasks that are spawned as the update and render tasks are processed.
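
To illustrate, here is a minimal C++ sketch of a task graph. The names `TaskGraph`, `add_task` and `run` are made up for this example and do not exist in Simutrans. For brevity it drains the ready queue on the calling thread; the assumption is that a real scheduler would let any number of worker threads pop from the same queue (with locking) without changing the ordering guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical illustration only; none of these names exist in Simutrans.
class TaskGraph {
public:
    using TaskId = std::size_t;

    // Registers a task that may only run after all tasks in `deps` have finished.
    TaskId add_task(std::function<void()> work, const std::vector<TaskId>& deps = {}) {
        const TaskId id = tasks_.size();
        tasks_.push_back({std::move(work), {}, deps.size()});
        for (TaskId d : deps) {
            assert(d < id && "dependencies must be registered first");
            tasks_[d].dependents.push_back(id);
        }
        return id;
    }

    // Runs every task once and never starts a task before its dependencies.
    // A real scheduler would let several worker threads pop from `ready`.
    void run() {
        std::vector<std::size_t> pending;
        std::queue<TaskId> ready;
        for (TaskId id = 0; id < tasks_.size(); ++id) {
            pending.push_back(tasks_[id].unmet_deps);
            if (pending[id] == 0) ready.push(id);
        }
        while (!ready.empty()) {
            const TaskId id = ready.front();
            ready.pop();
            tasks_[id].work();
            // Finishing a task may unblock the tasks that depend on it.
            for (TaskId next : tasks_[id].dependents) {
                if (--pending[next] == 0) ready.push(next);
            }
        }
    }

private:
    struct Task {
        std::function<void()> work;
        std::vector<TaskId> dependents;
        std::size_t unmet_deps;
    };
    std::vector<Task> tasks_;
};
```

If the render task is registered with the update task as its dependency, `run()` always processes the update first, whether one thread or many are doing the work.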

mkrnic

Quote from: ceeac on November 25, 2020, 05:13:44 PM
The two-threaded approach is inflexible; IMO a better approach would be to think in terms of "tasks" (an abstract work item to be processed) and "task graphs". Any thread can process any queued task. For example, if the render task depends on the update task, it does not matter if you have 1 or multiple threads, the tasks are always processed in the correct order. This can also be applied to sub-tasks that are spawned as the update and render tasks are processed.

I agree that tasks would be the way to go. However, it already seems that this will require rewriting a substantial part of the code. I'm thinking of implementing HW-accelerated rendering in parallel, since that way I can discard most of the old (good or bad) rendering logic. If we now add tasks and a scheduler as well, I think it would really become too big to handle all at once.

After that first part is done, on the other hand, precise refactoring could be done in multiple places.
Even then, I still think we should keep the control threads, which would spawn/schedule all the others.

jamespetts

Although this does not apply to the graphical code, beware that any simulation code multi-threading needs to be completely deterministic, or else network games will not be able to stay in synchronisation. This is relevant to any more general overhaul of the threading architecture, and is the reason that the multi-threading of the simulation element is constrained and sometimes somewhat unusual in implementation.
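
One common way to keep threaded code deterministic, shown here only as a sketch and not as how Simutrans-Extended actually does it, is to cut the work into fixed chunks that do not depend on the number of threads, let each chunk write to its own slot, and merge the slots in a fixed order. The function name `deterministic_sum` and the chunk size are assumptions for the example; floating-point addition is used only because it makes order dependence visible.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical illustration; not Simutrans code.
// The input is always cut into the same fixed-size chunks and the partial
// sums are combined in chunk order, so the result does not depend on the
// number of threads or on which thread finishes first.
double deterministic_sum(const std::vector<double>& values, unsigned num_threads) {
    assert(num_threads > 0);
    constexpr std::size_t kChunkSize = 1024;  // fixed, independent of num_threads
    const std::size_t num_chunks = (values.size() + kChunkSize - 1) / kChunkSize;
    std::vector<double> partial(num_chunks, 0.0);
    std::atomic<std::size_t> next_chunk{0};

    auto worker = [&] {
        // Which thread takes which chunk may vary between runs, but every
        // chunk writes its result to its own fixed slot.
        for (std::size_t c = next_chunk.fetch_add(1); c < num_chunks;
             c = next_chunk.fetch_add(1)) {
            const std::size_t begin = c * kChunkSize;
            const std::size_t end = std::min(begin + kChunkSize, values.size());
            double sum = 0.0;
            for (std::size_t i = begin; i < end; ++i) sum += values[i];
            partial[c] = sum;
        }
    };

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < num_threads; ++t) threads.emplace_back(worker);
    for (std::thread& t : threads) t.join();

    // Combine in a fixed order, never "as results arrive".
    double total = 0.0;
    for (double p : partial) total += p;
    return total;
}
```

Because both the chunk boundaries and the merge order are fixed, a client running 1 thread and a server running 16 compute bit-identical totals.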

mkrnic

Quote from: jamespetts on November 25, 2020, 08:09:23 PM
Although this does not apply to the graphical code, beware that any simulation code multi-threading needs to be completely deterministic, or else network games will not be able to stay in synchronisation. This is relevant to any more general overhaul of the threading architecture, and is the reason that the multi-threading of the simulation element is constrained and sometimes somewhat unusual in implementation.

Sure. I'll keep that in mind.

prissi

Also, vehicle movement is connected to the game logic, since there are signals etc. Thus deterministic threading without wasting most of the performance on waiting is quite a challenge and would probably require a total rewrite of Simutrans. But updating the screen without moving vehicles first is also useless in terms of display smoothness.

mkrnic

What I have in mind right now (disregarding a multithreaded game loop) are two possibilities:

1.

  • Run game loop iteration
  • Copy what is currently in viewport to a buffer
  • Render the buffer and continue game loop

2. Have two world instances: one for rendering and one for the game loop

  • Run game loop iteration
  • Calculate diff from the last iteration
  • Apply diff to rendering world instance
  • Render what is currently in buffer and continue game loop

I think the UX would be better with option 2, but that would mean that RAM consumption would probably be significantly higher. I'll have to investigate and test.
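
A rough sketch of the "calculate diff" and "apply diff" steps of option 2, under heavy simplification. `TileState`, `TileChange`, `compute_diff` and `apply_diff` are hypothetical names, and a real tile holds far more data than the two fields assumed here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, heavily simplified tile state for illustration only.
struct TileState {
    std::int8_t height = 0;
    std::uint16_t image_id = 0;
    bool operator==(const TileState& o) const {
        return height == o.height && image_id == o.image_id;
    }
    bool operator!=(const TileState& o) const { return !(*this == o); }
};

struct TileChange {
    std::size_t index;  // position in the flat tile array
    TileState state;    // new value after the game loop iteration
};

// Compares the world before and after one game loop iteration and
// returns only the tiles that changed.
std::vector<TileChange> compute_diff(const std::vector<TileState>& before,
                                     const std::vector<TileState>& after) {
    assert(before.size() == after.size());
    std::vector<TileChange> diff;
    for (std::size_t i = 0; i < after.size(); ++i) {
        if (before[i] != after[i]) diff.push_back({i, after[i]});
    }
    return diff;
}

// Brings the render-side copy of the world up to date.
void apply_diff(std::vector<TileState>& render_world,
                const std::vector<TileChange>& diff) {
    for (const TileChange& change : diff) {
        assert(change.index < render_world.size());
        render_world[change.index] = change.state;
    }
}
```

Comparing two full copies as done here needs a "before" snapshot, which costs yet more RAM. Recording changes as the game loop makes them would avoid that, at the price of touching more code.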

Quote from: prissi on November 26, 2020, 01:23:17 AM
But updating the screen without moving vehicles first is also useless in terms of display smoothness.

It isn't. You still need to be able to scroll the map and run the animations for other things, like pedestrians, smoke and water. These might not be the most important things, but they provide better user experience.

Additionally, once everything else is set up, it will start to make sense to look into how to optimize/parallelize the game loop calculations.

jamespetts

Thank you for your thoughts on this.

Simutrans-Extended's RAM consumption is already very high in large games, so I suggest avoiding anything that will significantly increase this further. One of the largest drivers of RAM consumption is the storage of data for individual tiles of ground, which I can imagine may have to be duplicated to an extent for the proposed model 2, although, given that my understanding of graphics coding is limited, I may be wrong about that last part.

As to the question of usability, I agree that, all other things being equal, it is better to have smoother scrolling and better UI responsiveness even if vehicle movement be no smoother. However, the most significant gain from using hardware acceleration for the graphics, if implemented well, will be that the CPU will no longer be as burdened with graphical work and will thus have more time to dedicate to simulation code. Performance profiling suggests that graphical code takes a significant proportion of CPU time at present. Note that this is already multi-threaded.

prissi

Smoke and pedestrians move like vehicles. Also, it would be strange if they moved smoothly but vehicles stuttered. Also, the objects have to move everywhere, not just in the view. (Of course, Simutrans just renders the view.)

Also, Simutrans has at most 128 microsteps per tile. That inherently limits the smoothness, and the curve code can only do jumps. I think you need to start from scratch (not least due to the lack of models), borrowing a few code items. And you may end up with something like Mashinky in the best case.

Anyway, I tend to be overly pessimistic, but I sincerely hope you succeed.

For large games, the bottleneck is memory bandwidth, as large numbers of tiles (and the things on them) have to be moved in and out of the various cache levels. (This is why the 32-bit version is 20% faster on the same map.) For the same reason, many lists were changed to vectors, to keep their elements together in memory.

The graphics are only demanding when zooming out a small tile set like pak64, so that a lot of tiles are visible.

jamespetts

Quote from: prissi on November 26, 2020, 02:07:24 PM
The graphics are only demanding when zooming out a small tile set like pak64, so that a lot of tiles are visible.

On larger monitors, zooming out even in Pak128.Britain-Ex causes a noticeable reduction in performance, so offloading this to a GPU will have at least some noticeable beneficial effect if it can be done effectively.

TurfIt

I have to second the overly pessimistic, but hope it works statement.

I really don't see where a GPU has much room to help the zoomed out performance issue. A quick and dirty test of simply disabling the image drawing routines and comparing with the non-disabled shows that at default zoom, drawing images takes 76% of the frame time, but at full zoom out that drops to 28%. This is running 4 threads with a maximized 4K window, pak64, and a newly generated 1024x1024 map (so the map is still on screen when zoomed out). Throwing more CPU threads at it (16) just makes the numbers worse for the GPU - 70% -> 21%.

Hence, for a GPU to help, much more than just the rendering needs to be offloaded. For reference, at default zoom, 12000 images are processed per frame. At full zoom out, 880000!
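
These percentages put an upper bound on what offloading only the image drawing can gain. If drawing takes a fraction p of the frame time and the GPU made it entirely free (an optimistic assumption), the frame time shrinks to 1 - p of its former value, so the speedup is at most 1 / (1 - p). A small sketch, with `max_speedup` being a name chosen for this example:

```cpp
#include <cassert>

// Upper bound on the frame-rate speedup if the share `draw_fraction` of the
// frame time (image drawing) were offloaded and cost nothing at all.
// This is Amdahl's law with an infinitely fast GPU, i.e. the best case.
double max_speedup(double draw_fraction) {
    assert(draw_fraction >= 0.0 && draw_fraction < 1.0);
    return 1.0 / (1.0 - draw_fraction);
}
```

For the 4-thread figures this gives about 4.2x at default zoom (p = 0.76) but only about 1.4x at full zoom out (p = 0.28). For the 16-thread figures it gives about 3.3x and 1.3x.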

Antarctica

I don't understand much about 2D rendering, but the guys who developed Factorio got it right I think. IIRC they only use 2D models ("sprites") and they can render quite a bunch of those without delaying simulation too much. They wrote about some of that in their "Factorio Friday Facts" blog articles, but I didn't understand everything. But when I played factorio for the first time in August, I thought about how bad other 2D games actually perform - besides Simutrans also Age Of Empires 2 especially, where 300 archers firing already make the game nearly unplayable.

mkrnic

Quote from: Antarctica on November 28, 2020, 10:37:49 PM
I don't understand much about 2D rendering, but the guys who developed Factorio got it right I think. IIRC they only use 2D models ("sprites") and they can render quite a bunch of those without delaying simulation too much. They wrote about some of that in their "Factorio Friday Facts" blog articles, but I didn't understand everything. But when I played factorio for the first time in August, I thought about how bad other 2D games actually perform - besides Simutrans also Age Of Empires 2 especially, where 300 archers firing already make the game nearly unplayable.

Hi, and thank you for the insight. Unfortunately, I haven't yet played Factorio, but I'll be sure to check the blog. There are 363 articles as of now, though, so it'll definitely take me some time to find the ones you are talking about (and I'll probably get stuck reading others :)).

Right now, though, I'm focusing more on making anything visible with Vulkan than on optimizing.

prissi

The solution, implemented in the old Simutrans3D branch, was to have all images in a big texture, to avoid loading images from and to the graphics board, which would take more time than software rendering. (Since the PCI bus is slower than the memory access.)
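
A sketch of how such a single big texture can be filled, using a simple shelf (row) packer. This is an assumption about one possible approach, not the code from the Simutrans3D branch; `ShelfAtlas`, `AtlasRect` and `to_uv` are names invented for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch; names and layout are not taken from the Simutrans3D branch.
struct AtlasRect {
    std::uint32_t x, y, w, h;  // pixel position and size inside the atlas
};

// Places images left to right in rows ("shelves") inside one square texture.
class ShelfAtlas {
public:
    explicit ShelfAtlas(std::uint32_t size) : size_(size) { assert(size > 0); }

    // Reserves space for a w x h image. Returns false if it does not fit.
    bool add(std::uint32_t w, std::uint32_t h, AtlasRect& out) {
        if (w == 0 || h == 0 || w > size_ || h > size_) return false;
        if (cursor_x_ + w > size_) {  // row is full: start a new shelf below
            cursor_y_ += shelf_height_;
            cursor_x_ = 0;
            shelf_height_ = 0;
        }
        if (cursor_y_ + h > size_) return false;
        out = {cursor_x_, cursor_y_, w, h};
        cursor_x_ += w;
        if (h > shelf_height_) shelf_height_ = h;  // shelf is as tall as its tallest image
        return true;
    }

    // Converts a pixel coordinate to a normalised [0, 1] texture coordinate.
    float to_uv(std::uint32_t pixel) const {
        return static_cast<float>(pixel) / static_cast<float>(size_);
    }

private:
    std::uint32_t size_;
    std::uint32_t cursor_x_ = 0;
    std::uint32_t cursor_y_ = 0;
    std::uint32_t shelf_height_ = 0;
};
```

Each image is uploaded once into its rectangle; afterwards drawing it only needs the rectangle's texture coordinates, so no pixel data crosses the bus per frame.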

mkrnic

While not much, there is some progress:



I've added new SDL2 windowing and updated how events are processed and passed from the window manager to Simutrans. Of course, I've also added the new Vulkan renderer. As a result, the code probably won't even compile on anything except Linux right now (maybe even on anything except my PC).

Currently the code is both ugly and not optimized, either for speed or for memory usage.

The whole ground is occasionally (on window resize) recalculated and transferred to the graphics card. I'm testing on a 1024x1024 map and it's using around 150 MB of VRAM. After optimization, I'm guessing the same screen will use around 1/4 of that, since right now everything is stored several times over.

I'm planning on using that logic for showing the grid on the ground, and the vertices can be reused for drawing tiles, so only indices would need to be transferred.
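
A sketch of the shared-vertex idea, assuming a regular grid in which every corner is shared by the neighbouring tiles. `corner_index` and `build_tile_indices` are hypothetical names, and the real renderer may lay things out differently. On a 1024x1024 map this stores 1025 x 1025 = 1,050,625 corners once instead of four per tile (4,194,304).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch, not the renderer code described above.
// A map of W x H tiles has (W + 1) x (H + 1) grid corners. Each corner is
// stored once; a tile refers to its four corners by index.
inline std::uint32_t corner_index(std::uint32_t x, std::uint32_t y,
                                  std::uint32_t map_width) {
    return y * (map_width + 1) + x;
}

// Builds the index list (two triangles per tile) for the tiles in the
// half-open range [x0, x1) x [y0, y1), e.g. the currently visible area.
std::vector<std::uint32_t> build_tile_indices(std::uint32_t map_width,
                                              std::uint32_t map_height,
                                              std::uint32_t x0, std::uint32_t y0,
                                              std::uint32_t x1, std::uint32_t y1) {
    assert(x0 <= x1 && x1 <= map_width);
    assert(y0 <= y1 && y1 <= map_height);
    std::vector<std::uint32_t> indices;
    indices.reserve(static_cast<std::size_t>(x1 - x0) * (y1 - y0) * 6);
    for (std::uint32_t y = y0; y < y1; ++y) {
        for (std::uint32_t x = x0; x < x1; ++x) {
            const std::uint32_t nw = corner_index(x, y, map_width);
            const std::uint32_t ne = corner_index(x + 1, y, map_width);
            const std::uint32_t sw = corner_index(x, y + 1, map_width);
            const std::uint32_t se = corner_index(x + 1, y + 1, map_width);
            // Two triangles covering the tile.
            indices.insert(indices.end(), {nw, ne, sw, ne, se, sw});
        }
    }
    return indices;
}
```

Sharing corners like this only works for attributes that are identical for every tile touching a corner, such as position; per-tile data like texture coordinates would need another mechanism.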

Quote from: prissi on November 29, 2020, 12:02:15 PM
The solution, implemented in the old Simutrans3D branch, was to have all images in a big texture, to avoid loading images from and to the graphics board, which would take more time than software rendering. (Since the PCI bus is slower than the memory access.)

I was also planning on implementing something like that.