
OpenGL?

Started by jamespetts, November 14, 2016, 12:10:37 AM


jamespetts

A query to the Standard developers about OpenGL, if I may - does this actually offload any non-trivial amount of graphics work to the GPU from the CPU in its current implementation in Simutrans?

I recall reading that it does not improve performance in Standard (and indeed, has lower performance), but I do not know whether this is because the time to draw each frame is higher even though it is offloaded into the GPU, or whether it is because it does not actually offload anything onto the GPU at all.

If it is the former, and the actual graphics performance is still acceptably responsive, then it might be more useful in Experimental than Standard as it would free up the CPU to do more simulation calculations. Currently, the graphics can take up 25% or so of Simutrans-Experimental's CPU time, which, now that a substantial amount of simulation code is multi-threaded, makes a real difference to how much is left for the simulation code. If a significant amount of this work could be offloaded to the GPU, this could improve performance further. If, on the other hand, the OpenGL backend does not in fact offload anything significant to the GPU, then this would not be worthwhile.

I should be grateful to know what the position is so that I can know whether to look into compiling with the OpenGL backend.

DrSuperGood

Quote
A query to the Standard developers about OpenGL, if I may - does this actually offload any non-trivial amount of graphics work to the GPU from the CPU in its current implementation in Simutrans?
Does it even work? As far as I am aware, the OpenGL branch has had no maintenance done to it, as it was a dead end given the way graphics are currently implemented.

Or are you talking about the OpenGL frontend? As far as I am aware, that is nothing more than an alternative to SDL, SDL2 and GDI for displaying the software-composed graphics. I think it was needed in the past for Mac systems, which lacked SDL/GDI support at a time when SDL2 was not available.

Yes, a significant amount of work could be offloaded to the GPU; however, the problem has always been that sending the work to the GPU costs the CPU more than composing the graphics itself does. The way graphics are currently drawn involves lots of very small images being drawn individually. The CPU can do this very efficiently because the images are already in memory, whereas for the GPU to do it each image requires multiple expensive driver calls to set up and draw, which ends up costing more than the CPU drawing the images. Especially with 16-bit colour derived from indexed colours with special values and other non-standard logic, it quickly becomes far more efficient for the CPU to do the drawing itself than to tell the GPU to do it.
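
To make that overhead concrete, a naive GPU port of the current drawing model would look roughly like the sketch below (hypothetical C++ against the legacy OpenGL API; none of these names come from the Simutrans code, and a context with an orthographic projection is assumed to be set up already). Every tiny sprite costs at least a texture bind plus its own draw call, and with thousands of sprites per frame that driver overhead alone exceeds the cost of the CPU blitting the images itself.

Code:
#include <GL/gl.h>
#include <vector>

// Hypothetical per-sprite description; not Simutrans' actual data structures.
struct Sprite { GLuint texture; float x, y, w, h; };

// Naive approach: one texture bind and one draw call per tiny sprite.
void draw_sprites_naive(const std::vector<Sprite>& sprites)
{
    for (const Sprite& s : sprites) {
        glBindTexture(GL_TEXTURE_2D, s.texture);   // expensive state change per sprite
        glBegin(GL_QUADS);                         // separate draw call per sprite
        glTexCoord2f(0, 0); glVertex2f(s.x,       s.y);
        glTexCoord2f(1, 0); glVertex2f(s.x + s.w, s.y);
        glTexCoord2f(1, 1); glVertex2f(s.x + s.w, s.y + s.h);
        glTexCoord2f(0, 1); glVertex2f(s.x,       s.y + s.h);
        glEnd();
    }
}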

For GPU offload via OpenGL (or Vulkan) to be viable, the way graphics are dealt with would need to be changed. The best results would probably require that graphics be kept completely separate from the game state, so that one or more threads can process graphics independently of the main thread, and that all graphics can be grouped together in a way that allows the GPU to be instructed efficiently. All sprites would need to be loaded into "sprite sheets" as required, and the CPU would order the rendering of every visible sprite from the sheet at the same time. At that point one might as well migrate to full 32-bit colour in the sRGB space, as the GPU will likely perform the same either way. It is worth noting that I do not think the way alpha is currently implemented in Standard is colour-correct, as the correction would be a big overhead; in OpenGL, however, it would be correct as long as both the sprite sheets and the output buffers are specified to be in the sRGB colour space.
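
As a rough sketch of that batched alternative (again hypothetical names, plain C++ with OpenGL; a core-profile context, a textured-quad shader program, a bound vertex array object and a sprite-sheet texture uploaded with the GL_SRGB8_ALPHA8 format are all assumed to exist already): the CPU writes every visible sprite into one vertex buffer per frame and the GPU draws the lot with a single call, so the number of driver calls no longer depends on the number of sprites, and GL_FRAMEBUFFER_SRGB gives the colour-correct blending mentioned above.

Code:
#include <GL/glew.h>
#include <vector>

// One vertex per quad corner: screen position plus sprite-sheet coordinate.
struct Vertex { float x, y, u, v; };

// Batched approach: all visible sprites go into one buffer and one draw call.
void draw_sprites_batched(GLuint program, GLuint atlas_texture, GLuint vbo,
                          const std::vector<Vertex>& vertices /* 6 per sprite */)
{
    glEnable(GL_FRAMEBUFFER_SRGB);                 // blend in linear light, output sRGB
    glUseProgram(program);
    glBindTexture(GL_TEXTURE_2D, atlas_texture);   // bound once, not once per sprite

    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER,
                 vertices.size() * sizeof(Vertex),
                 vertices.data(), GL_STREAM_DRAW); // rebuilt by the CPU each frame

    glEnableVertexAttribArray(0);                  // position
    glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, sizeof(Vertex), (void*)0);
    glEnableVertexAttribArray(1);                  // sprite-sheet texture coordinate
    glVertexAttribPointer(1, 2, GL_FLOAT, GL_FALSE, sizeof(Vertex),
                          (void*)(2 * sizeof(float)));

    glDrawArrays(GL_TRIANGLES, 0, (GLsizei)vertices.size()); // one call for everything
}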

jamespetts

Thank you: that is most helpful. I think that I was confusing the backend and the frontend. It seems clear that, at present, there is likely to be no advantage to using the OpenGL frontend for graphics.

isidoro

Another option is to offload some calculations to the GPU (compute shaders). In a massively parallel simulation, this can be a real advantage. But, of course, the price is that the game could only be played on more modern hardware.

DrSuperGood

Quote
Another option is to offload some calculations to the GPU (compute shaders). In a massively parallel simulation, this can be a real advantage. But, of course, the price is that the game could only be played on more modern hardware.
As far as I am aware Simutrans does not do the sort of calculation that can take advantage of that.

Vladki

Just a wild brainstorming idea: would it be beneficial to implement some of the more complex algorithms to run on the GPU instead of the CPU? GPUs have a very different instruction set, which may be better suited to some tasks such as pathfinding or distance and travel-time calculations. GPUs can be used for mining bitcoins, cracking ciphers, etc. However, I do not know much about GPU instruction sets, so it is very likely that they would not give any advantage.

jamespetts

I have never done any GPU programming. My (very vague) understanding is that GPUs are more efficient at doing a very large number of repetitive very simple operations, whereas CPUs are better at more heterogeneous and complex operations. If this understanding is even approximately right, I do not think that GPU offloading will assist the simulation code in any useful way, although somebody who has a better than vague idea of these things may well correct me.

isidoro

In fact, GPUs are massively parallel: that means hundreds of cores. How much performance you can get out of that depends on the problem at hand. Many unrelated but similar calculations on different data, or "local problems" in which the outcome of a calculation depends only on variables in the near surroundings of a given one, and so on, work best.
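
To illustrate the "many similar, independent calculations" shape with a deliberately simple example (hypothetical C++, not Simutrans code): each result below depends only on its own inputs, so every loop iteration could in principle become one GPU thread.

Code:
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical example: straight-line travel-time estimates for many
// origin/destination pairs. No result depends on any other result,
// so on a GPU each index would simply be handled by one thread.
struct Pair { float x1, y1, x2, y2; };

std::vector<float> travel_times(const std::vector<Pair>& pairs, float speed)
{
    std::vector<float> out(pairs.size());
    for (std::size_t i = 0; i < pairs.size(); ++i) {
        const float dx = pairs[i].x2 - pairs[i].x1;
        const float dy = pairs[i].y2 - pairs[i].y1;
        out[i] = std::sqrt(dx * dx + dy * dy) / speed;  // independent of out[j]
    }
    return out;
}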

You also have to consider the time involved in uploading/downloading data to/from the GPU. Ideally, a small result computed from a large amount of data that can stay in GPU memory would also be a plus.

If finding good use for 4 or 6 CPU cores is difficult, imagine what you have to do with hundreds of them!

jamespetts

Quote from: isidoro on November 15, 2016, 11:53:27 PM
If finding good use for 4 or 6 CPU cores is difficult, imagine what you have to do with hundreds of them!

There is no trouble finding plenty of use for 4 or 6 cores: in Experimental, private car route finding, convoy/consist route finding, passenger and mail generation, and the unreserving of block reservations can all use an arbitrary number of cores, and will keep getting faster at those tasks until the number of cores exceeds the number of vehicles needing to find routes in any step, the number of times that the passenger generation algorithm needs to run in a step (which can be hundreds), or the number of towns in the game (which can also be hundreds).
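
For illustration only (hypothetical names, not the actual Simutrans-Experimental threading code), the pattern amounts to splitting a list of independent work items, such as towns generating passengers, across a fixed pool of worker threads; this is why adding cores keeps helping until there are more cores than items in a step.

Code:
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for one unit of per-town work in a step.
static void generate_passengers_for_town(std::size_t /*town_index*/)
{
    /* ... passenger generation for one town ... */
}

// Split the towns evenly across a fixed number of worker threads.
// The speed-up scales with thread_count until it exceeds town_count.
static void generate_passengers_parallel(std::size_t town_count, unsigned thread_count)
{
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < thread_count; ++t) {
        workers.emplace_back([=] {
            for (std::size_t i = t; i < town_count; i += thread_count) {
                generate_passengers_for_town(i);   // independent of other towns
            }
        });
    }
    for (std::thread& w : workers) {
        w.join();                                  // wait for the whole step to finish
    }
}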

DrSuperGood

To briefly explain why OpenCL is not useful: GPUs are very bad at anything involving "if" or "while" statements, and such statements are extremely common in the Simutrans code.

isidoro

Many algorithms can be rewritten to avoid those code branches. I wouldn't say GPUs are especially bad at dealing with loops. See, for example:
http://supercomputingblog.com/cuda/search-algorithm-with-cuda/
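
A trivial example of the kind of rewrite meant here (plain C++, purely illustrative): a data-dependent branch can often be replaced by arithmetic or a min/max selection, so that every thread in a GPU work-group executes the same instructions regardless of its data.

Code:
#include <algorithm>

// Branching version: threads whose 'cost' differs take different paths,
// which forces a GPU work-group to serialise both cases.
int clamp_cost_branching(int cost, int limit)
{
    if (cost > limit) {
        return limit;
    }
    return cost;
}

// Branch-free version: every thread executes exactly the same instructions.
int clamp_cost_branchless(int cost, int limit)
{
    return std::min(cost, limit);
}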

Googling a little, it's easy to find articles about using the GPU to accelerate the A* or Dijkstra algorithms, which would certainly be of use in ST. The bigger problems are compatibility issues and cutting off a great deal of (not so) old hardware.

http://hgpu.org/?p=13201