Hardware accelerated display, OpenGL back-end & Simutrans 3D

Markohs · January 28, 2012, 04:03:17 PM

Hi Ters, thank you for taking the time and your help.

Applied the patch you sent me.

Well, all files in the sim3d folder need to be excluded from the project; they are not used atm. You only need to compile simsys_ogre.cc and simgraphogre.cc, which replace simsys_s.cc and simgraph_s.cc.

The sim3d/ files are the earlier code I wrote when I was aiming for a fully 3D game. They will be useful in the future, but atm I'm just focused on rendering the current 2D simutrans using the Ogre framework.

About the crash:

On my computer dr_query_screen_resolution() isn't called at all; execution gets there with values already set for disp_width and disp_height. But it's supposed to return the maximum allowed window dimensions. The correct way of handling it would be to parse:

static Ogre::StringVector foundRenderDevices;
static Ogre::StringVector foundResolutions;

Those should already have been populated by the init code in "dr_os_init", but I haven't taken the time to parse them yet. You can just return a fixed size such as 800x600 manually until I write the parsing code, or you can do it yourself.

This code should fix the problem.

resolution dr_query_screen_resolution()
{
   resolution res;

   res.w=800;
   res.h=600;

   return res;
}
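
Until the real parsing is written, something along these lines could pick the largest mode out of foundResolutions. This is only a sketch under assumptions: it assumes each entry starts with "<width> x <height>" (for example "800 x 600 @ 32-bit colour"), it uses std::vector<std::string> as a stand-in for Ogre::StringVector, and largest_resolution is a made-up helper name, not something in the branch.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

struct resolution { int w; int h; };

// Hypothetical helper: returns the largest "W x H" entry that could be
// parsed, or the fallback when no entry matches the assumed format.
resolution largest_resolution(const std::vector<std::string>& modes, resolution fallback)
{
    assert(fallback.w > 0 && fallback.h > 0);
    resolution best = fallback;
    bool found = false;
    for (const std::string& m : modes) {
        int w = 0, h = 0;
        // Assumed entry format: "<width> x <height>", optionally followed by more text.
        if (std::sscanf(m.c_str(), "%d x %d", &w, &h) == 2 && w > 0 && h > 0) {
            // Compare by pixel count; the first valid entry always wins over the fallback.
            if (!found || static_cast<long>(w) * h > static_cast<long>(best.w) * best.h) {
                best.w = w;
                best.h = h;
                found = true;
            }
        }
    }
    return best;
}
```

dr_query_screen_resolution() could then return largest_resolution(foundResolutions, fallback) with the 800x600 value above as the fallback.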

Ters · January 28, 2012, 04:32:14 PM

At least some of the header files in sim3d are included by files elsewhere, so they either need to be unincluded or fixed.

Once I changed dr_query_screen_resolution(), the game stopped crashing when using Direct3D. No map though, but most of the GUI is there in the right colors and proportions. Active toolbar buttons are missing, though.

When using OpenGL, it appears this problem occurs: http://www.ogre3d.org/forums/viewtopic.php?f=2&t=8699

Markohs · January 28, 2012, 04:49:02 PM

Just made a commit including your fixes, and some new code. Can you compile again now please?

Also, you'll need a new media folder:

http://dl.dropbox.com/u/30024783/media.zip

BTW, which version of the SDK are you using? I use 1.7.4. Also keep in mind you'll have to move the DLLs to the simutrans directory; it just needs:

OgreMain
RenderSystem_GL
RenderSystem_Direct3D9
OIS

I think cg.dll is not necessary, but I have it there too.

In ogre.log you can see hints about errors, but well, I see you already know all of this.

About the GL error:

Moving the initialiseAllResourceGroups call to after the render window is created is doable, I think. I'll have a look at it.

Ters · January 28, 2012, 06:44:57 PM

I use version 1.7.2 of the OGRE SDK, since that's the last pre-built version for mingw.

Apart from the head having changed to a building, the Direct3D version is unchanged. The OpenGL version now gets far enough to render the building, but nothing else, and then crashes inside my graphics card's OpenGL driver. The last line in the log says it's loading or has loaded entis.png. I think there is reason to doubt that this is a fault in Simutrans 3D.

Markohs · January 28, 2012, 07:05:59 PM

Might be related to the initialiseAllResourceGroups call, at least on the OpenGL side. Let's see...

simsys_ogre.cc, line 464:

   Ogre::ResourceGroupManager::getSingleton().initialiseAllResourceGroups();

move it to

int dr_os_open(int w, int const h, int const fullscreen)
{

   renderWindow = root->createRenderWindow("Simutrans " VERSION_NUMBER,w,h,fullscreen);

   if (renderWindow==NULL){
      fprintf(stderr, "Couldn't open the window.\n");
      return 0;
   }

   DBG_MESSAGE("dr_os_open(Ogre)", "Ogre realized screen size width=%d, height=%d depth=%d (requested w=%d, h=%d  depth=%d)", renderWindow->getWidth(), renderWindow->getHeight(),renderWindow->getColourDepth(), w, h,COLOUR_DEPTH);

   display_set_actual_width( w );

   Ogre::ResourceGroupManager::getSingleton().initialiseAllResourceGroups();

   dr_fb_setup();

   wl=new windowListener();
   root->addFrameListener(wl);
   rtt_render_texture->addListener(wl);

   return w;
}

About the screen not rendering properly, I'll try to figure out what's going wrong. Some questions:

- Are you running in windowed or fullscreen mode?
- Which video card do you have?
- Can you post a screenshot of the failed render?
- Does the rendered building rotate or stay still?

Sounds like it's not a non-pow2 issue, since at least some parts of the screen are rendered okay. I'll get more computers to test what's going wrong; all looks good here. :(

Thank you for your help, Ters

Ters · January 28, 2012, 08:45:29 PM

Quote from: Markohs on January 28, 2012, 07:05:59 PM
About the screen not rendering properly, I'll try to figure out what's going wrong. Some questions:

- Are you running in windowed or fullscreen mode?
- Which video card do you have?
- Can you post a screenshot of the failed render?
- Does the rendered building rotate or stay still?

Are you asking about the Direct3D or OpenGL results? In either case, I run the game windowed and on an ATI Mobility Radeon HD 5850. With Direct3D, the game was running, but with OpenGL, it crashed and I could only see something because either the debugger or the Windows error reporting tool kept the program in limbo.

I suggest not looking into the OpenGL problem unless other people still report problems after the resource initialization call has been moved. It might just be something with my computer, or the way I built the executable.

Markohs · January 28, 2012, 09:14:28 PM

Okay, as a sidenote:

- Found that HardwarePixelBuffer::blit DOESN'T blend, it just copies rectangles, overwriting instead of blending with alpha, so my approach is not valid. That approach would have made it easy to draw current simutrans in a hardware-accelerated fashion.
- Given that, I considered using 2 HardwarePixelBuffers, one for odd and the other for even square positions, and blending them together at render time, but I think this won't work because graphics have x and y offsets, and the z-order won't be feasible.
- The new approach is using yet another RTT layer and rendering the quads in a 2D fashion, like this.

I'll keep you updated as usual in case anybody is interested, or has a new idea.

EDIT: I think I'm on the right track now: using an RTT texture, setting its camera viewport to not update every frame, and each frame just rendering the queued items on top of the previous frame. This should support alpha blending and image scaling without problems.

Looks like all 2D drawing functions were dropped from DirectX 9 onwards, from what I read. Using a 3D library layer looks like the only way to accelerate games nowadays.
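
To make the copy-versus-blend difference concrete, here is a per-pixel sketch. It is not Ogre code: rgba8, copy_pixel, mix and blend_pixel are names made up for the illustration, and it assumes non-premultiplied 8-bit alpha drawn over an opaque destination.

```cpp
#include <cassert>
#include <cstdint>

struct rgba8 { std::uint8_t r, g, b, a; };

// What a plain copy does: the source pixel replaces the destination,
// even where the source is fully transparent.
inline rgba8 copy_pixel(rgba8 src, rgba8 /*dst*/) { return src; }

// One channel of the usual "source over" blend, rounded to nearest.
inline std::uint8_t mix(std::uint8_t s, std::uint8_t d, std::uint8_t a)
{
    return static_cast<std::uint8_t>((s * a + d * (255 - a) + 127) / 255);
}

// Blend src over an opaque dst using the alpha of src.
inline rgba8 blend_pixel(rgba8 src, rgba8 dst)
{
    return { mix(src.r, dst.r, src.a),
             mix(src.g, dst.g, src.a),
             mix(src.b, dst.b, src.a),
             255 };
}
```

With a fully transparent source pixel, copy_pixel punches a hole in whatever was drawn before, while blend_pixel leaves the destination colour untouched, which is what sprites with transparent borders need.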

Markohs · January 30, 2012, 02:03:32 AM

My preliminary tests are being quite successful; this method (rendering images as quads with alpha-blended textures through the 3D renderer) looks very promising. In some tests I averaged 207 fps rendering 100 items each frame, plus a standard render of one 3D model at the same time. I think that's enough for simutrans. The buffer is not cleared each frame, so it works in the current framebuffer fashion, only writing changed sprites.

Whether this algorithm is faster than the current simgraph routines remains to be seen.

Thinking about this, in the end I think I can give new hardware-accelerated features to current 2D simutrans:

- Arbitrary map zoom, applied by hardware.
- Partial transparencies, which can be applied to underground modes. Current pak graphics don't have an alpha channel, but this opens the possibility of 32-bit (R8G8B8A8) images instead of the current 16-bit (RGB with special colours) ones.

prissi · January 30, 2012, 10:19:29 AM

The ground could also be transparent in current simutrans, as we could draw transparent tiles. However, I am not sure if this would make access much easier; I rather doubt it.

But I am curious how the HW-accelerated simutrans would compare. (Actually, the BLTs are most likely HW accelerated currently too.)

Markohs · January 30, 2012, 10:52:02 AM

Yea, it's not clear how we could apply those partial alpha renderings in the actual 2D, and they are already possible in current code. I don't really know if they will be useful or not (maybe for GUI items). But blending and colour interpolation are currently computed manually by our pix_blendxx_xx routines; the new routines use the 3D engine to blend, and since a graphics card is a vector processor it can compute complete lines in just one "cycle". These textures will be allocated in the VRAM of the graphics card, with the speed boost that can mean.

This approach also eliminates the need for clipping at the borders of the screen, and makes other clipping possible using texture offsets, even though I don't think this will translate into much better performance, since I guess the clipping code doesn't consume much CPU.

Another example of possible performance gain is "display_img_nc". Atm I think it's optimized as far as it can be, including that very clever 32-bit copying + 1 16-bit copy.

The problem with that routine is that it's not vectorizable for just one reason: the RLE encoding. Each line has a varying size, so a line can't be rasterized in a vectorial fashion. The only way to accelerate this is to uncompress the image in memory, at the expense of a possibly higher memory footprint, and make the alpha pixels explicit. That makes rendering the bitmap by hardware possible.

I'll just have to keep one uncompressed version in memory, since the zoomed versions are not necessary any longer, and player-coloured versions will be created on the fly too, storing them all, not just one like now.

At least, that's my current way of implementing this; it can all change since I'm doing this by trial and error.
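
As a rough sketch of what "uncompressing and making the alpha explicit" means for one image row: the row format below is a simplified, hypothetical one (pairs of transparent count and opaque count followed by the opaque pixels, ending with a pair of zeros), not the exact simutrans layout, and the 16-bit pixels are assumed to be plain RGB555 with no special colours. decode_rle_row is a made-up name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Expands one run-length-encoded row into `width` 32-bit pixels (0xAARRGGBB).
// Skipped pixels get alpha 0, stored pixels get alpha 255.
std::vector<std::uint32_t> decode_rle_row(const std::vector<std::uint16_t>& rle, std::size_t width)
{
    std::vector<std::uint32_t> out(width, 0u);   // fully transparent by default
    std::size_t x = 0;                           // current output column
    std::size_t i = 0;                           // current input position
    while (i + 1 < rle.size()) {
        const std::size_t skip = rle[i++];       // transparent pixels to jump over
        const std::size_t run = rle[i++];        // opaque pixels that follow
        if (skip == 0 && run == 0) {
            break;                               // end-of-row marker (assumed)
        }
        x += skip;
        for (std::size_t k = 0; k < run; ++k, ++i, ++x) {
            if (i >= rle.size()) {
                return out;                      // truncated input, keep what we have
            }
            if (x < width) {
                const std::uint32_t p = rle[i];  // assumed RGB555
                const std::uint32_t r = (p >> 10) & 31u;
                const std::uint32_t g = (p >> 5) & 31u;
                const std::uint32_t b = p & 31u;
                // Scale 5-bit channels to 8 bits by shifting (maximum becomes 248).
                out[x] = 0xFF000000u | (r << 19) | (g << 11) | (b << 3);
            }
        }
    }
    return out;
}
```

Every decoded row has the same length, so the whole image becomes a plain rectangle of pixels that can be uploaded as a texture.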

Fabio · January 30, 2012, 12:18:32 PM

Enhancing 2D display and enabling full png alpha transparency look like two awesome possibilities!

Markohs · January 30, 2012, 02:04:30 PM

Quote from: fabio on January 30, 2012, 12:18:32 PM
Enhancing 2D display and enabling full png alpha transparency look like two awesome possibilities!

Yea, I think so, but I don't really know what the possibilities could be, besides giving you artists the chance to apply alpha to your creations freely.

I was thinking of underground modes, and maybe adding some transparency to buildings when a vehicle passes behind them, or making tall city buildings partially transparent instead of greyed out like they are now.

Markohs · January 31, 2012, 01:59:56 AM

This is starting to look better, but there is still lots of work to do. More tomorrow.

Markohs · January 31, 2012, 04:47:30 PM

Advanced more, ~~the CPU usage is high, close to 100% of one core~~, but I think I can assign a second thread to the 3D rendering of the frame and leave the other to the current singlethreaded simutrans.

EDIT: No, I just noticed I was running it in debug mode with -log 1 -debug 4. Without those options it actually consumes less CPU than 2D simutrans, but I'm still missing rendered objects.

The image looks way smoother, a result of the image blending and interpolation. We also have a 0.5 pixel error now that might be easy to fix.

Look at the tree shadows: all looks better, but I don't know if this blurring will look good in the end.

Spike · January 31, 2012, 09:53:42 PM

I like the shadows, they help to give an impression of the contours and IMHO "glue" together the things.

Markohs · February 01, 2012, 08:53:04 PM

Still on this, progressing, all is working well so far. Just one question.

Who coded makeobj? Do you think it could easily be modified to create .pak files with 32-bit pixels (8-bit alpha, 8 red, 8 blue, 8 green)? Since the source images are .png, I guess this can be done without much trouble?

kierongreen · February 01, 2012, 10:17:26 PM

It was coded along with the rest of simutrans. I'm not sure how you'd extend paks to have 32bit images - you'd have to choose a new format for the image data (could well leave it as png). You probably won't notice the difference in colour depth in game, alpha more than 4 bits won't be clear either really. Could well use some of the alpha bits to indicate player or whether pixels are illuminated (giving you a wide range of lit colours).
EG
Bits 1-4 alpha
Bit 5,6,7: 0 normal, 1 primary player colour, 2 secondary player colour, 3 retain colour at night time, 4 show black during day (unlit windows). If any value other than 0 then the alpha bits indicate the strength of effect, to allow antialiasing around player colour and window areas.

Still leaves some free values for bits 5,6 and 7, and bit 8 unused. Would require special tools to edit images properly though...
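
A sketch of how such a byte could be packed and read back, just to illustrate the layout. It assumes "bits 1-4" are the four least significant bits and that the mode sits in the three bits above them; the enum and function names are invented for the example.

```cpp
#include <cassert>
#include <cstdint>

// Mode values as listed above; the names are hypothetical.
enum pixel_mode : std::uint8_t {
    MODE_NORMAL = 0,
    MODE_PLAYER_PRIMARY = 1,
    MODE_PLAYER_SECONDARY = 2,
    MODE_KEEP_AT_NIGHT = 3,
    MODE_BLACK_BY_DAY = 4
};

// Packs a 4-bit alpha/strength and a 3-bit mode into one byte.
// The top bit stays unused, as in the proposal.
inline std::uint8_t pack_flags(std::uint8_t alpha4, pixel_mode mode)
{
    assert(alpha4 <= 15);
    return static_cast<std::uint8_t>((alpha4 & 0x0F) | ((mode & 0x07) << 4));
}

inline std::uint8_t alpha_of(std::uint8_t v) { return v & 0x0F; }

inline pixel_mode mode_of(std::uint8_t v)
{
    return static_cast<pixel_mode>((v >> 4) & 0x07);
}
```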

But maybe that is just pushing sprites too far and indicates 3d models would be more flexible (though you'd have to have a way of indicating player colour and lights on those too). Personally I'd go down that route rather than trying to overcomplicate 2d images which in long term you'd be trying to replace with models anyway.

Spike · February 01, 2012, 10:33:13 PM

Quote from: Markohs on February 01, 2012, 08:53:04 PM
Who coded makeobj? Do you think it could easily be modified to create .pak files with 32-bit pixels (8-bit alpha, 8 red, 8 blue, 8 green)? Since the source images are .png, I guess this can be done without much trouble?

Volker Meyer made it. I had a rather monolithic file for all graphics before, and he invented the pak files to store object attributes and graphics together. The monolithic file didn't really allow mods and we were looking for a way to allow players to make their own addons.

It's not so hard to extend the image nodes to 32bit graphics data. But I'd suggest to create a different node type for those, so that you can use the same pak for traditional display with 16bpp images, and the 3D display with 32bpp images from one pak file.

Markohs · February 02, 2012, 01:10:50 AM

Thank you both.

Sometimes (especially today) I think forcing the sprite drawing algorithm into a 3D renderer is pushing it too far too, kierongreen.

Today I'm facing the problem that rendering the bitmaps one by one, like I'm doing now, isn't fast enough: each call to display_img_aux, display_color_img... is queued to be rendered and EACH requires one RenderOperation by itself. This pushes performance below the current Simutrans 2D code. Scaling is done on the fly by the renderer, but the current code caches the zoomed images in memory, so it's only slower when you switch zoom level, and the performance gain Simutrans3D would have there vanishes.

But maybe I'm missing something, or somehow disabled the dirty tile management, because on a 1280x800 screen like this:

I'm getting 954 sprites drawn PER FRAME. I don't really understand where they come from, since the window is mostly static, with no action around.

Or maybe I have done something wrong and I'm wasting CPU performance somewhere else.

I can try to draw entire horizontal tile lines in just one render operation, but that has one restriction: one RenderOperation can only use one texture, so it would only be useful on flat terrain for example (you can batch all tiles into one renderop) or on seas. This batching would also require hashes or some kind of memory structure; it's a bit complicated and the performance gain would only apply in some circumstances, so it's not really a solution per se.

I'm also trying to make one renderop use multiple textures and render batched lines of the map, but I don't really know if this is even possible. If it is, performance would be better than Sim2D by a huge margin.

Well, anyway I'm happy because all of this is making me understand the internals of simutrans. I'll find a solution one way or another.

Or I'll just forget about the 2D and just write a pure 3D program, like many people here advised me in the past.

We'll see, errors are part of learning.

EDIT:

By the way, I've found that rendering trees is the biggest performance problem most of the time, and I'm pretty sure it hits current simutrans badly too. But I guess that if the game needs to be able to render a full moving city, it should be able to render a forest too.

Maybe grouping the renderops by texture, and playing with the z-order... Actually that might work, I'll try tomorrow. Then the algorithm wouldn't have an O(X*Y*K) cost, it would just be O(num_of_different_textures_in_viewport).

Or maybe not, because that would render items in the incorrect order...

kierongreen · February 02, 2012, 08:44:42 AM

If you can store all objects on the screen as 2D textured surfaces within a 3D world, then render that in one operation, you should be ok. You'll have to have a table of links from objects to the surfaces though, so that moving vehicles update the positions of their display surfaces. Surfaces can then be substituted for models in time, and in theory you get a slow change from 2D to 3D simutrans which remains playable throughout the transition...

Then viewpoints displayed within windows would have to be created as new rendered worlds with their own object->surface links. Alternatively (although I suspect memory constraints would not allow this) the entire map could be stored and updated every frame as a 3D world.

If you are having to render each one manually every frame, performance, as you can see, will suffer...

Don't knock what you have achieved so far though!!

Markohs · February 02, 2012, 10:02:14 AM

thx for the reply kierongreen.

Actually that wouldn't work either. Every time I plant one 2D textured image in the map, that means one render operation just for that item.

Rendering in 3D is quite similar to doing it in 2D: it basically draws all the visible items one by one, in the manner of the painter's algorithm:

http://en.wikipedia.org/wiki/Painter%27s_algorithm

But the hardware uses extra buffers and has one dedicated to the Z coordinate of the last opaque item in the viewport, so it just draws items that will be shown.

The problem is that for every item in the scene it has to perform a "RenderOperation": one draw of one geometry that's accumulated into the colour buffer (the actual output data). The sum of all those renderops creates just one frame.

The idea is to group some of the visible items together in a single geometry (it doesn't have to be contiguous in space, just composed of triangles) and apply one texture to them, offsetting the texture at the vertices as needed. So with just one render operation you can draw lots of sprites on screen in one pass. The problem in Simutrans2D is that, given the inherent isometric view, I have to draw the image from top to bottom, in rows, so that items in the back are not drawn AFTER the items in the front.

Looks like the solution to this is using a texture atlas:

http://www.gamerendering.com/2009/12/08/texture-atlas/
http://http.download.nvidia.com/developer/presentations/2005/GDC/Direct3D_Day/D3DTutorial03_Optimization.pdf
http://http.download.nvidia.com/developer/NVTextureSuite/Atlas_Tools/Texture_Atlas_Whitepaper.pdf

Maybe I can apply this, I have to think about it a bit more. The current code works, it's just too slow.
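
The "offsetting the texture" part boils down to giving each quad the texture coordinates of its own rectangle inside the atlas. A minimal sketch, with uv_rect and atlas_uv as made-up names and the usual 0..1 coordinate range assumed:

```cpp
#include <cassert>

struct uv_rect { float u0, v0, u1, v1; };

// Hypothetical helper: texture coordinates of a sprite stored at pixel
// (x, y) with size w x h inside an atlas of atlas_w x atlas_h pixels.
inline uv_rect atlas_uv(int x, int y, int w, int h, int atlas_w, int atlas_h)
{
    assert(atlas_w > 0 && atlas_h > 0);
    return { static_cast<float>(x) / atlas_w,
             static_cast<float>(y) / atlas_h,
             static_cast<float>(x + w) / atlas_w,
             static_cast<float>(y + h) / atlas_h };
}
```

All quads that take their coordinates from the same atlas can share one texture binding, which is the whole point of the technique.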

prissi · February 02, 2012, 10:17:02 AM

Maybe trees are changing (this depends on the season), and if you use the new clipping algorithm (which will not be useful for your approach), tiles are drawn many times when a pedestrian is on them. Setting
simple_drawing_tile_size = 255
or so will activate the old behaviour with a single pass.

However, if the OpenGL renderer is slower than the normal CPU-driven one, then this would not surprise me, as the speed is limited by main memory access (or are the sprites loaded into the HW buffer on the card?). At least all attempts for OpenTTD resulted in this. Thus maybe rather go 3D.

Dwachs · February 02, 2012, 10:30:17 AM

Quote from: Markohs on February 02, 2012, 01:10:50 AM
But maybe I'm missing something, or disabled the dirty tile management somehow because in a 1280x800 screen like this:

I do not know up to which point you merged trunk, but there was a bug introduced in r4901 and fixed in r5118 that prevented resetting the dirty flag.

Markohs · February 02, 2012, 10:40:54 AM

The bottleneck is not the memory access I think, since all sprites are loaded into video RAM. The problem is that the number of image drawing operations is very high, since I don't have a way to group them, which loses all the advantage of a GPU. More time is spent switching from one sprite to the next than on the actual rendering.

That nVidia document mentions that rendering more than 1000 batches per frame is too much. We should aim for 1000 at most.

And on a normal-zoom image I'm already getting 954 sprites.

The algorithm is basically:

1) Load all needed textures into VRAM (this is not lightning fast, but it only has to be done at the start, not every frame)
2) From simintr.cc, intr_refresh_display:
- Tons of sprites are queued to be drawn; this has no performance impact
- At the end of the frame, all sprites are rendered:
  - Create a HardwareVertexBuffer (no cost)
  - Fill the HardwareVertexBuffer with the geometries (2 triangles for each sprite, plus the offsets of the texture to apply)
  - foreach(sprite)
    * Create a renderop
    * Set the texture on the renderop
    * Set the geometry start pointer in the vertex buffer on the renderop, and its size (always 7 vertices)
    * Tell the GPU to render the renderop on top of the framebuffer

The time is spent in the '*' lines. If we can group the operations somehow, the '*' part of the algorithm will be much less of a performance hit. But for this we'll need a texture atlas, to avoid switching the texture *SO* often, and play with the offsets.

To make it short: a GPU prefers being fed a small number of batches (1000 at most), each with lots of geometry and texture offsets (the GPU will draw that super fast), rather than lots of batches with simple geometry (atm, just 2 triangles forming a square; more time is wasted starting renderops and switching to the next one than on the actual render).
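
A quick way to see how much an atlas would help is to count how many render operations a frame needs if consecutive sprites that share a texture are merged while the back-to-front draw order is kept. This is only a counting sketch: count_render_ops is a made-up name and each queued sprite is reduced to the id of the texture it uses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of render operations needed when each run of consecutive sprites
// using the same texture is drawn as one batch. Draw order is preserved,
// so only neighbours in the queue can be merged.
std::size_t count_render_ops(const std::vector<int>& queued_textures)
{
    std::size_t ops = 0;
    for (std::size_t i = 0; i < queued_textures.size(); ++i) {
        if (i == 0 || queued_textures[i] != queued_textures[i - 1]) {
            ++ops;   // texture changes here, so a new batch starts
        }
    }
    return ops;
}
```

With one texture per sprite the count stays close to the number of sprites; if all 954 sprites came from a single atlas it would drop to 1.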

EDIT: Thx for your comments, I'll have a look at the dirty setting you mention, prissi/Dwachs.

BTW, this also helped me to get lots of new ideas for the 3D tilemanager I need to implement anyway for rendering the map in 3D, learning new things is never a waste!

Markohs · February 02, 2012, 11:40:29 AM

Quote from: Dwachs on February 02, 2012, 10:30:17 AM
I do not know up to which point you merged trunk, but there was a bug introduced in r4901 and fixed in r5118 that prevented resetting the dirty flag.

I have merged r5129 into my branch. Anyway, I'll check r5118 to see where the bug was and whether I still have it. Thx!

Markohs · February 02, 2012, 12:03:29 PM

Given that the screen drawing is done in 3 important steps:

- Ground
- Things
- Overlays

If I could manage to get an atlas image of all the sprites used in each phase, this would work.

But I don't know if all those sprites can fit in an atlas of, let's say, 8192x8192 pixels. The ground ones will fit for sure, but for things it may be impossible.

That would need generating atlases just for Simutrans3D, making the existing paks unusable, or maybe I can create the atlas dynamically in memory as the game demands new images.

I'm not really sure if this work is really worth it, but I'm pretty sure it would work well. I fancied making a 3D renderer for current simutrans2D and releasing it before starting with the pure 3D simutrans.

I think I'll give the dynamic atlas creation a try, I have a good feeling about it.

TurfIt · February 02, 2012, 02:46:32 PM

Are there limits to the texture size?
i.e. Can you load all the sprite images into a single texture allowing a single renderop for the 'Things'?

Also, how would going full 3D help this situation? Instead of rendering 954 objects with 2 triangles each (billboard), you'd be rendering 954 objects with say 100 triangles each. Since number of triangles isn't the bottleneck (within reason), I don't see the difference...

Markohs · February 02, 2012, 03:23:49 PM

Just checked the revision you pointed me to, Dwachs, and your patch was already applied to my code, thx.

From what I saw, in the current simutrans standard code, simgraph16.cc, line 2231:

void display_img_aux(const unsigned n, KOORD_VAL xp, KOORD_VAL yp, const sint8 use_player, const int /*daynight*/, const int dirty)

But the image is drawn anyway, the "dirty" bool just makes sure dr_textur is called to tell the low level routines to make sure it's copied to screen.

Then... why is it drawn at all if dirty=false? Isn't this wasting CPU? I think I misunderstood the concept of dirty tiles as applied to my renderer.

I just added an if(dirty)return; and the sprite count dropped a lot:

16:17:00: Ogre2D has 32 sprites
16:17:00: Ogre2D has 32 sprites
16:17:00: Ogre2D has 32 sprites
16:17:00: Ogre2D has 146 sprites
16:17:00: Ogre2D has 32 sprites
16:17:00: Ogre2D has 32 sprites
16:17:00: Ogre2D has 32 sprites
16:17:00: Ogre2D has 32 sprites
16:17:00: Ogre2D has 32 sprites
16:17:01: Ogre2D has 146 sprites
16:17:01: Ogre2D has 144 sprites

But now tiles are not redrawn when pedestrians and city traffic pass over them, and on a zoom:

Markohs · February 02, 2012, 03:32:27 PM

TurfIt, the same situation arises in 3D, yes.

You can't just put one 3D mesh for each item in the map. For terrain, for example, you have to construct one SINGLE mesh and give it the shape it needs, and in our case simulate the "tiles" by applying a multilayered texture over it and tweaking the blending values at the needed positions.

The same applies to roads, for example: they have to be merged and sent to the renderer as a single mesh.

Since this approach is hard to do, it's split into big chunks of vertices; for example, every 512 vertices you create a new mesh. The chunk sizes are usually powers of 2 for many reasons (most of which I don't know yet), but basically because b-trees are used to generate them in optimal computing time.

That's why, for example, buildings will have to be modeled as a single mesh, and most objects have to be grouped somehow. The more batches you render each frame, the poorer the performance you'll get, since most of the work will be done by the CPU instead of the GPU.

Think of it like this:

Rendering 10 batches of 1000 triangles is an order of magnitude faster than rendering 100 batches of 100 triangles.

This will have lots of implications for how we'll manage objects in simutrans. For example, cities will be modeled as a single mesh or as a small number of sub-meshes, and the mesh will have to be recreated if any of the components has changed. But you can render it super fast, from any angle, in almost constant time.

The same will happen with forests, for example. Or we can just group items into 3 categories:

1) Almost immutable geometries.
2) Geometries that have to change often.
3) Movable objects, like vehicles.

TurfIt · February 02, 2012, 03:53:23 PM

Quote from: Markohs on February 02, 2012, 03:23:49 PM
From what I saw, in the current simutrans standard code, simgraph16.cc, line 2231:
void display_img_aux(const unsigned n, KOORD_VAL xp, KOORD_VAL yp, const sint8 use_player, const int /*daynight*/, const int dirty)
But the image is drawn anyway, the "dirty" bool just makes sure dr_textur is called to tell the low level routines to make sure it's copied to screen.
Then... why is it drawn at all if dirty=false? Isn't this wasting CPU? I think I misunderstood the concept of dirty tiles as applied to my renderer.

Dirty tiles is a higher level concept. Once down to the actual display routine, it's being called because the image needs to be drawn...
If the tile is not marked dirty, the objects on it aren't processed.

Quote from: Markohs on February 02, 2012, 03:32:27 PM
TurfIt, the same situation arises in 3D, yes.

<...>

That's why, for example, buildings will have to be modeled as a single mesh, and most objects have to be grouped somehow. The more batches you render each frame, the poorer the performance you'll get, since most of the work will be done by the CPU instead of the GPU.

I think that's what I was trying to ask above. Put all the images into one texture, and merge all the 2-triangle billboards into one render call. I take it Ogre doesn't allow you to render multiple meshes with one call? Is there any hardware support for merging meshes?

Markohs · February 02, 2012, 04:09:32 PM

Quote from: TurfIt on February 02, 2012, 03:53:23 PM
Dirty tiles is a higher level concept. Once down to the actual display routine, it's being called because the image needs to be drawn...
If the tile is not marked dirty, the objects on it aren't processed.

But I'm getting almost 1000 display_img_aux calls each frame at a position that is not zoomed out, is that normal? I might have screwed something up, because I'm also getting heap corruption. So I've probably commented out code that messes with pointers that I shouldn't have commented out. I'll check it.

I'm also checking the option prissi pointed out.

Quote from: TurfIt on February 02, 2012, 03:53:23 PM
I think that's what I was trying to ask above. Put all the images into one texture, and merge all the 2-triangle billboards into one render call. I take it Ogre doesn't allow you to render multiple meshes with one call? Is there any hardware support for merging meshes?

Yeah, that's the "atlas" concept. You create a single texture with all the images you might want to show, like this:

Then you render the whole screen in just one render operation, sending the geometry. That would work for sure. And it would be lightning fast.

The problem is that I think I can't put all images in one texture, since pak128 for example has something like 2,000 images if I recall correctly. I think there is a limit on texture size that can vary from driver to driver, maybe with a minimum of 1024x1024. I don't think they can all fit in the same image; maybe splitting into multiple atlases can work. One for backgrounds, one for buildings, another for movable objects...

Or create the atlas in memory on demand, but it's a complicated algorithm if you want to make the most of the texture surface, since images have different sizes and computing the optimal layout is an exponential problem, even if you know all the sizes at startup. A simple algorithm would leave much unused space on the texture.

So I can render everything in 3-6 render passes. Keep in mind every render writes on TOP of the previous renders, so this has to be done in the correct order.
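
For reference, the kind of "simple algorithm" meant here could be a shelf packer: sprites are placed left to right on a row, and a new row starts below when the current one is full. This is a generic sketch with invented names (rect, shelf_atlas), not code from the branch, and it wastes space whenever sprites on the same shelf have different heights.

```cpp
#include <cassert>

struct rect { int x, y, w, h; };

// Very simple on-demand atlas allocator using the shelf strategy.
class shelf_atlas {
public:
    shelf_atlas(int width, int height)
        : width_(width), height_(height), x_(0), y_(0), shelf_h_(0)
    {
        assert(width > 0 && height > 0);
    }

    // Tries to reserve w x h pixels. On success the position is written to
    // `out`; returns false when the sprite does not fit any more.
    bool insert(int w, int h, rect& out)
    {
        if (w <= 0 || h <= 0 || w > width_ || h > height_) {
            return false;
        }
        if (x_ + w > width_) {      // current shelf is full: open a new one below
            y_ += shelf_h_;
            x_ = 0;
            shelf_h_ = 0;
        }
        if (y_ + h > height_) {
            return false;           // no vertical room left
        }
        out = { x_, y_, w, h };
        x_ += w;
        if (h > shelf_h_) {
            shelf_h_ = h;           // the shelf is as tall as its tallest sprite
        }
        return true;
    }

private:
    int width_, height_;
    int x_, y_;                     // next free position on the current shelf
    int shelf_h_;                   // height of the current shelf so far
};
```

Sorting the sprites by height before inserting them usually reduces the wasted space, but that only works when the sizes are known up front.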

EDIT: Info about texture sizes in vendors:

http://vterrain.org/LargeTextures/

TurfIt · February 02, 2012, 05:13:46 PM

Quote from: Markohs on February 02, 2012, 04:09:32 PM
But I'm getting almost 1000 display_img_aux calls each frame at a position that is not zoomed out, is that normal? I might have screwed something up, because I'm also getting heap corruption. So I've probably commented out code that messes with pointers that I shouldn't have commented out. I'll check it.

That does appear excessive for the screen shots you've shown. I'd have to wade through your 3D branch code to find where all you've hooked into the display routines, but it sounds as though you've done so at quite a low level. The entire current structure of the code is oriented towards 2D and its requirements. To optimally use 3D (or even this hybrid billboard approach) you'll need to change away from this 2D code completely.

Quote from: Markohs on February 02, 2012, 04:09:32 PM
I'm also checking the option prissi pointed out.

prissi was referring to an option that controls the 2D clipping. That software 2D clipping shouldn't be necessary for 3D rendering?

Quote from: Markohs on February 02, 2012, 04:09:32 PM
So I can render everything in 3-6 render passes. Keep in mind every render writes on TOP of the previous renders, so this has to be done in the correct order.

I'm getting lost here, I think I need to see the code to comment much further...

Isn't the whole idea to populate a scene with several objects, all with their own meshes/textures, located at 3D coordinates, and render once? (excluding the background landscape and foreground overlay) The render operation should take care of the correct order in hardware.

If you're calling a render object by object, back to front as the current 2D code does, then I think you've created a 3D decelerator like was done for OpenTTD.

Markohs · February 02, 2012, 05:40:14 PM

hehe, I'll try to explain it clearly now:

I've just modified simsys and simgraph with code that calls Ogre functions. The modifications have been:

- In simsys, nothing really fancy: opening the window, handling events, etc.
- In simgraph:

- 'textur' is gone, there is no accessible framebuffer anymore.
- these functions:

display_img_aux
display_color_img
display_rezoomed_img_blend
display_base_img

and everything that manipulated the "textur" pointer (font management, line drawing, etc.) have been commented out.

- In simintr.cc, in intr_refresh_display, I have hooked in one function called renderframe() that forces the screen to be rendered.

At this stage the program should show a completely black screen.

Okay, after that:

rezoom_img() has been modified to just assign the new x,y,w,h of the images, the image is not recomputed.

AND:

Code Select


/**
 * Draws image with vertical clipping (fast) and horizontal clipping (slow)
 * @author prissi
 */
void display_img_aux(const unsigned n, KOORD_VAL xp, KOORD_VAL yp, const sint8 use_player, const int /*daynight*/, const int dirty)
{
    if (n < anz_images) {
        // need to go to nightmode and or rezoomed?

        KOORD_VAL h, clip_bottom, clip_top;

        if (use_player) {
            return;
/*            FIX: Handle colored data
            sp = images[n].player_data;
            if (sp == NULL) {
                printf("CImg %i failed!\n", n);
                return;
            }*/
        } else {
            
            // REZOOM AND RECODE ARE NOT NEEDED ANYMORE
            if (images[n].recode_flags&FLAG_REZOOM) {
                rezoom_img(n);
//                recode_normal_img(n);
            } 
/*            else if (images[n].recode_flags&FLAG_NORMAL_RECODE) {
                recode_normal_img(n);
            }
*/
            // SIM3D:
            // generate uncompressed version

            if (!images[n].data_pb){
                decode_img(n);
            }

        }

        if (!dirty){
            return;
        }

        // now, since zooming may have changed this image
        yp += images[n].y;
        h = images[n].h; // may change due to vertical clipping

        // in the next line the vertical clipping will be handled
        // by that way the drawing routines must only take into account the horizontal clipping
        // this should be much faster in most cases
        // SIM3D: only discard items not visible at all

        // bottom of the screen

        // calculate how much height will fall outside and exit if nothing will be visible
        clip_bottom = yp + h - clip_rect.yy;

        if ((clip_bottom > 0) && (h-clip_bottom <= 0)) {
            // not visible at all
            return;
        }

        if (clip_bottom<0){
            clip_bottom=0;
        }

        // top of the screen

        clip_top = clip_rect.y - (int)yp;
        if (clip_top >= h) {
            // not visible at all
            return;
        }

        if (clip_top<0){
            clip_top=0;
        }


        // new block for new variables
        {
            // needed now ...
            const KOORD_VAL w = images[n].w;
            xp += images[n].x;

            // clipping at poly lines?
            if (number_of_clips>0) {
                // FIX: Handle this
                //display_img_pc<plain>(h, xp, yp, sp);
                display_img_2dogre_vclip(n, xp, w, yp, h, clip_top, clip_bottom);
                if (dirty) {
                    mark_rect_dirty_wc(xp, yp, xp + w - 1, yp + h - 1);
                }
            }
            else {
                // use horizontal clipping or skip it?
                if (xp >= clip_rect.x  &&  xp + w <= clip_rect.xx) {
                    if (dirty) {
                        mark_rect_dirty_nc(xp, yp, xp + w - 1, yp + h - 1);
                    }
                    display_img_2dogre_vclip(n, xp, w, yp, h, clip_top, clip_bottom);
                } else if (xp < clip_rect.xx  &&  xp + w > clip_rect.x) {
                    if (dirty) {
                        mark_rect_dirty_wc(xp, yp, xp + w - 1, yp + h - 1);
                    }
                    display_img_2dogre_vclip(n, xp, w, yp, h, clip_top, clip_bottom);
                }
            }
        }
    }
}

As you can see, at the start it just checks whether the texture has already been processed and registered. decode_img() does that: after calling it, there is a square texture in video memory with the contents of that "imd[n]", called "imd_n" (with n as the index of the image).

After that, following the previous code, I just work out whether the image needs vertical clipping. Horizontal clipping is not needed, because the 3D renderer will discard those pixels. The vertical clipping is important so we don't draw over the toolbar or the bar at the bottom.
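Pulled out of display_img_aux, the vertical clipping step can be sketched on its own like this. VClip and compute_vclip are made-up names for illustration, and the sketch assumes clip_rect.yy is one past the last visible row, which is how the checks in the code above read.

```cpp
#include <cassert>

// Hypothetical result type: how many rows to cut at the top and bottom,
// and whether anything of the image is left to draw.
struct VClip { int top; int bottom; bool visible; };

// yp: screen row of the image's first row, h: image height in rows.
// clip_y: first visible row, clip_yy: one past the last visible row
// (assumption about clip_rect, see above).
VClip compute_vclip(int yp, int h, int clip_y, int clip_yy)
{
    assert(h >= 0 && clip_y <= clip_yy);

    VClip r = {0, 0, false};
    const int below = yp + h - clip_yy; // rows hanging below the clip area
    const int above = clip_y - yp;      // rows sticking out above it

    if (below >= h || above >= h) {
        return r; // not visible at all
    }
    r.top = above > 0 ? above : 0;
    r.bottom = below > 0 ? below : 0;
    r.visible = true;
    return r;
}
```

For example, compute_vclip(90, 20, 0, 100) gives top 0 and bottom 10, so only the upper half of the image is drawn.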

And then I just

display_img_2dogre_vclip(n, xp, w, yp, h, clip_top, clip_bottom);

This code is:

Code Select


void display_img_2dogre_vclip(const unsigned n, const KOORD_VAL x, const KOORD_VAL w, const KOORD_VAL y, const KOORD_VAL h,const KOORD_VAL clip_top, const KOORD_VAL clip_bottom ){

    std::string textureName = Ogre::String("imd_"+Ogre::StringConverter::toString(n));

    double x1,x2,y1,y2;
    double tx1,tx2,ty1,ty2;

    double half_w = (double)disp_width/2;
    double half_h = (double)disp_height/2;


    tx1=0;
    tx2=1;
    x1=(double)((x-half_w)*2)/(disp_width);
    x2=(double)(((x+w)-half_w)*2)/(disp_width);

    if (clip_top){
        ty1 = (double) clip_top/h;
        y1=(double)(-((y+clip_top)-half_h)*2)/(disp_height);
    }
    else{
        ty1 = 0;
        y1=(double)(-((y)-half_h)*2)/(disp_height);
    }

    if (clip_bottom){
        ty2 = (double) (h-clip_bottom)/h;
        y2=(double)(-((y+h-clip_bottom)-half_h)*2)/(disp_height);
    }
    else{
        ty2=1;
        y2=(double)(-((y+h)-half_h)*2)/(disp_height);
    }

    ogre2dManager->spriteBltFull(textureName,x1,y1,x2,y2,tx1,ty1,tx2,ty2);
}

This normalizes the coordinates to the [-1,1] coordinate system, and QUEUES the image to be drawn on the next render(). I know some algebra would simplify those calculations, but that shouldn't hurt performance much at this level for now.

x1 is left, y1 is top, x2 is right and y2 is bottom. tx1, tx2, ty1 and ty2 are the texture coordinates to use over the geometry.
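With the algebra done, (x - W/2)*2/W becomes 2x/W - 1 and -(y - H/2)*2/H becomes 1 - 2y/H, and the two if/else branches collapse because a clip value of 0 gives the same result as the unclipped formula. A standalone sketch of that, without the Ogre call (Quad and to_ndc are made-up names for illustration):

```cpp
#include <cassert>

// Hypothetical result type: quad corners in [-1,1] screen space plus the
// vertical texture coordinates (tx1 = 0 and tx2 = 1 are constant).
struct Quad { double x1, y1, x2, y2, ty1, ty2; };

// Maps a pixel rectangle (x, y, w, h) with vertical clipping to normalized
// coordinates: x grows to the right, y grows upwards, (0,0) is the centre.
Quad to_ndc(int x, int y, int w, int h, int clip_top, int clip_bottom,
            int disp_width, int disp_height)
{
    assert(disp_width > 0 && disp_height > 0 && h > 0);

    Quad q;
    q.x1 = 2.0 * x / disp_width - 1.0;                      // left
    q.x2 = 2.0 * (x + w) / disp_width - 1.0;                // right
    q.y1 = 1.0 - 2.0 * (y + clip_top) / disp_height;        // top
    q.y2 = 1.0 - 2.0 * (y + h - clip_bottom) / disp_height; // bottom
    q.ty1 = static_cast<double>(clip_top) / h;              // skipped top part
    q.ty2 = static_cast<double>(h - clip_bottom) / h;       // kept up to here
    return q;
}
```

On an 800x600 screen, a 400x300 image at pixel (0,0) maps to x1 = -1, y1 = 1, x2 = 0, y2 = 0, i.e. the top-left quarter of the screen.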

Each spriteBltFull() call QUEUES a render operation, and that render operation will take place when renderframe() is called.

And yea, atm is a DECELERATOR.

prissi · February 02, 2012, 07:56:14 PM

I think the way to go is modifying simview, as there you also get a z-coordinate. You can then send all the stuff to the engine with that z-coordinate and it will do the display automatically. (OK, that was naive, but I hope you get my meaning.)

Markohs · February 02, 2012, 08:04:42 PM

Yea, z-ordering plus atlas generation might be the answer to all of this.
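To make the z-ordering idea concrete, here is a rough sketch of a sprite queue that no longer depends on the order of the calls: every sprite carries a z value and the queue is sorted once per frame. Note this is a CPU-side sort (painter's algorithm), swapped in as an illustration; it is not the hardware depth test prissi describes, and QueuedSprite and SpriteQueue are made-up names, not Ogre or Simutrans code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical queue entry: texture name, quad in [-1,1] coordinates
// (y1 is top, y2 is bottom) and a z value (smaller = further back).
struct QueuedSprite {
    std::string texture;
    double x1, y1, x2, y2;
    int z;
};

class SpriteQueue {
public:
    // Sprites can be queued in any order during the frame.
    void push(const QueuedSprite& s)
    {
        assert(s.x1 <= s.x2 && s.y1 >= s.y2);
        sprites_.push_back(s);
    }

    std::size_t size() const { return sprites_.size(); }

    // Returns the sprites back to front and empties the queue.
    // stable_sort keeps submission order for sprites with equal z.
    std::vector<QueuedSprite> flush()
    {
        std::stable_sort(sprites_.begin(), sprites_.end(),
            [](const QueuedSprite& a, const QueuedSprite& b) { return a.z < b.z; });
        std::vector<QueuedSprite> out;
        out.swap(sprites_);
        return out;
    }

private:
    std::vector<QueuedSprite> sprites_;
};
```

With an atlas, all the sprites coming out of one flush() could share a single texture, so the sorted list could in principle go to the GPU as one batch.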

Having a look at Gorilla, which is claimed to outperform all 2D managers for Ogre, to get some ideas from its code.

https://github.com/betajaen/gorilla/blob/master/README.md

Just had a look and pak128 has 14967 images; dunno how many buffers it will take to get that into memory, or how big they will be. If size is a problem we can always use a texture compression utility, and the GPU decompresses and uses them on the fly.
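A back-of-the-envelope estimate, assuming every one of the 14967 images is stored as a full 128x128 texture with no mipmaps (an assumption, real images may be cropped smaller). The bits per pixel are 32 for uncompressed RGBA, 4 for DXT1 and 8 for DXT5; texture_bytes is a made-up helper name.

```cpp
#include <cassert>
#include <cstdint>

// Total bytes for image_count textures of w x h pixels at the given
// bits per pixel. Assumes no mipmaps and no padding.
std::uint64_t texture_bytes(std::uint64_t image_count, std::uint64_t w,
                            std::uint64_t h, std::uint64_t bits_per_pixel)
{
    assert(bits_per_pixel > 0);
    return image_count * w * h * bits_per_pixel / 8;
}
```

Under those assumptions, texture_bytes(14967, 128, 128, 32) is 980,877,312 bytes (about 935 MiB) uncompressed, and texture_bytes(14967, 128, 128, 4) is 122,609,664 bytes (about 117 MiB) with DXT1.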

Maybe rendering the GUI in different layers can make things easier too.

As a side note: I don't know if I already said this explicitly, but my intention in this first phase of the project is to touch the minimum number of files I can, so I can keep using the current code, including the UI and the core game engine of Simutrans, and both builds can coexist with minimum problems.

Once we have to keep the Simutrans model and the 3D world in sync, modifying more and more files will be unavoidable, as will adding functions that don't exist in current Simutrans. But the UI and the internals of Simutrans will remain the same, if I can.
