News:

Simutrans Tools
Know our tools that can help you to create add-ons, install and customize Simutrans.

Make world limits not forced to water level

Started by Markohs, November 16, 2012, 01:39:10 PM

Previous topic - Next topic

0 Members and 5 Guests are viewing this topic.

Markohs

Oh, I see, found the bug, fixing it. Thx. ;)

Markohs

How strange, this bug only appears to show when compiled with mingw , on Visual Studio never fails, dunno how to debug this, I'll try to learn how gdb works... :)

Ters

Sounds like the kind of bug that might go into hiding when compiled in debug mode.

IgorEliezer

Quote from: Markohs on May 10, 2013, 03:33:31 PM
How strange, this bug only appears to show when compiled with mingw , on Visual Studio never fails, dunno how to debug this
What a bugging situation: this is the kind of bug that bugs me up a lot. D*** bugs! :>

Dwachs

Markohs, can you post an updated patch?
Parsley, sage, rosemary, and maggikraut.

Markohs

Quote from: Ters on May 10, 2013, 04:08:37 PM
Sounds like the kind of bug that might go into hiding when compiled in debug mode.

Yes, thought on taht too, but mscv on release produces a binary that doens't fail either.

I tried to debug using mingw and gdb, it's impossible to hit control-C to stop the program on bug (the bug causes invinite loop). To debug it, I have to start the program manually and attaching to the program using "gdb --pid=9700" externally. This works, and the proigram stops execution, but when I try to debug, stepping doesn't take me out of internal functions.

It's strange, dunno how can I debug this. I hope it's just my way of compiling it that's causing the bug... I don't know how someone can develop and debug using mingw, it's horrible. It's way slower compilating, the console doesn't allow copy/paste...

I understand gcc is multi-plattform and tools are great on a linux machine, but in Windows it just sucks. ;)



Posting a patch now, Dwachs, gimme a few mins. Thx. :)

Markohs

https://dl.dropboxusercontent.com/u/30024783/newbg.patch

You also need ground.Background.pak in your pak directory, you can get inside this file:

https://dl.dropboxusercontent.com/u/30024783/bg/background.zip

To reproduce the bug, you need to move 100% outside the world limits and zoomin/zoomout continously with the mouse wheel, if it doesn't fail after a few secs, move the camera a bit and keep doing it. It's a bit random.

I know the code it's not ready for inclusion and needs some tweaking, but why does this bug exist?

I compiled it without MULTI_THREAD, btw.

TurfIt

#287
The problem appears to be the chosen image - zoom in and once recoded, run lengths > 512 are being created. That exceeds the max for the asm version of display_img_nc(). The recoding routines should be changed to respect the limit. _nc() is too performance critical to change IMHO.

EDIT: also, if I open any gui window (nowhere near the background), the background is being continually painted..

Ters

Is it still that performance critical? All indications I have seen lately is that Simutrans is I/O bound, not CPU bound. Potential low performance platforms today are likely non-x86 anyway.

But how can one recode differently? The run lengths are alternately transparent (skip) and non-transparent (copy). A zero length transparent run (except the first) is a marker for end of line. It is therefore not possible to split a non-transparent run into two runs without altering the image to insert a single transparent pixel between them. Or have I missed something?

Dwachs

As Turfit said, only the assembler code assumes runlengths<512. Hence only the assembler code needs fixing...
Parsley, sage, rosemary, and maggikraut.

Ters

The way I understood TurfIt, the assembly must remain unchanged, and a maximum non-transparent run length less than 512 pixels enforced.

prissi

Most places in Simutrans is not too CPU critical; yes but unfourtunetly exactly this routine is where simutrans spend (in assembler form) about 20% of its time. This this is indeed some of the place which is. Allowing for larger length will mean that less of the code fits into the caache, and might reduce the gain of the assmbler code. (This needs to be checked of course, it is just a feeling.)

However, I fail to see the need of images larger than 500 pixels. Less than three of those will fit on a normal screen side by side!

Ters

Quote from: prissi on May 11, 2013, 02:41:28 PM
However, I fail to see the need of images larger than 500 pixels. Less than three of those will fit on a normal screen side by side!

But images less than 500 pixels wide become more than 500 pixels wide when zooming in.

TurfIt

display_img_nc() is *the* CPU hog affecting framerate. Many areas of Simutrans have the CPUs stalled waiting on memory access, this isn't one of them.
timing:

sync display
code pathms/frame
asm16.7
USE_C17.5
USE_C ALIGN18.2
USE_C WHILE17.7
NOTE: this is the total time spent in sync_step display - the % difference in just display_img_nc() is obviously much greater. 

asm is the default when compiled 32-bit on GCC.
USE_C is presumably close to what MSVC gets ( I need to still use ALIGN_COPY as display_fb_internal() USE_C code makes unwarranted assumptions that the GCC optimizer trips over. USE_C path in display_img_nc() is ok, but still normally forced to ALIGN_COPY for some reason, I bypassed the force for this...)
USE_C ALIGN_COPY is what you get on 64-bit  (or 32-bit GCC if explicitly told to USE_C)
USE_C WHILE is me replacing memcpy in the ALIGN_COPY with a simple while instead. memcpy will not inline when placed there, and function call overhead is huge in that loop. The while loops ends up vectorized to SSE2 instructions by the optimizer.

The vast majority of runlengths are <=4, and almost none > 12. I'm sure that's why this totally unrolled loop of MOVSDs works so well (I actually get a further speedup with MOVSWs!), and the compliers tendancy to try for larger moves is counter productive - the MOVDQAs from the while loop.

IMHO, the fix should be for the coding routines to check the runlen limit, and error there instead of the crash in _nc(). _nc() works good for the typical Simutrans graphics, so for the background display, create new display code in simgraph designed for long runlens leaving the existing alone.

Markohs

I'll say another of my crazy ideas, but...

Whoudn't be even better to have the image loaded uncompressed in-memory (without rle) and use SSE/MMX routines to do all the display?

This can be implemented in various ways, we can leave the image compressed to save memory, but on first usage, we just uncompress it and work with the uncompressed image instead.

TurfIt

Doesn't sound crazy to me...  ;D
Any idea what the impact on memory requirements would be?
And there's the risk of turning back from CPU bound to memory bandwidth bound...

Ters

For my current OpenGL project, I have uncompressed images. For pak64, this occupies 96 MB, although there is a lot of empty space, both because I round the texture up to a power-of-two and because I need to have all images at full width and height. Before I had to add the empty space (but still power-of-two), I think it was 24 MB.

This is however unscaled images. With images scaled up when zoomed in, the size will quickly become quite huge. Even more so with pak128 or pak192. There might still be Simutrans players with 1 GB RAM or less. These also have limited, or perhaps no, SSE. MMX might be universally available by now.

prissi

The RLE is actually not a compression, rather an optimisation. It just skips. Faster than not accessing display at all is not possible. Moreover, the images are RGB 555. I doubt that this can get much speedup using MMX commands.

During the last round of optimisation we tried MMX code. But it was slower, maybe due to the fact it needs at least 64 bytes or it is slower. With the 8-24 mentioned above it did not speed up.

If you blow up most images to tilesize x tilesize (and ignore that there are some larger ones), you will end up with images that are about 4 to 6 times larger (for vehicles) or 2x larger houses, so much more cache misses. I am also curious how you would implement the transparency there. But here I confer I might have completely misunderstood what you what to do.

Markohs

We can allways implement some kind of uncompressed images memory collector, that every 2-4 mins, deletes uncompressed images that have not been used.

Ters

Quote from: prissi on May 11, 2013, 08:04:23 PM
During the last round of optimisation we tried MMX code. But it was slower, maybe due to the fact it needs at least 64 bytes or it is slower. With the 8-24 mentioned above it did not speed up.

This code might be in some kind of sweet spot where whatever speed-ups one can get from rep movsw or MMX simply goes to waste in the logic needed for starting up and winding down. Since both at their core move relatively big chunks of data (according to an Intel manual I found, rep movsw actually ends up copying bigger pieces of data at a time than the w suggests), and the actual data is not a whole number of such chunks nor chunk-aligned, they need to start and end with some smaller moves. The Intel manual suggest rep movs would be best for almost a hundred elements or more.

The question is how much the loop would suffer from another if delegating runs of more than 512 pixels to less optimized, but more capable copy code. Or maybe the recoding could flag troublesome images, so that the switching between optimized and more capable code is done just once outside the loops. register_image also scans through all loaded and generated images and so could set this flag should an image have too long runs even from the start.

Quote from: prissi on May 11, 2013, 08:04:23 PM
If you blow up most images to tilesize x tilesize (and ignore that there are some larger ones), you will end up with images that are about 4 to 6 times larger (for vehicles) or 2x larger houses, so much more cache misses. I am also curious how you would implement the transparency there. But here I confer I might have completely misunderstood what you what to do.

There is another entire topic devoted to what I'm up to.

prissi

Back to the original topic: I submitted a patch which uses the existing slope for an earth border. It can be disabled/enabled via simuconf.tab. Same for tiling.

Markohs

#301
No background image nor world slope. Well, I guess I can consider my project ends here then. :)

btw, it has a bug, the routines in simview won't draw the borders of the tile that don't have their top visible:




Scroll sightly down and:


TurfIt

Bugs not withstanding, I don't see this as necessarily the end of your project. Simply the option to make things look more like they used to was added with the old ground outside tiling.

The background image I'm not keen on, but as an option why not...
And if by world slope you mean that simcity4 look, I'd very much still like that as an option.

kierongreen

I like the effect of having earth borders. These are slightly broken for double height tiles at present though so will look at that in morning.

kierongreen

Fixed display_border for DOUBLE_GROUNDS


void grund_t::display_border( sint16 xpos, sint16 ypos, const sint16 raster_tile_width )
{
if(  pos.z < welt->get_grundwasser()  ) {
// we do not display below water (yet)
return;
}

const sint16 hgt_step = tile_raster_scale_y( TILE_HEIGHT_STEP, raster_tile_width);

// fixme for double slopes!
static sint8 lookup_hgt[5] = { 6, 3, 0, 1, 2 };

if(  pos.y-welt->get_size().y+1 == 0  ) {
// move slopes to front of tile
sint16 x = xpos - raster_tile_width/2;
sint16 y = ypos + raster_tile_width/4 + (pos.z-welt->get_grundwasser())*hgt_step;
// left side border
sint16 diff = corner1(slope)-corner2(slope);
image_id slope_img = grund_besch_t::slopes->get_bild( lookup_hgt[ 2+diff ]+11 );
#ifndef DOUBLE_GROUNDS
if(  diff  ) {
diff = abs(diff)-1;
}
else if(  corner2(slope)  ) {
// no difference but a slope at the front => need an extra flat height step
diff = -corner2(slope);
}
diff ++;
// ok, now we have the height; since the slopes may end with a fence they are drawn in reverse order
for(  sint16 zz = pos.z-welt->get_grundwasser();  diff <= zz;  diff ++  ) {
display_normal( grund_besch_t::slopes->get_bild(15), x, y, 0, true, false );
y -= hgt_step;
}
#else
diff = -min(corner1(slope),corner2(slope));
sint16 zz = pos.z-welt->get_grundwasser();
if(  diff < zz && ((zz-diff)&1)==1  ) {
display_normal( grund_besch_t::slopes->get_bild(15), x, y, 0, true, false );
y -= hgt_step;
diff++;
}
// ok, now we have the height; since the slopes may end with a fence they are drawn in reverse order
while(  diff < zz  ) {
display_normal( grund_besch_t::slopes->get_bild(19), x, y, 0, true, false );
y -= hgt_step*2;
diff+=2;
}
#endif
display_normal( slope_img, x, y, 0, true, false );
}

if(  pos.x-welt->get_size().x+1 == 0  ) {
// move slopes to front of tile
sint16 x = xpos + raster_tile_width/2;
sint16 y = ypos + raster_tile_width/4 + (pos.z-welt->get_grundwasser())*hgt_step;
// right side border
sint16 diff = corner2(slope)-corner3(slope);
image_id slope_img = grund_besch_t::slopes->get_bild( lookup_hgt[ 2+diff ] );
#ifndef DOUBLE_GROUNDS
if(  diff  ) {
diff = abs(diff)-1;
}
else if(  corner2(slope)  ) {
// no difference but a slope at the front => need an extra flat height step
diff = -corner2(slope);
}
diff ++;
// ok, now we have the height; since the slopes may end with a fence they are drawn in reverse order
for(  sint16 zz = pos.z-welt->get_grundwasser();  diff <= zz;  diff ++  ) {
display_normal( grund_besch_t::slopes->get_bild(4), x, y, 0, true, false );
y -= hgt_step;
}
#else
diff = -min(corner2(slope),corner3(slope));
sint16 zz = pos.z-welt->get_grundwasser();
if(  diff < zz && ((zz-diff)&1)==1  ) {
display_normal( grund_besch_t::slopes->get_bild(4), x, y, 0, true, false );
y -= hgt_step;
diff++;
}
// ok, now we have the height; since the slopes may end with a fence they are drawn in reverse order
while(  diff < zz  ) {
display_normal( grund_besch_t::slopes->get_bild(8), x, y, 0, true, false );
y -= hgt_step*2;
diff+=2;
}
#endif
display_normal( slope_img, x, y, 0, true, false );
}
}

prissi

#305
Thank kieron, submitted.

And personally I would not mind for a proper earth cut. I played a little around with graphics, but nothing I had was looking nicely. That could be entire due to my skills.

And without you this whole thing would not have taken off at all. Be proud of it!

Markohs

mmm... I'll focus earth cut then, but I think I'll go back to the background image in the future maybe, drawing outside with a outside tiled ground-shaped image it's too limiting I think.

What I'm unsure is if implementing the ground cut as a list of height-color , and generate automatically the slopes on init like the climate code does or requiring the pak creators to supply a cut image.

I'd go for the list first, I guess, it's easy to implement, and I'm not really sure if our artists want to create the slopes manually.

kierongreen

To be honest textured cliffs and artificial slopes is probably the way things should be headed.

Markohs

You mean generating the world slope in a similar way the climate code does? Requiring the pak creator to supply textures and generate a vertical stripe to use as border? If that's what you are suggesting, I can agree with you.

You can also make that list of heights, to refer to a solid color *or* texture.

Markohs

#309
Elaborating this a bit more:

The slope *NEEDS* to be dynamic if it's to be implemnted, because the user can change water level and maximum height, so just allowing a variation of the sliced slope it won't be enpough, since we need to generate a full slope on map creation, I think. Also underwater slope needs graphics too, since from grundwasser (defaults to -2), to min world height level, (-10 or so), we need to show some seabed graphics,  to show variations on ocean level (comes to my mind the waters closer to the water level will be lighter blue, and deeper waters, will have to look more dark.

Have a look at simcity example, the one I planned to look similar to, and maybe my explanations make more sense. :)

Sometimes I'm very bad explaining myself, I'm sorry. ;)



Also examples of what I implemented so far in my tests, solid color:



And gradient (needs work, still, doesn't implement ocean)



About graphics for the non-world zone, I can just use prissi's implemented algorithm but at min_world_level, and not un water_level, now. It will look good enough. Even I'd like some day in the future to re-visit background image, but I'm not sure since it wasn't very welcome, even through I liked An_dz idea very much:



What I'd also like is your double height patch implemented in trunk already, so I don't need to write two versions of this and just worry about the new situation.

kierongreen

Yep applying a texture to slopes is what I had in mind, with different textures for different depths and transitions between them. You wouldn't need an option for solid colours as you could use solid colour textures.

Markohs

Well, implementing both should be no problem I guess, since I can write generic code that interpolates two colors, and in the case of solid color I just get a fixed number as a parameter, and in the texture case, just the pixel that's in the corresponding position. I *think* it's easy to do.

IgorEliezer

Quote from: Markohs on May 13, 2013, 11:14:55 AM
(...)even through I liked An_dz idea very much:
*pic*
What if instead of a "square" grid it were a diamond grid? Like the tiles in the game.

Markohs

Well, the thing is it can be whatever the artist wants to be, that's good part of that implementation. I just supplied the chance for the artist to design whatever he wants to be.

prissi

You can get the diamond grid already using outside tiles and switching them on in you pakset.