News:

Simutrans Chat Room
Where cool people of Simutrans can meet up.

integer type usage choices and implicit narrowing conversions

Started by neroden, April 05, 2010, 10:44:10 PM

Previous topic - Next topic

0 Members and 1 Guest are viewing this topic.

neroden

OK, I ran across this trying to figure out the source of some of the signed-unsigned comparison warnings.

There are some nasty unwarned integer type narrowings in the code.  I'm turning on -Wconversion to start seeing them all.

....But anyway, the koord3d class has sint16 values for X and Y.  Great.  The koord_distance function, however, returns *uint32*.

I don't know whether X and Y coords are actually allowed to be negative.  If they are, the maximum value for the koord_distance function (which gives Manhattan distance) is 2^17 - 2.  If they're not, the maximum value are 2^16-1, which would actually fit in uint16.

Now this isn't a problem except that there are quite a lot of other things which use koord_distance and stuff it into a narrower type.  If it's actually giving out values up to 2^17 -2, this is *BAD* as it can cause garbage results, and all the functions which use it should either use wider types or do explicit bounds checks before narrowing.  If it's actually only giving results up to 2^16-1, it should have a return type of uint16.

Which is it?  I'll start supplying patches to fix the implicit type narrowings if I know which is correct.

prissi

Practically koord should not go much into the minus range, i.e. below -5 mximum lets say. Moreover, typical maps should be less than 16000x16000 tiles large, or other strange effects will appear with pathfinder (and no memory will be left) and so on. Thus in pricinple an unit16 might do the trick, although a unit32 is more safe. So older places use abs(k.x-p.x)+... instead of koord_distance(), so if you go through the code, you may want to change those too.

Also there are some places where the square of koord_distance is used. This works only, when it is uint32. So lots of work ahead for you. But if everything goes well, you might find also errors/questionable code.

On signed/unsigned and type conversion warnings: Please note that those are compiler and OS specific! Even fixing everything for GCC 4.x on Linux will give lots of warning on GCC 3.x on Mingw or 2.95.y on BeOS. Not to mention the MSVC and BeOS cc warnings. Even more since some architekture have char signed while other have them unsigned. Passing unsinged char str[20] to printf is then an error.

jamespetts

If bad things happen if maps are bigger than a certain size, ought the UI not prevent maps of above that size from being created at all?
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

prissi

I actually does prevent maps with more than 8000+8000 tiles.

jamespetts

Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

neroden

Quote from: prissi on April 06, 2010, 08:57:37 AM
Practically koord should not go much into the minus range, i.e. below -5 mximum lets say.

Hmm.  How do the x and y values ever go negative?  Is this due to the player clicking off the edge of the map?  Or due to computations based on other koords (as in "take this valid koord and move 5 tiles to the west")?

I understand that the z values go negative.

Quote
Also there are some places where the square of koord_distance is used. This works only, when it is uint32. So lots of work ahead for you. But if everything goes well, you might find also errors/questionable code.

QuoteOn signed/unsigned and type conversion warnings: Please note that those are compiler and OS specific! Even fixing everything for GCC 4.x on Linux will give lots of warning on GCC 3.x on Mingw or 2.95.y on BeOS. Not to mention the MSVC and BeOS cc warnings. Even more since some architekture have char signed while other have them unsigned. Passing unsinged char str[20] to printf is then an error.
Well, I've really been trying to fix the ones which are using "uint16", "uint32", "sint16", "sint32" and so forth, which should be architecture-independent.

The ones using "char" have to be left alone.  The ones using "int" I'll ask about, in hopes of converting some of them to "uint16" and so forth.

The sprintf formats are another source of warnings.  Is "long long" always 64 bits?  If so, I can remove most of those warnings in an architecture-independent fashion.  If not, well, I can't, and I'll leave them alone.

Quote from: prissi on April 06, 2010, 11:16:55 AM
I actually does prevent maps with more than 8000+8000 tiles.

I'll keep this in mind!

Thanks very much for the feedback.

EDIT: Attached is a single patch which eliminates a number of warnings, based on sticking with uint32.  I'm not sure whether this is the right thing to do; we could also switch it all to uint16 quite safely, given that the maximum actual distance should be about (8000 - -5) + (8000 - -5) = 16010.  But uint32 would leave room for future enlargement (on machines with more memory, with a faster pathfinder, etc.)  On the other hand it uses a little bit more memory.  I can make an alternate version based on uint16 if you prefer.

neroden

OK, this is another patch.  There's nothing wrong with the existing code, but I'm trying to eliminate repeated unimportant warnings from -Wconversion so I can find out which ones are *real* issues.  This one, from simtypes.h, shows up many many times in a typical compilation, and it's straightforward enough to eliminate (once I remembered the integer promotion rules).  It should compile exactly the same.  I switched from C-style casts to appropriate use of reinterpret_cast and static_cast while I was at it.

(Also in my git repo at commit 2165ae12bd7cfe8a5ab2382329f539c544540016 -- git users, you probably don't want to pull the whole typesafety branch as it turns on -Wconversion and the slew of warnings).

EDIT: FYI, if you don't care about machines with "int" less than 16 bits (I doubt simutrans compiles on any, I was just being very very portable), you can eliminate the inner two static_casts; they're already operating on "int"s thanks to the integer promotion rules.

prissi

Apart from your code being much more difficult to read, actually what warning does it fix?

const uint8 *data = (const uint8 *)d;
return ((uint16)data[0]) | ( ((uint16)data[1]) << 8 );

The only possible one, I could think off, is a singed unsinged conversion warning, which could be fixed using 8u instead 8. (But this would not adhere to K&R!)

The types should be like this: type(x) == type(x<<y) or else the compiler needs to be beaten.

neroden

Quote from: prissi on April 13, 2010, 09:00:45 AM
Apart from your code being much more difficult to read, actually what warning does it fix?

const uint8 *data = (const uint8 *)d;
return ((uint16)data[0]) | ( ((uint16)data[1]) << 8 );

The only possible one, I could think off, is a singed unsinged conversion warning, which could be fixed using 8u instead 8. (But this would not adhere to K&R!)

The types should be like this: type(x) == type(x<<y) or else the compiler needs to be beaten.
Wrong.

Look up the specification for the integer promotion rules in the C and C++ standards.  *All computations are done in int* if int is wider than the integral type being used.  This includes bitwise or, | .

So if type(x) is short and type(y) is short, type(x | y) is *int* and this is *required by the C and C++ standard* and will be done by *any modern compiler*.  Accordingly if you try to stuff x | y back into a short you will get a warning about possible data loss if you turn on type conversion warnings.

I'm not kidding, and I know it's bizarre.  But *that is how C and C++ are specified*.

Accordingly, it's also worth noting that two of the casts in your existing code do *nothing* unless "int" is less than 16 bits.
return ((uint16)data[0]) | ( ((uint16)data[1]) << 8 );

data[0] | data[1] is a computation done in *int*.  data[1] << 8 is a computation done in *int*.  The casts to uint16 do *nothing*.

I actually verified this behavior by making a toy program which did, basically, this:

const uint8 *data = (const uint8 *)d;
return data[0] | ( data[1] << 8 );

This will give the warning about conversion from "int" to "short" on the return.  It will also give the correct results.

You may not have known how the integer promotion rules in C and C++ actually work.  A lot of people don't.  But the fact is, this is how they work, and every compiler produced since the C89 standard is going to behave this way.  I have already asked and verified that simutrans is not actually compiled on any ancient K&R-only systems.  (And in fact, most K&R systems also worked this way, as they are permitted though not obligated to promote all computations to "int".)

EDIT: if you want to beat anyone, beat the committee who designed the C89 standard!   ;)  The compilers are doing the right thing according to the standard.

EDIT: If you're curious as to *why* the rules say that all computation is done in "int" or larger sizes, it's because back in the days of K&R, "int" was defined as "the size of integer which it is easiest for the CPU to compute with".  Instead of forcing the computer to compute with smaller integers, the K&R specification said it could do all computations with this "easiest" size.

EDIT: You didn't ask, but one of the more annoying things about the C rules is that if you *really* want to do 8-bit or 16-bit math, without wasting register space, on a computer with 32-bit ints, you have to write in assembly language.

prissi

When I learned C (it was before C89 though) it was definitely implementation specific.
char c=1;
int i = (c<<8);
did usually return zero on 3 out of 4 compilers (including early GCC. Did not check on the haiku 2.95.X though ... ) And back in my 68000 days I frequently looked at the compiler output in the debugger, and the code did rarely a sign/unsigned extetsion (which is very costly on 68k). But I learn new stuff every day (albeit this is really braindead). *sigh* (On the other hand 40 bit integer and 20 bit chars are nowadays even rare on DSPs and will have hopefully never to compile simutrans.)

By the way, it seems the implementation of that is even not 100%: http://www.lysator.liu.se/c/rat/c2.html

Quote
Accordingly, it's also worth noting that two of the casts in your existing code do *nothing* unless "int" is less than 16 bits.
Code:
return ((uint16)data[0]) | ( ((uint16)data[1]) << 8 );
This is not true; some compiler might do a signed shift otherwise (see above link), which usually is slower (although this more important for right shifts, of course).

BTW: what platform are you compiling on that you are not using intel byteorder. PPC?

Edit2:
-Wconversion seems not to work for Mingw or Haiku ...

neroden

Quote from: prissi on April 13, 2010, 08:25:12 PM
When I learned C (it was before C89 though) it was definitely implementation specific.
char c=1;
int i = (c<<8);
did usually return zero on 3 out of 4 compilers (including early GCC. Did not check on the haiku 2.95.X though ... ) And back in my 68000 days I frequently looked at the compiler output in the debugger, and the code did rarely a sign/unsigned extetsion (which is very costly on 68k). But I learn new stuff every day (albeit this is really braindead).
Yeah, isn't it?  Makes you wish for ALGOL-60, eh?

Quote*sigh* (On the other hand 40 bit integer and 20 bit chars are nowadays even rare on DSPs and will have hopefully never to compile simutrans.)

By the way, it seems the implementation of that is even not 100%: http://www.lysator.liu.se/c/rat/c2.html
Yeah.  Pre-C89 compilers are still floating around with unsigned-preserving semantics, though I'm pretty sure simutrans isn't compiled on any of them (unless the BeOS system compiler is like that).

QuoteThis is not true; some compiler might do a signed shift otherwise (see above link), which usually is slower (although this more important for right shifts, of course).

Ugh, you're right of course, it would do a signed shift... although a signed shift in *int*, so it only gives bad results if int is 16 bits.  I got tripped up on that though.  (EDIT: if you need to support 16-bit platforms, you *do* need a cast on data[1] in order to prevent the shift operator from being a signed shift.)  I believe the cast on data[0] really doesn't do anything, since bitwise or is the same on signed and unsigned types.

(This sort of stuff is why I've been looking into integer wrapper templates for my own C++ code.)

QuoteBTW: what platform are you compiling on that you are not using intel byteorder. PPC?

Good catch.  :)  Actually, although I do compile for PPC sometimes, I'm compiling for Intel and the code is never compiled... but it is parsed.

Yes, there's a bug in GCC 4.4's warning system.  :P The system version of GCC 4.4 does some very interesting things, including parsing and type-tracking for code which is destroyed by the preprocessor.  (The bad side-effects of an "integrated preprocessor".)  It still compiles it correctly.

I figured since it does get compiled for big-endian systems it was easier to suppress the warning than to try to dig up a version of GCC which doesn't do this.

The nex

QuoteEdit2:
-Wconversion seems not to work for Mingw or Haiku ...
Prior to gcc 4.3, -Wconversion meant something *completely different*.  It's a tool which wasn't available before.

--

In regards to how "pretty" the code is, if you don't like C++-style casts (I'm rather fond of them), C-style casts are fine for warning suppression.  But since this is C++ I figured the tiny amount of extra typesafety didn't hurt.

neroden

OK, so I've been pursuing the integer conversion problem further.

One of the biggest causes of implicit narrowing conversions is the conversions between KOORD_VAL (which is a 'short'), koord.(x,y), and 'int'.

This appears to be due to uses of the koord type for *screen coordinates*, as opposed to map coordinates.  Though I could be wrong.

I think these types should probably be sorted out properly.  What should be used for what?

prissi


neroden

Quote from: prissi on May 13, 2010, 07:28:35 PM
simgraph16 uses KOORD_VAL which is sint16 ...

Except when it uses 'int'....

EDIT: Shall I switch all 'screen coordinate' stuff to KOORD_VAL?  That might be bad for people with enormous screens, but it would also make it easier to change the size of KOORD_VAL later for enormous screens.

prissi

The usual answer is to try this out. I think some calculations should be done as sint32 whenever a screen coordinate is converted into an offset. There might be some errors around.

However, changing KOORD_VAL alone would not be enough, since all dialoges uses koord() for GUI elements.

jamespetts

Neroden - how enormous? I have a monitor with a resolution of 1200 x 1600...
Download Simutrans-Extended.

Want to help with development? See here for things to do for coding, and here for information on how to make graphics/objects.

Follow Simutrans-Extended on Facebook.

neroden

Quote from: prissi on May 13, 2010, 09:11:04 PM
The usual answer is to try this out. I think some calculations should be done as sint32 whenever a screen coordinate is converted into an offset. There might be some errors around.

However, changing KOORD_VAL alone would not be enough, since all dialoges uses koord() for GUI elements.
I can split koord() into two different types, screen_koord and map_koord.  Actually, at least one of them would keep the name koord (which one?).  But in fact this doesn't cause problems at the moment because koord() uses sint16.  The problem comes in the KOORD_VAL/int32 conversions.

Quote from: jamespetts on May 13, 2010, 10:26:12 PM
Neroden - how enormous? I have a monitor with a resolution of 1200 x 1600...
More than 33,000 on a side.  :)  So 30 times as large as your monitor.  :)

Dwachs

Then use koord for the map-coordinates, and something like koord32 for others. Could be achieved with a template as well

template<class some_int> koord_base {
         some_int x,y
};

typedef koord koord_base<sint16>;


Another thing that comes to my mind are the min/max functions. They take int's and return int's. Why not define macros here instead?
Parsley, sage, rosemary, and maggikraut.

neroden

Quote from: Dwachs on May 14, 2010, 05:23:19 AM
Then use koord for the map-coordinates, and something like koord32 for others. Could be achieved with a template as well

template<class some_int> koord_base {
         some_int x,y
};

typedef koord koord_base<sint16>;


Another thing that comes to my mind are the min/max functions. They take int's and return int's. Why not define macros here instead?

I was actually planning to define them as template functions -- T min(T x, T y);  I've already found quite a lot of suspicious integer conversions due to the min/max functions.

Unfortunately, the biggest cause of code trying to do stuff like sint16 = min(uint16, sint32) and other non-matching types is the confusion between int and KOORD_VAL.  Therefore I thought I should fix that first before fixing min and max.

TrainMith

Quote from: neroden on May 14, 2010, 05:53:51 AM
Quote from: Dwachs on May 14, 2010, 05:23:19 AM
Then use koord for the map-coordinates, and something like koord32 for others. Could be achieved with a template as well

template<class some_int> koord_base {
         some_int x,y
};

typedef koord koord_base<sint16>;


Another thing that comes to my mind are the min/max functions. They take int's and return int's. Why not define macros here instead?
I was actually planning to define them as template functions -- T min(T x, T y);  I've already found quite a lot of suspicious integer conversions due to the min/max functions.

Sorry for the late reply here; I just happened upon the topic.
C++98's standard library already has a templated version with the correct return type, known as std::max and std::min, found within the cmath header file. 
And instead of defining the error-prone macros, I would have suggested inline templated functions if there wasn't already a std lib solution.  Other than a couple of very rare situations, the C++ language supported solution would seem to be the better of the two? 

Dwachs

another related question: Is it possible to find per compiler warning things like this:

sint32 = uint16 - uint16

That is, two unsigned integers are subtracted and put into a *larger* signed one. Although the sint32 can carry differences between uint16's, the point is, that the difference uint16-uint16 is of uint16 type, thus yielding an overflow if converted to sint32.
Parsley, sage, rosemary, and maggikraut.

neroden

Quote from: TrainMith on June 15, 2010, 09:11:28 AM
I was actually planning to define them as template functions -- T min(T x, T y);  I've already found quite a lot of suspicious integer conversions due to the min/max functions.


Sorry for the late reply here; I just happened upon the topic.
C++98's standard library already has a templated version with the correct return type, known as std::max and std::min, found within the cmath header file. 
And instead of defining the error-prone macros, I would have suggested inline templated functions if there wasn't already a std lib solution.  Other than a couple of very rare situations, the C++ language supported solution would seem to be the better of the two? 

Sorry for the VERY late time getting back to this.  I'd love to use the C++98 Standard Library solution, but prissi would have to approve it.  I *believe* all the platforms Simutrans compiles on support C++98 standard libraries, but it's not my decision whether to require C++98 standard library support.

I'll repeat what I said earlier though:
"Unfortunately, the biggest cause of code trying to do stuff like sint16 = min(uint16, sint32) and other non-matching types is the confusion between int and KOORD_VAL.  Therefore I thought I should fix that first before fixing min and max."

Unfortunately I haven't had time to work on that.  :-P  Others are welcome to work on it of course....