News:

Simutrans Wiki Manual
The official on-line manual for Simutrans. Read and contribute.

Patch to remove CLIP() macro

Started by ceeac, July 23, 2020, 05:55:30 AM

Previous topic - Next topic

0 Members and 1 Guest are viewing this topic.

ceeac

This patch replaces the CLIP() macro by calls to clamp(). I also moved the clamp() function to the sim:: namespace to prevent confusion with std::clamp and the clamp() function defined in api_param.cc.

kierongreen

Interesting, and I think replacing CLIP with clamp is a good move as functionally they are identical (anyone aware if there's any difference in speed between using min/max and if's?).

Having a separate clamp in api_param though seems to stick out a bit. I notice that version is probably more typesafe, and having int's in the current sim clamp implementation is probably asking for problems sooner or later with different int sizes. Unless there were significant issues with doing so then using the same implementation currently in api_param across the whole code might be an option?

I know there's probably plenty of other places in the code where clamp functionality is implemented via if's - finding all of those (and being sure that there's no overhead even noting that compiler is asked to inline) to see whether those might improve readability.

Similar to the swap discussion I would have a question about whether we want to introduce lots of extra sim:: in the code - although there aren't as many for clamp. Does using clamp without sim:: infront generate issues with C++17/18 standards?

prissi

Personally, I would prefer without sim. But that is probably just one oldtimer, who like C for being a good compromise between verbose and efficient.

And I agree that think one function would be better for the same functionality. I have to admid that I never looked much in the api* files.

kierongreen

Here's my effort - removing CLIP and replacing the previous clamp with the former separate api clamp version.

Done some brief testing but nothing extensive. To avoid mixed signed/unsigned template errors there's a few times where literals have had to be defined as unsigned, and font heights and way height clearances are now defined as unsigned. I've not attempted to identify other sections of code which are functionally equivalent to clamp.

There may be a number of warnings I've not picked up on resulting from this patch - apologies if so.

I've not moved clamp to the sim namespace - that's another discussion to be had.

Also I note that the api clamp version now used across the code is very similar to example implementations in standards documentation, except that those deal with references rather than variables (and use constexpr rather than inline). Not sure of any performance differences resulting from this, and whether we would be better to use references in our implementation? It's only a change in macro.h - code which calls it works with either version.

prissi

Thank you, although I think the gcc C compiler should no emit a warning if 2 is converted to 2u and just do it silently. MSVC warend before much more often about signed/unsigned, but at many more places. So please commit.

kierongreen

#5
Quote from: prissi on July 28, 2020, 05:18:03 AM
Thank you, although I think the gcc C compiler should no emit a warning if 2 is converted to 2u and just do it silently. MSVC warend before much more often about signed/unsigned, but at many more places. So please commit.
Committed in 9174 - thanks ceeac for initial idea. GCC was throwing errors so couldn't compile unless defined the literals as unsigned. There may be warnings elsewhere now I know ceeac in particular is good at spotting them... :)

Edit: found a few min/max pairs where clamp is more readable and committed change in 9175. In the same spirit it might be worth looking at koord clipping as there's a few different approaches used across the code currently.
Edit2: also updated a few references to way height clearance in 9176 now that it's unsigned

prissi

Errors? This seems strange, then const unsigned i=2; must also be an error ... Anyway, thanks.

ceeac

Quote from: prissi on July 28, 2020, 02:30:31 PMthen const unsigned i=2; must also be an error
This is not an error because the value is implicitly converted from signed to unsigned; when doing template substitution, this is not the case because signed and unsigned integers are different types.

Quote from: prissi on July 28, 2020, 05:18:03 AMMSVC warend before much more often about signed/unsigned, but at many more places.
GCC is also capable of this, but the relevant option (-Wsign-conversion) is not enabled by default.

prissi

Quotewhen doing template substitution, this is not the case because signed and unsigned integers are different types.

Moreover, the inplicit conversion works somewhat with templates as well, because if I put in a long type, I do not need "l" at the number, but these are still different types too. Same for char.

But hey, I am not a compiler builder, just a self taught programmer so I have to accept whatever strange things are cooked up.

In case of clamp, using two inline function (one for signed, one for unsigned) would be probably better, since then there would be inplicit conversion of numbers into signed/unsigned and also the use with floats would be explicitely forbidden.

Ters

Signed vs. unsigned and char/short/int/long becomes more of an issue with C++ than in C due to overloading. When there exists both a max(short, short) and a max(int, int), and you have a short val;, which max do you want when you write max(val, 100)? Should val be cast to a int, or 100 to short? I'm not sure if this is the issue you encountered here, though.

prissi

MSVC now complains when it do something like clamp( h, 0, 100) where h is sint8. "No instance of template clmap matches sint, int, int". I think the template approach has some issues, or how should I mark 0 as short!?!

ceeac


clamp<sint8>(h, 0, 100)

This should ensure the compiler warns if the bound does not fit into an sint8.

prissi

It was not a warning, but an error. (And thanks, I know how to fix it.) But I find it stupid that long and int works, but for short not. Especially 0 should neither be short, long, signed or unsigned, because mathematically it is all of it.

Ters

Quote from: prissi on August 08, 2020, 03:28:50 PM
But I find it stupid that long and int works, but for short not.
int works because that makes all three arguments the same type (int), and that matches the template. I'm not sure why long works. At least not without more context. Unless you are referring to why one can add L behind a number to make it a long (and LL for long long), yet there is no similar suffix for short and shorter. That is something that has been missing in C for ages. (Newer C++ allows you to define your own suffixes.)

DrSuperGood

Quote from: Ters on August 08, 2020, 07:06:37 PMUnless you are referring to why one can add L behind a number to make it a long (and LL for long long), yet there is no similar suffix for short and shorter. That is something that has been missing in C for ages. (Newer C++ allows you to define your own suffixes.)
This is likely because int represents the natural size of the platform. Anything smaller than int, hence smaller than the natural size, may incur an instruction or processing overhead due to its size and so should be avoided unless absolutely necessary, such as when dealing with data storage. Data should be converted from byte/short to int, then manipulated as int and then written back as byte/short. Under the hood this is what a lot of the code will compile into doing.

As an example, if an x86 processor were to explicitly load a short (16 bit) value into a register, then it becomes prone to a partial register stall. This is because those instructions do not extend the value to the full register size and as such cause a potential false dependency on the full size register content.

It is my guess that to avoid this kind of problem constants were purposely left as int since that is usually implemented as the type that the processor will execute most efficiently with. In most sane implementations char and short will be a subset of int, hence no special suffix is required for them. The long int and long long int types need a special suffix because they may not be able to be resented by int as they may have a bigger range. The suffix is also a deliberate choice the programmer must make which indicates that manipulations may execute more slowly, a cost they have to accept in order to use the type and its larger range.

This is likely also the reason why basic arithmetic operators on char and short operands produce an int type. This helps assure efficient execution by having the full register content well defined in the native type.

Unlike with the 8 and 16 bit register subset, all 32 bit instructions will automatically discard the upper 32 bits of the 64 bit registers avoiding data decency problems. Hence why with x86-64 the int type usually implies a 32bit type as conversion is not required and any long long int constants will also load and be processed natively.

In short I try to think of this of a quirk of using a reasonably low level language like C/C++ where the language reflects hardware to some extent. I think modern C/C++ provides macros or templates to convert int constants to short or char for convenience in a deliberate way.

prissi

The natural size is a very weak argument. At that time when C was convived and l was introduced, many processors were still using an 8 int data bus, even bigger one like the Mc 68008, while they could often no really hold a 32 bit in a single register but had to combine two.

The short or long problem is nothign which C can be blamed. This is entirely a C++ problem, since overloading, template etc requires exact type matching but for numbers there are only u, l and ll suffixes. I wonder, what would happen, if I encode 100 as char 'd'. No, it says that sint8 is not compatible with char?!? And even casting (sint8)100 it still says that 0 is an int, even though it accepts 0 for any pointer ...

Consequently it should not accept  clamp<sint8>( x, 0, 100 ) without explicit casts. But suddently it can cast 0 to sint8 silently.

Mariculous

#16
The error is a little missleading.
The point why the compiler complains about the clamp(sint8,int,int), is in fact because it is not clear which one to choose.
clamp is defined as T clamp(T,T,T), so clamp(sint8,int,int) doesn't match obviously.
In theory, the compiler has two choices here: Either generate sint8 clamp(sint8,sint8,sint8) or generate int clamp(int,int,int)
Either of these will lead to code that can be called with clamp(sint8,int,int). We know that behavior from C already, so this is nothing new to us.
Luckily the compiler doesn't know this here, so it strictly fails.
Enforcing a type disables compilers type inference here and will simply trust you: "If the user says int is fine, I'll generate the function by using int instead of T."
You can even tell the compiler string is fine and it will use string instead of T.
Obviously, it won't find a suitable clamp that accepts your ints in the next step.

According to how C++ templates work, this behavior is quite consequent. Templates are nothing more than (quite stupid) code generators that will replace the template variables by an actual type as soon as they
a) could uniquely infer a type or
b) the user explicitly defined a type
Do not confuse C++ templates with generics, known from the java world for example. These work much differently.

prissi

Why is 0 an int, and not an open type, since I am allowed to use it as boolean, pointer, or unsigned without a warning? This is the thing I am complaining about. clamp( long, 0, 100 ); works without suffixes. And unsigned need suffixes, ok. But clamp( sint8, 0, (sint8)100 ); DOES NOT. This is outright stupid, and mathematically wrong. 0 is a bolean too, which has only 1 bit, so it fits any data size. Which is the compiler insisting on int here?

Well, but I am only a mere compiler user ...


Mariculous

#18
Quote from: prissi on August 10, 2020, 02:27:17 PMWhy is 0 an int
Because it was defined that way in C. It's a number literal without any suffix, so it's int.
Quote from: prissi on August 10, 2020, 02:27:17 PMsince I am allowed to use it as boolean, pointer, or unsigned without a warning?
That's two different things.
One is template type inference, the other one is autocasting (I am not quite sure if that's the correct term in C++)
The latter in inherited from C and it turned out to quickly cause a lot of trouble if you are not very ceraful.
The first is a C++ thing and it is imho a good decision that it does NOT simply guess some stuff. Either it's unique, in which case it is fine to infer the type, or it's not, in which the user has to specify what he wants the generated code to look like.

I agree, these two behaviors are a little contradictory and in a perfect world, C++ designers would likely have removed autocasting entirely, but that's not possible because that behavior was inherited from C.

Ters

Quote from: DrSuperGood on August 10, 2020, 08:13:30 AMAs an example, if an x86 processor were to explicitly load a short (16 bit) value into a register, then it becomes prone to a partial register stall.
Partial register stalls only became a problem with Pentium, at which point suffixes for short had already been missing for a while. The most likely reason might be that int and short was the same back when these suffixes came into being. C didn't need them as much, due to not having type inference and overloading. Implicit conversions (I think this is the term Freahk was looking for) are unambiguous there. (There might be some issues with non-prototyped and variadic functions on some architectures, but maybe the standard has that covered.) C++ appears to have never bothered, or perhaps they did not want to diverge from C. Casting was always an option for both languages.

Implicit conversion between types is one of those apparent nice-to-haves that actually has some nasty sides to it. C++ apparently realized this early on, and has the explicit keyword to prevent it, although only in some situations that are new in C++.