Re: While coding C++ how do you usually test your expressions and functions?

Started by sdog, November 25, 2016, 06:24:59 PM

sdog

Quote from: Combuijs on December 18, 2016, 07:39:44 PM
You refer to the famous "go to statement considered harmful" article, I suppose?
Probably. I think I briefly read it a few years back.

In F77 computed GO TO statements were deprecated,
these had the form GO TO (L_1 .. L_M) N
for a jump to label L_N for a 1<=N<=M.
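For readers who know C++ better than F77, the closest relative of a computed GO TO is a switch over an integer. A minimal sketch, where the function name `dispatch` and the returned values are invented and merely stand in for the code at labels L_1 to L_3. As in F77, nothing is jumped to when N is out of range and execution simply carries on with the next statement.

```cpp
// Hypothetical illustration of GO TO (L_1, L_2, L_3) N as a C++ switch.
// The function name and the returned values are invented for this sketch.
int dispatch(int n)
{
    switch (n) {
    case 1: return 10; // stands in for the code at label L_1
    case 2: return 20; // stands in for the code at label L_2
    case 3: return 30; // stands in for the code at label L_3
    default: break;    // n outside 1..3: no jump, execution continues
    }
    return 0;          // the statement following the computed GO TO
}
```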

We had a library routine in our code that used these and GOTO as its only control structures. That thing was super dense, one or two hundred lines of code for a stiff ordinary differential equation (ODE) solver. Certainly not spaghetti code, but state of the art: correct, reliable, and efficient. Written a year or two before I was born. Didn't matter that more than 30 years had passed, I still felt effin spoilt by it. The best comes last, it was still good enough to compete with new ODE solvers, some written around 2000 or so.

DrSuperGood

The problem with goto statements in general is that they are as good as redundant. In a language like C or C++ there is as good as nothing a while/do while/for/if/switch with break cannot do that goto can. A lot of the code optimizations involving goto are automatically applied by modern compilers.

Quote
The best comes last, it was still good enough to compete with new ODE solvers, some written around 2000 or so.
Pre 2000 pretty much the only thing that was improving with computers was speed and quantities. More memory, more storage, more clocks per second etc. As such old algorithms developed a decade or so earlier still scaled well as they could take advantage of all the mores available. Post 2000 things have got a lot more tricky as more is no longer really an option as processor clock speed growth has near flat lined while memory and storage have reached such volumes they are no longer much of a concern. To get around this post 2000 hardware has introduced a lot of complexities such as multiple cores, new instructions to compute many values at once, more registers and in some cases even access to dedicated hardware like standardised GPUs to outsource certain types of massively parallel calculations.

If the code was very high level then some of the features, such as new instructions, might be used automatically. However if not then it will not be able to take advantage of such available features and so perform similar to how it did around 2005 odd. The API used might not be suited to take full advantage of modern hardware, limiting the maximum obtainable performance. An old API might take in only 1 value, process it linearly and produce a single output, whereas a more modern API might support multiple value inputs, process them in parallel and produce multiple outputs. Both API and implementation are almost certainly a lot more complex than something from 1990, however its performance might end up being orders of magnitude faster today.

However you probably already knew that. I am just putting this here in case anyone wondered why such things might no longer be good enough to compete in this day and age.

Ters

Quote from: DrSuperGood on December 18, 2016, 09:27:08 PM
The problem with goto statements in general is that they are as good as redundant. In a language like C or C++ there is as good as nothing a while/do while/for/if/switch with break cannot do that goto can. A lot of the code optimizations involving goto are automatically applied by modern compilers.

I'm not sure that's the "popular" problem with goto statements. There are certainly those who would argue that break statements are at least almost as evil (except probably in switch blocks, where maybe not using them is seen as evil), and that the same is true for multiple return statements in a function.

sdog

Quote
If the code was very high level then some of the features, such as new instructions, might be used automatically. ...

With scientific Fortran code that is very much a question of the compiler. I.e., such things tend to run quite a bit faster when compiled with the Intel Fortran compiler, assuming Intel hardware. 2000ish is already something pretty recent. Speed never bothered me much in my stuff, so I kept to my local cluster (only gcc) and did not move to remote machines with Intel compilers. A friend tested it: compiled with Intel, the programs were about 5 or 10 percent faster. For test problems he could get larger differences. In contrast, a smart choice of model parameters made several orders of magnitude difference in runtime. (In one case, reducing the size of the vector space, I went from about 1 Gs to half a megasecond, calculated by scaling the time for one operation with the increased complexity.)

Solving ODEs is an old problem. Once hardware limitations no longer prevented the best algorithms from being used, it seems there wasn't much left to develop with regard to performance on benign problems. The advances came with treating stiff problems and tackling more difficult problems.

These algorithms look deceptively much like a simple Adams-Bashforth, predictor-corrector and such things from a basic course in numerical methods, with the addition of 8 pages of equations for some weird cases, referencing a dozen other articles dealing with other special cases.

The thing is, one might just re-write that old code in modern Fortran. Or perhaps even C. But there's nothing to be gained. On the flipside, one loses reliability; that stuff was scrutinised by perhaps 8 generations of grad students and post docs. It is archaic and awful to read (today), but as long as it compiles with state of the art compilers, it's as fast as it can be.

@Ters
Quote
I'm not sure that's the "popular" problem with goto statements. There are certainly those who would argue that break statements are at least almost as evil (except probably in switch blocks, where maybe not using them is seen as evil), and that the same is true for multiple return statements in a function.
I understood that the biggest problem is the tendency to spaghetti code. I mean, that anti goto article was from the late 60s? The debate over whether to use structured programming with semantic control structures like loops and conditionals has long since been fought.

Are multiple return statements really that bad?

This (py) code:

def heavyside(a):
   if a > 0:
     return 1
   else:
     return 0

looks clearer to me than that one:

def heavyside_prime(a):
   if a > 0:
      b = 1
   else:
      b = 0
   return b


Perhaps I look at it a bit biased, since in languages with pattern matching that is actually the standard. A different example:

g :: Int -> Int -> Int
g 0 1 = 1
g 1 0 = 2
g 1 1 = 3
g _ _ = 0


The Heaviside function from above:

f :: (Ord a, Num a) => a -> Int
f a
   | a > 0 = 1
   | otherwise = 0


Since this is a C/C++ thread, let's try it in C++. I suppose the following is bad form then:

int heavyside (double a)
{
    if ( a > 0 )
    {
        return 1;
    }
    return 0;
}



int heavyside_prime (double a)
{
    int b; // init var without asg. value, bad?
    if ( a > 0 )
    {
        b = 1;
    }
    else
    {
        b = 0;
    }
    return b;
}


The break statement is nothing but a goto anyway. Consider this F77 loop over a matrix, doing something pointless:

        b=0
        DO 10 j=1, n
        DO 10 i=1, n
          b = b + A(i,j)
          IF (b.LT.0) GO TO 20
10    CONTINUE
20    CONTINUE




Follow up on my C++ example. It is indeed bad what I did. Confer:
http://stackoverflow.com/questions/1597405/what-happens-to-a-declared-uninitialized-variable-in-c-does-it-have-a-value


int heavyside_twoprime (double a)
{
    int b = 0; // now with determined value!
    if ( a > 0 )
    {
        b = 1;
    } // but not as clear since there's no more else.
    return b;
}


edit: added missing `Ord a` to type declaration in a Haskell example

DrSuperGood

Quote from: sdog on December 18, 2016, 10:58:39 PM
Since this is a C/C++ thread, let's try it in C++. I suppose the following is bad form then:

int heavyside (double a)
{
    if ( a > 0 )
    {
        return 1;
    }
    return 0;
}



int heavyside_prime (double a)
{
    int b; // init var without asg. value, bad?
    if ( a > 0 )
    {
        b = 1;
    }
    else
    {
        b = 0;
    }
    return b;
}


Except one can write it like...

int heavyside (double a)
{
    return a > 0 ? 1 : 0;
}

Which is 1 return, 1 line, functionally the same (as far as I am aware) and still very readable.

Quote
Follow up on my C++ example. It is indeed bad what I did. Confer:
You initialized the variable before use which is completely fine to do, even if it was not initialized at declaration. The compiler will automatically produce an initial value setup or branch code in such a way that the variable always has the value desired.

The only thing one has to be careful with is that a value is assigned before the variable is used. Simutrans suffered from such a problem a while ago resulting in MSVC and GCC builds being incompatible due to different or no assignment with an uninitialized field used to calculate pakset hashes.

In Java one can even initialize final (Java's equivalent to const) variables with such conditional statements. As long as the variable is assigned a value in all cases it is fine. This does not apply to C++ though where a const can only be assigned with a declaration.
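In C++ the usual way around this is to make the whole decision part of the initialiser, either with the conditional operator or, for longer logic, with a lambda that is called on the spot. A small sketch of both, assuming C++11 or later; the function names are invented for the example.

```cpp
// Sketch: giving a const variable a value that depends on a condition.
// heavyside_const and sign_const are hypothetical names for this example.
int heavyside_const(double a)
{
    // the conditional operator picks the value inside the initialiser
    const int b = (a > 0) ? 1 : 0;
    return b;
}

int sign_const(double a)
{
    // an immediately invoked lambda allows full if/else logic
    // while the variable itself stays const
    const int s = [a]() -> int {
        if (a > 0) {
            return 1;
        }
        if (a < 0) {
            return -1;
        }
        return 0;
    }();
    return s;
}
```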

I personally have no problems with multiple returns. My general aim when programming is toward mostly flat, linear code where any given line is within as few flow control statements as possible and the function executes downwards broken apart into separate related steps. Often one can be tempted to place more and more code inside conditional statements. Doing so results in code that I personally find hard to understand what is happening and is difficult to read due to the excessively deep tabbing required. As such if I have an early exit clause (eg nothing to do detected at start) then I test for that and return, thus keeping the rest of the function code flat and linear instead of having to place it in an almost function long else block which itself might branch further into other blocks. It just makes so much sense as one can read from start of function and see immediately "the function exits due to nothing to do" instead of having to scroll all the way down to a return statement and try to see where that value came from.
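A made-up illustration of that early-exit style (the function `mean_of_positive` is invented here and is not Simutrans code). The trivial cases return at the top, so the main work stays at a single indentation level.

```cpp
#include <cstddef>
#include <vector>

// Sketch of the early-exit style: trivial cases return right away,
// the remaining code reads top to bottom without deep nesting.
// mean_of_positive is a hypothetical function made up for this example.
double mean_of_positive(const std::vector<double>& values)
{
    if (values.empty()) {
        return 0.0; // nothing to do, leave right away
    }

    double sum = 0.0;
    std::size_t count = 0;
    for (double v : values) {
        if (v <= 0.0) {
            continue; // skip non-positive entries without nesting further
        }
        sum += v;
        count += 1;
    }

    if (count == 0) {
        return 0.0; // no positive entries at all
    }
    return sum / static_cast<double>(count);
}
```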

Ters

Quote from: sdog on December 18, 2016, 10:58:39 PM
2000ish is already something pretty recent.

That is not clear, because quite radical changes took place between 2000 and 2010. At the beginning of the decade, most people only had computers with a single processor with a single core. Vector instructions were however probably already common. At the end, multi-core CPUs were common, plus one, or even two, GPUs. The idea of using GPUs for things other than graphics had also taken hold. There were also the mixed-paradigm Cell processors, containing a traditional core plus several "GPU-style" cores. But I guess the people using Fortran probably already had workstations with multiple CPUs and/or clusters.

Quote from: sdog on December 18, 2016, 10:58:39 PM
Are multiple return statements really that bad?

This (py) code:

def heavyside(a):
   if a > 0:
     return 1
   else:
     return 0

looks clearer to me than that one:

def heavyside_prime(a):
   if a > 0:
      b = 1
   else:
      b = 0
   return b


I've been asking myself the same question. The code in question is however more complicated than this, with multiple nested loops and/or if-blocks, which might be the actual problem.

Vladki

Quote from: DrSuperGood on December 19, 2016, 01:11:53 AM
Except one can write it like...

int heavyside (double a)
{
    return a > 0 ? 1 : 0;
}

Which is 1 return, 1 line, functionally the same (as far as I am aware) and still very readable.

And you can simplify it further to:

int heavyside (double a)
{
    return (a > 0);
}


DrSuperGood

Quote
Vector instructions were however probably already common.
They were very basic compared with all the vector instructions released post 2000. Modern vector instruction extensions released within the last year or so are capable of manipulating 16 different 32 bit numbers (eg floats) at the same time in a 512 bit long register. They contain multiple such 512 bit registers as well. Of course at the end of the day the bottleneck becomes how many separate functional units exist to do the required calculations (a vector need not be implemented in a way that all calculations are run in parallel at the same time, it might be limited by the number of free appropriate functional units) however I am pretty sure modern processors have a lot more of those as well.

sdog

Quote from: Vladki on December 19, 2016, 11:18:19 AM
And you can simplify it further to:

int heavyside (double a)
{
    return (a > 0);
}



Ouch! That hurts! It seems old C didn't have boolean types and abused integers for that. Thanks Vladki, I expect I would not have learned that for a while had you not mentioned it. Here's some experimenting with it:

[cling]$ (0<1)
(bool) true

[cling]$ 5 - (0<1)
(int) 4

[cling]$ 5 - (0>1)
(int) 5


Quote from: DrSuperGood
Except one can write it like...

int heavyside (double a)
{
    return a > 0 ? 1 : 0;
}

Which is 1 return, 1 line, functionally the same (as far as I am aware) and still very readable.
I never managed to remember that syntax, used it often in gnuplot. It looks almost perlish, is it discouraged to use it today?

I see the trouble with using such a minimal example is that it is so easy to find much simpler solutions. I ought to have stayed with a proper definition of Heaviside, and made it real :-)

By the way, to define this Heavyside to work independent of the type of input, as long as it is numeric and comparable, I would have to overload it? Speaking of typing, I botched the type declaration in the Heaviside Haskell example: the input must be of a type that can be ordered (corrected now).
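Overloading by hand would work, but C++ can also generate the overloads from a function template. A minimal sketch, assuming only that the type can be constructed from 0 and compared with `>`, which roughly mirrors the `(Ord a, Num a)` constraint in the Haskell version. The name `heavyside_generic` is made up for this example.

```cpp
// Sketch: one Heaviside step function for any type that can be
// compared with zero, written as a function template instead of
// a set of hand-written overloads.
// heavyside_generic is a hypothetical name chosen for this example.
template <typename T>
int heavyside_generic(T a)
{
    return (a > T(0)) ? 1 : 0;
}
```

Calling `heavyside_generic(2.5)` instantiates it for double and `heavyside_generic(-3)` for int; a type without a suitable `>` is rejected at compile time.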

@Ters

I meant using return in the ways DrSuperGood mentioned, for example, to check for trivial cases at the outset and return right away, rather than having nested ifs. But also for simple things, returns seem to be good practice in some languages. Is a function that is built similar to the pattern-matched examples, i.e. a number of checks for conditions and then returning right away, typically considered bad form in C/C++?

Quote
That is not clear, because quite radical changes took place between 2000 and 2010. At the beginning of the decade, most people only had computers with a single processor with a single core. Vector instructions were however probably already common. At the end, multi-core CPUs were common, plus one, or even two, GPUs.
Well, CUDA is really quite different and new. I suppose that might have changed things for some problems. It takes quite different algorithms. From the type of problems I know, there's not much gained from parallelisation of solving DEs; usually one has a large number of parallel threads each solving an ODE. That can be achieved for example by substituting or modelling multi-variable DEs as coupled single-variable ODEs. When this is done, there's not much difference to 2000: whether you run your stuff on 128 CPUs with one thread and core each or on 16 octo-core CPUs doesn't change your approach to parallelisation very much. True, communication between cores is faster, but it's still massively punished.

You may also consider that an algorithm that ran on 1975 hardware might not be that unsuitable to adapt to the very limited resources of a single CUDA thread. If the problem is embarrassingly parallel, it might be tempting to make these solvers even simpler, drop all support for stiff problems, and just give up and pass those on to a proper CPU with more sophisticated algorithms. However, that is just wild ad-hoc speculation.

Quote
Vector instructions were however probably already common.

Yes, they were a very great incentive to use the Intel Fortran compiler over gcc's gfortran. I cannot recall if SSE was available in gfortran when I started in 2005. There was no way to get ifort anyway back then. Anyhow, that's the job of the compiler; it wouldn't change the way the code was written in any way.

There is one more aspect to it, one can write vector and matrix operations per element or use built in functions per whole matrix in new fortran standards. New fortran standards were not supported by gfortran for quite a while. The compiler was also not smart enough to identify something as a simple vector operation that could benefit from vector instructions.

Ters

Quote from: sdog on December 19, 2016, 05:14:47 PM
Ouch! That hurts! It seems old C didn't have boolean types and abused integers for that.

It didn't abuse integers for that. They were integers already when C came around. CPUs in general have integer and floating point registers, and maybe also special address registers. Some had BCD-registers in the old days, and now we have vector registers. But I have never heard of boolean registers, beyond status registers, which aren't available in the same way. Not that I'm familiar with a great deal of CPU architectures, and only x86 is fresh in my mind.

And booleans are still just integers in the newest C standard from what I can gather. The "new" native bool type is just another integer type just like char, short, int and long.

Vladki

AFAIK most languages will happily convert boolean to int as false=0 and true=1 and vice versa. Only exception I know is bash (and other unix shells), where true=0 ;) 

Also that example function heavyside(), would make more sense returning bool instead of int.

DrSuperGood

The bool type is there to make clear that the API will output either true (not 0) or false (0). The problem with int as a logic type is that it implies that any numeric value in range of the type could be output and might need to be dealt with (not clear that it is for logical use).

For example one could change the previous code to the following in which case the output clearly is not a boolean.

int heavyside (double a)
{
    return a > 0 ? 1 : -1;
}

As such I would disagree with the following solution...

int heavyside (double a)
{
    return (a > 0);
}

Unless the function declaration was changed to make it clear that the output is logic and not a number.

bool heavyside (double a)
{
    return a > 0;
}

Although C/C++ do probably strictly define the values of logic operation results now, in theory the logical true produced need not be defined as 1, it could be -1 or any non-zero value. This is because logic instructions generally assume that 0 is false and anything not 0 is true and as such any non 0 value could be used to represent true.
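For C++ at least, the conversion is pinned down by the language: converting `true` to an integer gives 1 and `false` gives 0, while any non-zero value converts to `true`. A short sketch that lets the compiler confirm this, assuming C++11 or later for `static_assert`; the helper `count_positive` is invented for the example.

```cpp
// Sketch: what C++ guarantees about conversions between bool and int.
// These checks run at compile time.
static_assert(static_cast<int>(true) == 1, "true converts to 1");
static_assert(static_cast<int>(false) == 0, "false converts to 0");
static_assert((3 > 2) + (2 > 1) == 2, "comparison results promote to 0 or 1");

// In the other direction any non-zero value converts to true.
static_assert(static_cast<bool>(-1), "non-zero converts to true");
static_assert(static_cast<bool>(42), "non-zero converts to true");
static_assert(!static_cast<bool>(0), "zero converts to false");

// Hypothetical helper for this example: count how many values are
// positive by adding up comparison results.
int count_positive(double a, double b, double c)
{
    return (a > 0) + (b > 0) + (c > 0);
}
```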

Simutrans abuses this a lot to test for NULL pointers. Instead of checking pointer != NULL it just tests for pointer since NULL is defined as 0 in the case of Simutrans so anything not NULL is logical true. Personally I am against this since a pointer is not a logical value, despite being able to be used as one.

Ters

Quote from: DrSuperGood on December 19, 2016, 08:03:29 PM
Simutrans abuses this a lot to test for NULL pointers. Instead of checking pointer != NULL it just tests for pointer since NULL is defined as 0 in the case of Simutrans so anything not NULL is logical true. Personally I am against this since a pointer is not a logical value, despite being able to be used as one.

Preferring to write as little code as possible, but with meaningful names, the ability to test the validity of a reference by just writing the name of the reference is something I actually miss in Java. (A non-null pointer in C/C++ isn't necessarily valid, but that is a different problem.) Then again, NULL references have been declared (one of) the biggest mistakes in software engineering, by the very man who invented them. I don't think I have ever tested that an actual number is 0 by casting it to a boolean, just pointers. And perhaps some C++ classes that have a bool conversion operator for whatever reason.

sdog

It pays not to use an interactive shell only when learning a language:


Here's a trivial example that fails horribly when compiled without specifying any flags to g++.

#include <iostream>

int main()
{
    unsigned int a; // 'uint' is a non-standard alias for this type
    a = 1;
    for (int i=0; i<=2; i++)
    {
        a = a-i;
        std::cout << a << std::endl;
    }

    return 0;
}

// vim: et ts=4 sw=4 sts=4


Now, unsurprisingly that oughtn't work, and it does not:

➜ make       
g++     test.cc   -o test
➜ ./test
1
0
4294967294

I've to find the correct compiler flags (trap) to throw an exception when this happens. That ought to apply equally to integer overruns, or are these automatically NaN, like with floating types such as double?

That is the most severe case of this: while C++ looks rather high level, deep down it is still C. As such it is more concerned with machine comprehension than human comprehension. (For fun I've tried to google "mathematical definition of C data types" but didn't find anything. It would be quite interesting to see how uint is defined, since it is very clearly not ℕ0.) As a learner I take from this: C++ really can be a minefield, and great care must be taken to avoid unforgivable mistakes.

There is one more aspect: it's the first time in a long while that I've used a for loop rather than a while loop. I had found the for rather superfluous; in C/C++, however, this is different. Apparently the while only accepts either a declaration or a condition in its argument, while the for allows loops with a variable that is to be iterated.

Quote
Quote
Simutrans abuses this a lot to test for NULL pointers. Instead of checking pointer != NULL it just tests for pointer since NULL is defined as 0 in the case of Simutrans so anything not NULL is logical true. Personally I am against this since a pointer is not a logical value, despite being able to be used as one.

Preferring to write as little code as possible, but with meaningful names, the ability to test the validity of a reference by just writing the name of the reference is something I actually miss in Java.

That seems quite hackish. Could this have been avoided by some syntactic sugar, like, overloading the 'if' builtin function to test existence?

@Ters
Quote
It didn't abuse integers for that. They were integers already when C came around. CPUs in general have integer and floating point registers, and maybe also special address registers.
Yet we rely on higher abstraction than just manipulating registers. In essence all data types in any language may be reduced like that. Since integer registers are different in different machine architectures, isn't int in itself already as much an abstraction as boolean?

In one case we don't want type mismatches, and we are quick at criticising obsolete coding practices that employed those on purpose (eg in all caps languages like BASIC or FORTRAN). On the other hand a type mismatch from so extremely different types like boolean and integer may be acceptable when convenient? Perhaps that's just my inexperience, but that seems somewhat hackish to me.


ps.: I thought it would be a spiffy idea to write a simple parser that reads scope from syntactic whitespace indents and inserts `{...}` and `;` pairs as needed. Then pipe the source first to the parser and then to the compiler. Voila, C to be nearly as readable and writable as python.

prissi

C happily allows integer under and overflows and will not throw an exception when bits are lost. You may find flags that warn you against it, though.

C was never made for such things, and testing for overflow is very time consuming. Maybe you can find some of that in floating point numbers, which go to +-inf or to zero respectively. But throwing bits away in some arithmetic is extremely common when driving hardware, and that is one of the areas where C(++) is used most (at least in the beginning).

DrSuperGood

Quote
Now, unsurprisingly that oughtn't work, and it does not:
What part of it does not work? It looks fine to me at a glance.

1 - 0 = 1
1 - 1 = 0
0 - 2 = 4,294,967,294

My manual trace gets the same results as was output....

Quote
I've to find the correct compiler flags (trap) to throw an exception when this happens. That ought to apply equally to integer overruns, or are these automatically NaN, like with floating types such as double?
The entire way signed types work is by under/overflow. In fact unsigned and signed addition and subtraction are the same... This is due to two's complement. Multiplication and comparisons on the other hand are a completely different story.

Quote
There is one more aspect: it's the first time in a long while that I've used a for loop rather than a while loop. I had found the for rather superfluous; in C/C++, however, this is different. Apparently the while only accepts either a declaration or a condition in its argument, while the for allows loops with a variable that is to be iterated.
For loop is nothing more than a neater way of showing a while loop.


// for loop
for (int i = 0 ; i < x ; i+= 1) {
    something(i);
}

// while loop
int i = 0;
while (i < x) {
    something(i);
    i+= 1;
}

Both are functionally the same, just one does all the syntax in a single line. I recommend using a for loop when iterating something sequentially. I recommend using a while loop when testing for something that is not sequential. Exception is Java where for loops have a special hard coded mechanic for types that extend Iterable, but the syntax of the for loop is then different and often such loops are not that useful due to a lack of counting (only have access to an element, not what number the element may be).

Quote
That seems quite hackish. Could this have been avoided by some syntactic sugar, like, overloading the 'if' builtin function to test existence?
As far as I am aware "if" is not a function but rather a programming primitive. As such I do not see how it could be overloaded, but I do admit I am still novice to C++.

Sure testing the pointer is shortest code wise, however a pointer is not really a boolean logical value but rather a memory address. It firstly couples to NULL being address 0 (which is mostly standard now, but not necessarily the case). Secondly it is using the fact that logical 0 is 0 while logical 1 is not 0 which is not really pointer logic, as in theory a pointer to 0 could exist and could be meaningful (I do not know of any real examples where it is). NULL is a special pointer value specifically designed to represent an invalid pointer, hence comparing for it does make sense.

Quote
Yet we rely on higher abstraction than just manipulating registers. In essence all data types in any language may be reduced like that. Since integer registers are different in different machine architectures, isn't int in itself already as much an abstraction as boolean?
Registers have little to do with it. Registers are more a part of what resources you have available or used to compute with. Memory is slow compared with registers which is why modern instruction set extensions keep adding them. In fact the newest ones are a massive 512 bit long and can compute 16 different 32 bit values (floats or ints) in a single instruction call.

Quote
In one case we don't want type mismatches, and we are quick at criticising obsolete coding practices that employed those on purpose (eg in all caps languages like BASIC or FORTRAN). On the other hand a type mismatch from so extremely different types like boolean and integer may be acceptable when convenient? Perhaps that's just my inexperience, but that seems somewhat hackish to me.
It is hackish and violates type safety. Hence why Java only allows boolean values in its tests. C and C++ are still low level enough to get away with the legacy assembly level behaviour of logical 0 being 0 and logical 1 being non 0. Almost every instruction set I know follows that low level behaviour, however it is not type safe as a pointer is not a logical value although its meaning and test might work in such a situation.

Joke is compilers output the same code. Test if a pointer is null and you will get the same code out as simply testing the pointer as if it is a logical value.
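A sketch of the three spellings under discussion, with made-up function names. The language defines the test `if (p)` on a pointer as true exactly when `p` is not a null pointer, so all three functions return the same answer, and a compiler would be expected to emit the same code for them, although that is not checked here.

```cpp
#include <cstddef>

// Sketch: three spellings of the same null test.
// The function names are invented for this example.
bool is_set_implicit(const int* p)
{
    if (p) {             // pointer converted to bool in the condition
        return true;
    }
    return false;
}

bool is_set_explicit(const int* p)
{
    return p != nullptr; // explicit comparison (nullptr needs C++11)
}

bool is_set_null_macro(const int* p)
{
    return p != NULL;    // the older spelling with the NULL macro
}
```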

Quote
C happily allows integer under and overflows and will not throw an exception when bits are lost. You may find flags that warn you against it, though.
Using such flags in C/C++ directly seems to be a minor problem and a huge source of errors. I wonder why they did not add primitive language features to do this...

sdog

It would have been a smart idea then not to permit subtraction for uint.
Oh, nevermind, there appears not to be an error message when adding int and uint. (I thought C was strongly typed!) It seems to implicitly convert int to uint(!) Edit: On some machines it only needs to be ignorant of what type of integer it is, since with setting the sign bit interpretation 0 might denote the presence of a negative value for a given digit.


[cling]$ const int a = -10
(const int) -10
[cling]$ const uint b = 1
(const unsigned int) 1
[cling]$ a + b
(unsigned int) 4294967287


I think I could draw two consequences: (a) avoid uint by all means or (b) check before or after each arithmetic operation if negative values are present or might occur.

(b) would be feasible, if only subtraction were dangerous.
(a) seems to be problematic as well, as uint are apparently frequently used. There might be a reason. Even if not, it might still be found in code. (There might not be much use for uint outside of peripheral cases like streaming IO; with 64 bit registers the gain of 1 bit is marginal.)
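What the cling session above shows is the usual arithmetic conversion: in `int + unsigned int` the int operand is converted to unsigned int before the addition. A sketch of option (b) for this particular case, with a made-up helper `add_offset` that does the sum in a wider signed type and refuses results that do not fit. It assumes `long long` is wider than `unsigned int`, which holds on common platforms but is not guaranteed.

```cpp
#include <limits>
#include <type_traits>

// In 'int + unsigned int' the int operand is converted to unsigned int
// first, so the result has type unsigned int (checked at compile time).
static_assert(std::is_same<decltype(-10 + 1u), unsigned int>::value,
              "int + unsigned int yields unsigned int");

// The sketch relies on long long being wider than unsigned int.
static_assert(sizeof(long long) > sizeof(unsigned int),
              "need a wider signed type for the range check");

// add_offset is a hypothetical helper: it adds a signed offset to an
// unsigned base and reports failure instead of wrapping around.
bool add_offset(unsigned int base, int offset, unsigned int& result)
{
    const long long wide = static_cast<long long>(base) + offset;
    if (wide < 0 || wide > std::numeric_limits<unsigned int>::max()) {
        return false; // result not representable as unsigned int
    }
    result = static_cast<unsigned int>(wide);
    return true;
}
```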

Quote
C was never made for such things, and testing for overflow is very time consuming.
Trapping 'NaN' with gcc in Fortran was already noticeably slower. I see the difficulty here.


Cheers for this discussion, while it is more than scary, it is good to know about such trapdoors.



DrSuperGood:
Quote
As far as I am aware "if" is not a function but rather a programming primitive. As such I do not see how it could be overloaded, but I do admit I am still novice to C++.
I've heard primitive, and thought it only an idiosyncratic expression for built in function. That might be quite interesting to learn what primitive actually means in C.


Quote
Joke is compilers output the same code. Test if a pointer is null and you will get the same code out as simply testing the pointer as if it is a logical value.
That's no joke at all. Being one layer of abstraction closer to the machine means one has to realise the lacking abstraction oneself. This is more tedious and dangerous, hence the popularity of high level languages. In my novice opinion that something does 'work' the same way doesn't free one from doing a proper abstraction, ie abstracting to something that can be expressed in mathematical formalism and following good practices such as type safety.

Ters

Integer wrap-around is such a useful thing that even languages that otherwise add lots of other expensive checks (like array bounds checking) do not add checks for integer overflow. In fact, I'm not sure I have ever actually used a programming language that does, but then there are quite a lot of languages I have never explored this aspect of. Division by zero will however crash, since such a check is typically built into the hardware. It is at least on x86, but x86 is supposedly one of the weirder popular architectures. I don't think you will get any C++ exception for this, because C++ exceptions are a different concept from hardware exceptions (Windows might have some way of treating them similarly, though).

Floating point will however give you overflow and underflow errors, because wrap-around there is not useful for anything, and floating point is more complex anyway (which is why Simutrans doesn't use it). I don't think you can actually reach floating point infinity through addition. To do that, you need to divide by zero. This means that while integers invariably fail for division-by-zero and let overflow pass, for floating point it is the other way around, except that you can configure whether division-by-zero should fail (again, x86, or rather x87).

uint is strictly speaking not a more dangerous data type than any other. Since what you get internally when subtracting 1 from 0 is exactly the same, you will run into problems either way, maybe even exactly the same problem depending on what you do next (such as using the value as an array index). You should check the inputs before doing the operation. How long before depends on how predictable things are and how paranoid you can afford to be.

DrSuperGood

Signed and unsigned addition and subtraction are the same. The same instructions are used and the same result is produced. This is because of two's complement mechanics.

Eg 0 - 1 requires underflow to work.

// unsigned example.
uint8_t zero = 0; // 0x00
uint8_t one = 1; // 0x01
uint8_t out = zero - one; // 0xFF = 255
// underflow occurred!

// signed example.
int8_t zero = 0; // 0x00
int8_t one = 1; // 0x01
int8_t out = zero - one; // 0xFF = -1
//underflow occurred! Or did it? If using a signed instruction this might not want to be reported as underflow
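A minimal sketch of the two examples above, assuming 8-bit bytes and two's complement (the helper names sub_bits and as_signed are made up for illustration). The subtraction is done once on the raw bit pattern, and only the interpretation of the result differs. Note that C++ promotes both operands to int before subtracting, so it is the conversion back to 8 bits that does the truncation.

```cpp
#include <cstdint>

// Hypothetical helper: subtract and keep only the low 8 bits.
// Conversion to an unsigned type is defined to wrap modulo 256.
inline std::uint8_t sub_bits(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(a - b);
}

// Hypothetical helper: read the same 8 bits as a two's complement number.
inline int as_signed(std::uint8_t bits) {
    return bits < 128 ? bits : bits - 256;
}
```

With these, sub_bits(0, 1) yields the pattern 0xFF, which reads as 255 unsigned and as -1 through as_signed.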

Now division and multiplication are an entirely other story. How they are implemented depends on the platform. Some platforms offer different instructions for signed and unsigned. Other platforms, usually micro processors of sorts, require one to emulate a signed multiplication using some tests and unsigned multiply.

Most architectures do set flags when overflow or underflow occur. These flags can be tested immediately afterwards and appropriate corrective or informative actions can be taken. Problem is that testing such flags is probably more expensive than performing the operation which produces them, as such by default you do not want to test them. For signed addition and subtraction they are pretty meaningless as overflow and underflow occur all the time however I do think some special signed instructions might exist which do set them meaningfully (take into account the magnitude of the numbers).

Languages like C/C++ could offer some form of overflow and underflow detection. A set of special addition, subtraction, multiplication, division, etc functions would be needed which meaningfully test the overflow/underflow bits after performing the operation, and run specified code when such condition is detected.
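A minimal sketch of such a special addition function, written in portable C++ without touching the hardware flags (checked_add is a hypothetical name, not a standard function). It tests the operands before adding, so the signed overflow never actually happens. GCC and Clang also provide __builtin_add_overflow, which does the same job and can use the processor's overflow flag.

```cpp
#include <limits>

// Hypothetical helper: stores a + b in out and returns true,
// or returns false and leaves out untouched if the sum does not fit in int.
inline bool checked_add(int a, int b, int& out) {
    if (b > 0 && a > std::numeric_limits<int>::max() - b) {
        return false;  // would wrap past the largest int
    }
    if (b < 0 && a < std::numeric_limits<int>::min() - b) {
        return false;  // would wrap past the smallest int
    }
    out = a + b;
    return true;
}
```

The caller decides what to run when false comes back, which is the "run specified code" part.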

Floating points are handled slightly differently. They have an interrupt vector to catch certain bad condition and take corrective action or adjust results, similar to how divide has for division by 0. Many of the features of floating point units are abstracted away from some languages such as Java.

Ters

Quote from: DrSuperGood on December 22, 2016, 05:00:34 PM
Most architectures do set flags when overflow or underflow occur. These flags can be tested immediately afterwards and appropriate corrective or informative actions can be taken. Problem is that testing such flags is probably more expensive than performing the operation which produces them, as such by default you do not want to test them. For signed addition and subtraction they are pretty meaningless as overflow and underflow occur all the time however I do think some special signed instructions might exist which do set them meaningfully (take into account the magnitude of the numbers).

x86 has two overflow flags (the term underflow is not used for integer operations), one of which indicates signed wrap-around (for signed char, between -128 and 127, either way) and one which indicates unsigned wrap-around (for unsigned char, between 0 and 255, also either way). The comparison operators (such as > and <) are implemented as a subtraction operation (which discards the result) followed by an instruction checking one of these flags (plus some other flags depending on which exact operator).

prissi

While most architectures have flags, it is very hard to see whether a signed or unsigned wrap around is needed, especially when number constants like '3' are involved which could be either signed or unsigned.

On type sizes, my favorite was a TI compiler for a DSP which had size of char = float = int = 40 bit, and sizeof(char[4])==sizeof(char) ...

ADA had unitary numbers and overflow checks. (That was the reason why some satellite was lost some years ago, the communication failed between the ADA rocket controller code and the newer C code, because never had negative number offsets been used in their tests ...)

sdog

Oh dear, int8_t a = 127; a + 1; a +1; does a wrap around to negative as well.

With testing and trapping so dangerous, the only safe way seems to be to avoid any integer type for general calculations, and resort to libraries providing, for example, arbitrary precision integers, or other forms of safe integers. On the other hand, this brings the huge drawback of introducing dependencies, beyond standard library. Which in turn limits the modularity of functions written that way.

That is exacerbated by machine dependent definitions of int:

#include <stdint.h>
[cling]$ sizeof(int)
(unsigned long) 4
[cling]$ sizeof(int64_t)
(unsigned long) 8
[cling]$ sizeof(int32_t)
(unsigned long) 4

Cling is clang, for gcc int is also a 4 byte type.
But it doesn't really matter what actual compilers do, the C standard seems to define (I've not checked it) that int has at least the size of short int.

[cling]$ sizeof(short int)
(unsigned long) 2

That way one has to assume that int and uint might be only 16 bit integers.

[cling]$ #include <cmath>
[cling]$ std::pow(2,(2*8-1))-1
(double) 32767.0
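A minimal sketch of how to cope with that, assuming only what the standard guarantees (seconds_per_day is a made-up example function). The static_assert lines state the minimum ranges, and the fixed-width names from <cstdint> let a calculation say which width it needs.

```cpp
#include <climits>
#include <cstdint>

// The standard only promises these minimum ranges.
static_assert(INT_MAX >= 32767, "int has at least 16 bits");
static_assert(LONG_MAX >= 2147483647L, "long has at least 32 bits");

// std::int_least32_t is always available and at least 32 bits wide;
// std::int32_t is exactly 32 bits wide on platforms that have such a type.
inline std::int_least32_t seconds_per_day() {
    // 86400 would not fit into a 16 bit int, so widen before multiplying.
    return static_cast<std::int_least32_t>(24) * 60 * 60;
}
```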


concluding musing: Overall, quite interesting. C++ learning is indeed quite different from other languages, and seems pointless without also learning more about the machine. I wonder how experienced one has to be to actually write productive code, ie, without risking to cause a bug.

By the way, why #include <stdint.h>? The std seems to stipulate it would be a part of standard library, but the way it is included doesn't. A brief search based on the assumption that <stdint.h> is obsolete didn't get me any further either.



Quote from: DrSuperGood
For loop is nothing more than a neater way of showing a while loop.

[...]
// while loop
int i = 0;
while (i < x) {
    something(i);
    i+= 1;
}


I suspected as much, but thought that both, for and while, might be syntactic sugar for something more fundamental.

I see two problems with this loop construction. The first one is that the counter int i is not restricted in scope to the actual loop. That can be dangerous as the next loop might start at an advanced point, unless re-initialised. The other is that the iteration is an explicit statement that could easily be overlooked or forgotten. The question, is the while loop deprecated for such use, and ought only be used when looping through a construct that does not need a counter variable?
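A minimal sketch of how the for loop addresses both concerns (sum_two_passes is a made-up example). A counter declared in the for header is only visible inside that loop, and the increment sits in the header where it is hard to forget. The range-based for from C++11 needs no counter at all.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: adds up all elements twice, once per loop style.
inline int sum_two_passes(const std::vector<int>& values) {
    int total = 0;

    // i exists only inside this loop.
    for (std::size_t i = 0; i < values.size(); ++i) {
        total += values[i];
    }
    // Using i here would be a compile error, so no stale counter can leak.

    // No counter at all: visit each element in turn.
    for (int v : values) {
        total += v;
    }
    return total;
}
```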

However, the first loop example in the textbook I'm using does use a while loop exactly for that, with a global variable, and even a very unsafe termination condition, (i != n) rather than (i < n). (On second thought, with integer wrap-around, that doesn't seem to be safe any more.)

ps.: nice, there is a much more sensible looking iteration i += 1; in this.



@prissi
It really sounded like inviting mistakes to happen. At least at compile time, the compiler could check if uint and int are in the same arithmetic calculations and exit with an error. I wonder why such is not part of the standard. I mean, when it is needed, one can always cast from int to uint, then be aware of the danger.
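A minimal sketch of the danger, with made-up function names. When an int and an unsigned int meet in a comparison, the int is converted to unsigned first; naive_less spells out that conversion explicitly. GCC and Clang do not reject such code, but they can warn about it with -Wsign-compare.

```cpp
// What the language does for `a < b` when a is int and b is unsigned:
// a is converted to unsigned, so -1 turns into the largest unsigned value.
inline bool naive_less(int a, unsigned b) {
    return static_cast<unsigned>(a) < b;
}

// Hypothetical safe variant: handle negative values before converting.
inline bool safe_less(int a, unsigned b) {
    return a < 0 || static_cast<unsigned>(a) < b;
}
```

Here naive_less(-1, 1u) is false while safe_less(-1, 1u) is true.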

Ters

Quote from: sdog on December 22, 2016, 09:07:59 PM
By the way, why #include <stdint.h>? The std seems to stipulate it would be a part of standard library, but the way it is included doesn't. A brief search based on the assumption that <stdint.h> is obsolete didn't get me any further either.

#include <stdint.h> is perfectly normal way of including stuff in C. C++ on the other hand likes to take the standard headers inherited from C, wrap them in another header without the extension, but with a C prefix, and which also puts the stuff inside in the std namespace. However, since C++ is (almost) a superset of C, (almost) everything that is valid C must also be valid C++, including including stuff just like in C. stdint.h is not obsolete as far as I can tell, but C++ prefers that you refer to it as cstdint.

Quote from: sdog on December 22, 2016, 09:07:59 PM
With testing and trapping so dangerous, the only safe way seems to be to avoid any integer type for general calculations, and resort to libraries providing, for example, arbitrary precision integers, or other forms of safe integers. On the other hand, this brings the huge drawback of introducing dependencies, beyond standard library. Which in turn limits the modularity of functions written that way.

I think the general idea when using the C language is that if you unintentionally overflow a datatype, you were screwed long before then. Trying to handle it there and then at runtime is too late anyway.

With C++, you can at least create classes that thanks to operator overloading, behave just like built-in data types. And this idea extends itself to making the data type include the unit (seconds, meters, kilograms, Newtons, etc.). This can be done by either having a data type for each unit, or by having an extra field next to the value which somehow describe the unit. Either way, you can avoid nasty bugs where units are mixed, such as adding a m/s value with a m/sw value. It can also avoid mixing SI and Imperial units, but I think that problem is more likely to occur in the communication between components that are made by different teams using different tools, including potentially programming language and almost certainly supporting libraries, meaning this kind of metadata gets lost.
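A minimal sketch of the one-data-type-per-unit variant, with made-up type names. Only the operators that make physical sense are defined, so mixing units becomes a compile error instead of a runtime bug.

```cpp
// Hypothetical unit types; each wraps a plain double.
struct Meters { double value; };
struct Seconds { double value; };
struct MetersPerSecond { double value; };

// Adding two lengths gives a length.
inline Meters operator+(Meters a, Meters b) {
    return Meters{a.value + b.value};
}

// Dividing a length by a time gives a speed.
inline MetersPerSecond operator/(Meters distance, Seconds time) {
    return MetersPerSecond{distance.value / time.value};
}

// Meters{1.0} + Seconds{1.0} does not compile: no operator+ exists for that pair.
```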

But the more important thing to have in mind is using the right tool for the job. If arbitrary precision arithmetic and an extremely low fault tolerance is critical, C is not the right tool. C++ can be somewhat better, mostly due to operator overloading. There is a reason we have kept on inventing programming languages after C: It is not perfect for everything. It was after all made for writing UNIX. When writing an operating system, integer overflow is rather irrelevant. Apart from 0, the value you don't want to pass is most likely not related to the size of the data type. Nor do they care for arbitrary precision.

However, being the language of the mother of all major modern operating systems, it is also a language, perhaps the only language beyond vendor specific assembly code, that is almost guaranteed to exist on every platform, which probably has contributed a lot to its popularity.

DrSuperGood

Quote
On type sizes, my favorite was a TI compiler for a DSP which had size of char = float = int = 40 bit, and sizeof(char[4])==sizeof(char) ...
Either the minimum word size was 40bit and packing was used (so all types were at least 40 bit in size), or sizeof was bugged (which I have had in a compiler for an 8 bit pic processor)...

Apparently systems do exist where a byte is not defined as an octet. Honestly I have no idea how one would even go about writing portable code for such systems... As such I just ignore their existence when programming as I doubt it's worth the time or effort caring.

Ters

Quote from: DrSuperGood on December 23, 2016, 01:37:47 AM
Urg good point that I completely overlooked... I guess there are separate signed addition/subtraction instructions just one can emulate them efficiently with unsigned addition/subtraction as long as one is careful (know for sure what values are positive or negative) or occasionally tests the MSB (set for negative numbers).

Huh?

Combuijs

It is of course dependent on personal or company programming style, but I use a for loop when I expect to run the loop for all the elements in its domain and a while loop when I expect this is not the case.

For example, say k is an array of n integers then I would use a for loop for getting the sum of all integers:


int sum = 0;
for (int i = 0; i < n; i++)
{
   sum = sum + k[i];
}


On the other hand, looking for a value m in that array I would do


bool found = FALSE;
int i = 0;
while (i < n && !found)
{
  if (k[i] == m)
  {
      found = TRUE;
   }
   else
   {
      i++;
   }
}


Note that after the while loop you still have access to the i variable to use for example the index.

But as said, that is all personal style, you can do the first one in a while loop and the second one in a for loop.
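A minimal sketch of the search written as a for loop instead (find_index is a made-up name). The counter is declared before the loop so that it stays accessible afterwards, just like in the while version.

```cpp
// Hypothetical helper: returns the index of the first element equal to m,
// or n if no element matches.
inline int find_index(const int* k, int n, int m) {
    int i = 0;
    for (; i < n; ++i) {
        if (k[i] == m) {
            break;  // leave early, i keeps the index of the match
        }
    }
    return i;
}
```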

As for 0, 1, TRUE and FALSE, I can never remember which one is which, so I am always in doubt when I see


if (pointer)
{
}


I really would prefer


if (pointer == null)
{
}


or


if (pointer != null)
{
}


for readability.

In C# I am very happy I can do things like


if (found == false)
{
}


instead of


if (!found)
{
}


as you tend to overlook the ! character when the expression is a bit larger.

And I know that


return a > 0 ? 1 : 0;


is much more concise than


if  (a > 0)
{
   return 1;
}
return 0;


but I find the latter more readable as I always forget which one comes first in the ? operation. Yes, I know, first true then false, but 0 (false) < 1 (true), which always gets me confused.
Bob Marley: No woman, no cry

Programmer: No user, no bugs



sdog

@prissi, DSG
Quote
Quote
On type sizes, my favorite was a TI compiler for a DSP which had size of char = float = int = 40 bit, and sizeof(char[4])==sizeof(char) ...
Either the minimum word size was 40bit and packing was used (so all types were at least 40 bit in size), or sizeof was bugged (which I have had in a compiler for an 8 bit pic processor)...
There used to be an old 40 bit improved precision float, with 32 bit mantissa, 7 bit exponent. I remember seeing references or traces of it in some older numerical library routines. One cannot do much with single precision. Machine epsilon = 2**(1-p), where p are the bits of the mantissa (counting the implicit leading bit for IEEE formats). For single p=24 and thus ε_32 = 2**-23 = 1e-7,
ε_40 = 2**-31 = 5e-10, whereas double has a 53 bit mantissa leaving ε_64 = 2**-52 = 2e-16.

Combining regular 32 bit and 8 bit signed integer registers seems not that absurd, most operations can be done on exponent and mantissa independently. However, did they even have 32 bit wide registers in the sixties or seventies when this was done? I suppose ALUs could be linked arbitrarily up to the desired width.


Quote from: Combuijs on December 23, 2016, 10:21:19 AM
It is of course dependent on personal or company programming style, ...

[...]


return a > 0 ? 1 : 0;



Thank you for the coding style examples. I find them much more readable and clear indeed.

For me the ternary operator comes close to `tar`. I've used it many dozens of times, and always had to look up the syntax.

prissi

The int size is the size of the natural int of a CPU. Hence on the C64 C compiler (at least one of them) sizeof(int)=1

And sorry, the TI processor DSP had indeed sizeof(long)=sizeof(int)=sizeof(double)=4 and sizeof(float)=sizeof(char)=2 but with the byte size of 10 bit. As long as you use the predefined portab.h constants MAXINT you are safe with any architecture (ok, some magic with shifts would not work as expected). On the other hand that DSP had 40 kB (in units of 10 bit for a Byte) RAM and was mainly geared towards 32/8 floating points. Since it was a DSP, it came obviously without any stdlib.h support apart from printf to the debug serial port ...

Also, C compilers had for a very long time their own interpretations of the standard. Until MSVC6 the following code was illegal

for(int i=1; i<10;i++) { do something }
for(int i=1; i<100;i++) { do another thing }

because it expanded it internally to that while construct

int i=1;
while( i<10 ) { something; i++; }
int i=1;
while( i<100 ) { another thing; i++; }

but twice declarations of i are not allowed in the same block ...

Ters

Quote from: prissi on December 23, 2016, 10:40:17 PM
The int size is the size of the natural int of a CPU.

Except for x86-64. When x86 went from 16-bit to 32-bit, int grew from 16-bit to 32-bit, although the definition of word stayed at 16-bit. When 64-bit x86 came along, int stayed 32-bit. Whether any data types changed size at all, depended on the OS if I remember correctly.

sdog

Some good reading I stumbled upon:

On the question: When to pass by value and when to pass by reference?

value semantics vs reference semantics:
https://akrzemi1.wordpress.com/2012/02/03/value-semantics/

when to pass values, for speed
http://web.archive.org/web/20140116234633/http://cpp-next.com/archive/2009/08/want-speed-pass-by-value/

Conclusion from a brief overview of advice on typical sites: even experienced developers seem to be uncertain about the best practices. With a tendency of, when in doubt pass references.

Related to this question is why often it is not left to the compiler to decide if a data structure has to be copied or referenced, which the compilers are in principle able to do. It seems it is implicated that these were not capable of doing so, but then, wouldn't it be better practice to structure ones functions in a way that enables the compiler to do its job?

Another related question: why in code I've read are there so many passes by mutable rather than constant references, even when it seems not to be necessary. pointless without examples

The text book and many tutorials I'm reading are from the distant past, ie before C++11, and also seem to take parallel computing not very seriously. It seems that a rule of thumb, when in doubt pass by value, as if it goes wrong it only costs performance and memory but is safer, might be more advisable these days.

Q: What are the white-space rules for reference ampersands, let a be any integer, int& b = a; or int &c = a;? No reading here, this didn't google well.
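A minimal sketch covering both the passing styles and the white-space question, with made-up function names. White space around the ampersand is not significant, so int& b and int &c declare the same kind of reference; which one to write is purely a style choice.

```cpp
#include <cstddef>
#include <string>

// Pass by value: the function works on its own copy.
inline std::size_t length_by_value(std::string s) { return s.size(); }

// Pass by const reference: no copy, and the function cannot modify the argument.
inline std::size_t length_by_cref(const std::string& s) { return s.size(); }

// Pass by mutable reference: the caller's object is changed.
inline void append_mark(std::string& s) { s += '!'; }

// Both spellings bind a reference to the very same object a.
inline bool same_object() {
    int a = 1;
    int& b = a;
    int &c = a;
    return &b == &a && &c == &a;
}
```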



prissi

Quote
(call by value/reference) Related to this question is why often it is not left to the compiler to decide if a data structure has to be copied or referenced, which the compilers are in principle able to do. It seems it is implicated that these were not capable of doing so, but then, wouldn't it be better practice to structure ones functions in a way that enables the compiler to do its job?

I am not sure a compiler is capable of doing this. Assuming the structure is then passed to a library. Should the change be reflected then on caller? I would have difficulties to guess an answer then, but I am only human ...

Ters

Quote from: sdog on December 30, 2016, 07:10:55 PM
Related to this question is why often it is not left to the compiler to decide if a data structure has to be copied or referenced, which the compilers are in principle able to do. It seems it is implicated that these were not capable of doing so, but then, wouldn't it be better practice to structure ones functions in a way that enables the compiler to do its job?

In C, there is no such concept as pass-by-reference. Pointers are a data type of their own. Since C is also meant to be a thin layer above machine code, the compiler isn't really supposed to do things behind the programmers back. It would also be rather disastrous if what the programmer thought was a local copy was the same instance as in the caller, and ended up corrupting it.

C++ has true references, but as far as I know, it doesn't automatically turn things into pass-by-reference, it turns pass-by-const-reference into pass-by-value when more suitable. Generally, this is when the parameter is a const reference to a primitive. Why would one create parameters that are const references to primitives when that is obviously inefficient? Because it might be a templated function that doesn't know what type the parameter is when it was written, so the parameter is written as a const reference to avoid expensive copying if the type turns out to be a big struct. The compiler will optimize this to a pass-by-value when that is cheaper.
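A minimal sketch of such a templated function (larger is a made-up name). The parameters are const references because T could be a big struct. For a primitive such as int an optimizing compiler may well pass the value directly once the call is inlined, though that is up to the compiler and not guaranteed.

```cpp
#include <string>

// Hypothetical helper: returns whichever argument compares greater.
// const T& avoids copying when T turns out to be expensive to copy.
template <typename T>
const T& larger(const T& a, const T& b) {
    return (a < b) ? b : a;
}
```

The same template serves larger(2, 5) with int and a call with two std::string arguments without copying the strings.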

When the optimizer is enabled, stuff like inlining can eliminate pass-by-value completely, since nothing is passed at all. It might even do crazy stuff like turn a pass-by-value into pass-by-const-reference for all I know, but it can only do so if you do not modify the value.

And any such tinkering requires that the caller and callee are in the same translation unit. Otherwise, you might end up with the caller passing by value and the callee thinking it was passed by reference, or vice versa. Remember, it might not be the same compiler (at least for C). They might have been compiled at different times by different people.

sdog

Quote
When the optimizer is enabled, stuff like inlining can eliminate pass-by-value completely, since nothing is passed at all. It might even do crazy stuff like turn a pass-by-value into pass-by-const-reference for all I know, but it can only do so if you do not modify the value

That is what I referred to. Inlining is done already at -O1 for GCC. That in conjunction with various copy propagation methods ought to do the job.

The other optimisation is copy elision, which is on by default in GCC.

Isn't the const& just an instruction to the compiler that the value isn't changed in this scope, ie it doesn't matter if it's a constant reference or any other pointer if the compiler can establish that it is not meddled with? Which it was certain of in the first place to do copy elision. Why then change specifically into a pass-by-const-reference, as you mentioned?

Ters

Quote from: sdog on December 31, 2016, 01:21:44 AM
Isn't the const& just an instruction to the compiler that the value isn't changed in this scope, ie it doesn't matter if it's a constant reference or any other pointer if the compiler can establish that it is not meddled with? Which it was certain of in the first place to do copy elision. Why then change specifically into a pass-by-const-reference, as you mentioned?

I think it is more of an instruction to the compiler to forbid you from changing the value. (const also predates const&, which affects how the latter works.) Of course, that means that the compiler can make some assumptions. However, the programmer can pull the rug out from underneath the compiler by casting away the const-ness and changing the value nonetheless. Even without casting away the const-ness, the value can change, so I'm not sure how many assumptions the compiler can make.

int a = 1;
const int &b = a;
a = 2;
// b has now changed value

However, values changing outside the "regular" program flow is what the volatile keyword is for. There are also some aliasing rules, which I know little about, having only seen the compiler complain about them being violated a few times, and I think that has been when compiling code I have nothing to do with.

I do not know of any cases where pass-by-value is turned into pass-by-const-reference, I just can't rule it out. I do however know of one case that could be considered pass-by-value being turned into a pass-by-reference (note no const), and that is for the return value. If the return value is too big to fit in a register, the caller will allocate memory for it and pass a reference to the callee. So that

big_struct func() {
  big_struct retval;
  ...
  return retval;
}

is implemented more or less identically as

void func(big_struct &retval) {
  ...
}

in machine code. This happens even when not optimizing. (Optimizing may modify it further. Maybe even put it back into a register if the optimizer figures out that big_struct can be turned into a vector value, but that seems unlikely.)
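A compilable sketch of the two forms, with a made-up BigStruct that is assumed to be too large for a register. Both produce the same object for the caller; whether the machine code really is identical depends on the platform's calling convention and the compiler.

```cpp
#include <array>

// Hypothetical type, assumed too big to be returned in a register.
struct BigStruct {
    std::array<int, 64> data;
};

// Source-level return by value.
inline BigStruct make_by_value() {
    BigStruct result{};
    result.data[0] = 42;
    return result;
}

// Roughly what the caller and callee agree on underneath:
// the caller provides the storage and passes a reference to it.
inline void make_by_out_param(BigStruct& result) {
    result = BigStruct{};
    result.data[0] = 42;
}
```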

DrSuperGood

The const keyword tells the compiler that the value cannot change. This has technical implications with regard to the compiled code.

Quote
However, the programmer can pull the rug out from underneath the compiler by casting away the const-ness and changing the value nonetheless. Even without casting away the const-ness, the value can change, so I'm not sure how many assumptions the compiler can make.
No the programmer cannot disregard const, at least sanely. Doing so can cause undefined and compiler specific behaviour to occur. In a lot of cases it will even cause a crash.

The reason const cast exists is as a hacky work around for improperly designed APIs when you are absolutely certain that a const value is just a constant view of a dynamic value.

In some compilers non dynamic const values go to ROM storage which does not even share the same address range as the dynamic RAM storage. No instructions exist to modify such constant data (as it is ROM...) and using a const cast will either throw a compile time warning or might result in some arbitrary address inside RAM being corrupted. In virtual memory mapped computer systems, such as most modern operating systems, compile time const values might be placed in read only pages so attempting to modify a value will throw a security exception and crash the application.

When something is marked as const it is not a suggestion that it should not be modified, it is more a suggestion it cannot be modified. Disregarding this suggestion is not recommended.
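A minimal sketch of the one case where const_cast is well defined, with a made-up function name: the const reference is only a constant view of an object that was not declared const itself. Doing the same to an object that was declared const is undefined behaviour, so that line is left commented out.

```cpp
// Hypothetical example: modifies a non-const object through a const view.
inline int bump_through_const_view() {
    int value = 1;              // the object itself is not const
    const int& view = value;    // constant view of a mutable value
    const_cast<int&>(view) = 2; // allowed, because value is not const
    return value;
}

// const int fixed = 1;
// const_cast<int&>(fixed) = 2;  // undefined behaviour: fixed really is const
```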