Re: While coding C++ how do you usually test your expressions and functions?

Started by sdog, November 25, 2016, 06:24:59 PM


sdog

I think some clarification with regard to the const matter would be good. It seems that intended and perceived meaning do not converge yet.

Initially we were talking specifically about constant references. For example, let N be the set of all valid values for int, with x, y in N, and int a = x; const int &b = a;. I understand this as a reference where the value of a is mutable but can be accessed read-only through b; i.e. a = y; is possible while b = y; is not, for all y in N\{x}.
Now, I understand the const attribute in the context of a & reference informs the compiler that the latter operation, b = y;, is forbidden. However, in the binary there is no difference between a pointer to the address where the value of a is stored, a const reference, a non-const reference, or a itself. Is this so far correct?
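
A minimal sketch of what I mean (note that a reference must be bound at its declaration, hence the = a above):

#include <iostream>

int main()
{
    int a = 5;
    const int &b = a;   // read-only view of a

    a = 7;              // fine: a itself stays mutable
    // b = 9;           // error: assignment of read-only reference

    std::cout << b << "\n";   // prints 7: b observes the change to a
    return 0;
}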


I should like to discuss this quote in that context; I number sentences for later reference:
Quote
(1) I think it is more of an instruction to the compiler to forbid you from changing the value. (2)(const also predates const&, which affects how the latter works.) (3) However, the programmer can pull the rug out from underneath the compiler by casting away the const-ness and changing the value nonetheless. (4) Even without casting away the const-ness, the value can change, so I'm not sure how many assumptions the compiler can make.
[...]
(5) I do not know of any cases where pass-by-value is turned into pass-by-const-reference, I just can't rule it out.
(1) appears to be consistent with what I think. However, (5) is slightly contradictory: why would the compiler, which creates the machine code, bother with instructions to itself? That gets me to the second point of (5), when the compiler can establish that there are…

Ters

For C/C++, I tend to just write the code and test if the entire program works as it should. Sometimes I have written smaller proof-of-concept programs. But I have never written more than perhaps ten lines of C/C++ code professionally. I have written some C# code professionally, but that was just test clients and proof-of-concepts clients for our web services.

When doing Java professionally, we use xUnit-style testing. I have used cppunit for some C++ testing, which I found somewhat more cumbersome than doing the same in Java, but certainly doable, at least if the code isn't too tightly coupled. These kinds of tests are, however, not just there to check that the code you write works, but also that it keeps working even as the program gets rewritten year after year, which is known as regression testing. The tests are written as part of the code base, and can be set up to run every time the application is built. Some insist on writing the tests first, then writing the implementation until the tests pass. I do that sometimes, but not always, probably not even most of the time.
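
A minimal sketch of the xUnit idea with plain asserts (real frameworks like cppunit add fixtures, automatic registration and reporting on top of this):

#include <cassert>

// function under test
static int add(int a, int b) { return a + b; }

// a tiny "unit test": rerun on every build, it fails loudly on regression
static void test_add()
{
    assert(add(2, 2) == 4);
    assert(add(-1, 1) == 0);
}

int main()
{
    test_add();
    return 0;   // reaching this point means all tests passed
}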

I sometimes throw together small proof-of-concept programs in Java as well. While they must be compiled to run, it is much easier than with C, C++ and C#, since Java programs are not linked into a single image when built.

But even in the code I write professionally, there are parts that only get tested by actually using the program.

DrSuperGood

I generally do not test the expressions themselves, as I have enough experience that most expressions I write work as intended. I find testing the end behaviour far better, especially since a lot of what I have done so far is maintenance work, so the effect of changes should be immediately visible.

Ideally one wants to test each module to some extent. However, the way Simutrans is written is not very modular, with a lot of functions that are hard to test or are so tightly coupled that one cannot really test them.

prissi

When Simutrans was way smaller there were some tests, of which only the default map still exists. But I think they were already useless before I took over.

And I wonder how one would test something that acts on an object as complex as the world_t in Simutrans. The chance of the test routine being buggy seems as high as the chance that the code is buggy. Same for a scientific simulation: you can verify that for a single wavefunction the eigenvalues are correct; but whether the coupling between them assumes the correct physics and is implemented correctly can only be seen by running the simulation.

Moreover, C (like Fortran) predates procedure testing by decades (I think). Such testing only became common practice in the last twenty years.

Anyway, as an interpreter there is Ch: https://www.softintegration.com/download/ which is based on this http://www.drdobbs.com/cpp/building-your-own-c-interpreter/184408184 from 1989 ...

DrSuperGood

Quote
And I wonder how one would test something that acts on an object as complex as the world_t in Simutrans.
One cannot, as it's not very modular. Ideally one would want to test individual parts.

For example, a factory. One should be able to simulate a factory working without the need for any visuals or a world: simulate the factory receiving goods, simulate the factory ticking, and simulate output being pulled from it, etc.
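
A minimal sketch of what such a headless test could look like; this factory_t is a toy stand-in written just for the sketch, not the real Simutrans class:

#include <cassert>

// Toy stand-in for a factory module; the real Simutrans factory has a far
// richer (and far less decoupled) interface.
class factory_t {
    int input_ = 0, output_ = 0;
public:
    void deliver(int amount) { input_ += amount; }    // goods arrive
    void step()                                       // one tick of production
    {
        if (input_ > 0) { --input_; ++output_; }
    }
    int pull_output(int amount)                       // goods leave
    {
        const int taken = amount < output_ ? amount : output_;
        output_ -= taken;
        return taken;
    }
};

int main()
{
    factory_t f;
    f.deliver(5);                      // simulate goods being received
    for (int i = 0; i < 3; ++i) {
        f.step();                      // simulate three ticks
    }
    assert(f.pull_output(10) == 3);    // three ticks produced three units
    return 0;
}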

Tests for world_t would be placing objects in a well defined way and performing actions on them to check if they mutated properly, I guess.

sdog

Ters:
Quote
For C/C++, I tend to just write the code and test if the entire program works as it should. Sometimes I have written smaller proof-of-concept programs. But I have never written more than perhaps ten lines of C/C++ code professionally.

DrSuperGood:
QuoteI generally do not test the expressions itself as I have had enough experience that most expressions I write work as intended.
Also with the complicated syntax, say anonymous functions in C++11, or things you are not actively used to? (Something that, to me as an outsider, seems abundant for every given user of C++.) I suppose it is rarer, and you might just compile a quick test? It indicates though that my assumption was mistaken: most are so sure of the language that they check expressions and functions without having to resort to tests.

Quote
Same for a scientific simulation: you can verify that for a single wavefunction the eigenvalues are correct; but if the coupling between them is assumed "correct physics" and implemented correctly, that can be old seen by running the simulation.
Thanks for that example. That was indeed 60% of my work: looking at output data and reasoning whether it is correct or not. This has been done for nearly the same code for nearly 20 years by about 5 people on average. However, it helps a lot when one can be certain that single subroutines or smaller expressions are correct. Since many things go wrong when putting the stuff together (passing arrays to functions in f77 -- pain!), it helps at least a bit.

Prissi:
Ch looks quite good, cheers for the link.

QuoteMoreover, C (like Fortran) predates procedure testing by decades (I think). Such testing only became common practice in the last twenty years.
The Linux kernel alone would require 30t of punch cards...
Stuff was somewhat more terse back then.


I forgot something very useful before, checking the type of the output of expressions. Here's a simple example.



****************** CLING ******************
* Type C++ code and press enter to run it *
*             Type .q to exit             *
*******************************************
[cling]$ #include <cmaths>
input_line_3:1:10: fatal error: 'cmaths'
      file not found
#include <cmaths>
         ^
[cling]$ #include <cmath>
[cling]$ pow(5, 2)
(double) 25.0000
[cling]$ int a = 5
(int) 5
[cling]$ pow(a, 2)
(double) 25.0000
[cling]$ double b = 5
(double) 5.00000
[cling]$ pow(b, 2)
(double) 25.0000
[cling]$ 5.0
(double) 5.00000
[cling]$


Which is also of much less importance with C, since function declarations already define the return type.

In Haskell I often use ghci to get the type of a function, to paste it into my programme instead of writing it right away. Example:

Prelude> let f = \x -> x
Prelude> :t f
f :: t -> t
Prelude> let g = \(x,y) -> x**y
Prelude> :t g
g :: Floating a => (a, a) -> a


ps.: Oh dear. "C does not have a built-in operator for exponentiation, because it is not a primitive operation for most CPUs. Thus, it's implemented as a library function." What have I gotten into.

edit: another test

[cling]$ #import <cmath>
[cling]$ long double a = 5
(long double) 5L

[cling]$ pow (a, 2)
(double) 25.0000

[cling]$ long double b = 2
(long double) 2L
[cling]$ pow(a, b)
(double) 25.0000

[cling]$ a*b
(long double) 10L


Ters

Quote from: prissi on November 25, 2016, 09:44:22 PM
And I wonder how one would test something that acts on an object as complex as the world_t in Simutrans.

You simply don't have objects that complex. On the other hand, I don't think world_t is that complex. The biggest problem is that almost everything else is now hard-coupled to a single world_t. For testing, you'd want to pass the object being tested a mocked world_t that behaves just like it should for whatever scenario is being tested, and which possibly also tracks what interactions it receives, so that you can test that the object being tested calls into world_t in exactly the expected way. It is harder to mock if interactions between objects are not purely through interfaces, or at least virtual functions.
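
A minimal sketch of that mocking idea, assuming (hypothetically) that world_t were hidden behind an interface; none of these names are the real Simutrans API:

#include <cassert>

// Hypothetical interface extracted from world_t for the sake of the sketch.
struct world_interface {
    virtual long get_time() const = 0;
    virtual ~world_interface() {}
};

// The mock: behaves exactly as the scenario demands and records interactions.
struct mock_world : world_interface {
    mutable int calls = 0;
    long get_time() const override { ++calls; return 42; }
};

// The object under test sees only the interface, never the real world.
struct thing {
    bool is_late(const world_interface &w) const { return w.get_time() > 10; }
};

int main()
{
    mock_world w;
    thing t;
    assert(t.is_late(w));    // behaviour is as expected...
    assert(w.calls == 1);    // ...and the world was queried exactly once
    return 0;
}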

Quote from: sdog on November 25, 2016, 10:28:39 PM
Also with the complicated syntax, say anonymous functions in C++11, or things you are not actively used to?

Even when experimenting with the new C++ features, I wrote a lot more than just simple expressions to get the hang of them. And most of the troubles with them were getting them to compile. Once I got them to compile, they behaved like they should. The most troublesome errors C and C++ can give you during runtime, is when you've entered the mysterious realm of undefined behavior. These things have a nasty habit of working fine at first, only to start blowing up once in a blue moon some time later.
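
A classic inhabitant of that realm, as a minimal sketch: reading an uninitialized variable. It may appear to work for years, then change behaviour with a new compiler or optimization level.

#include <cstdio>

int main()
{
    int x;                        // never assigned
    if (x > 0) {                  // undefined behaviour: any outcome is "correct"
        std::printf("positive?\n");
    }
    return 0;
}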

prissi

About c++ return types: Those can easily depend on calling types, if memory serves me right.

char * pow( char *, double ); and double pow( double, double ); both are legal, but pow ("", 0 ) would return a char * and pow (1,2) a double. Compilers at least warn when you mix pow( int, int ) and pow( double, double ).

sdog

Quote from: prissi on November 25, 2016, 11:19:28 PM
About c++ return types: Those can easily depend on calling types, if memory serves me right.

char * pow( char *, double ); and double pow( double, double ); both are legal, but pow ("", 0 ) would return a char * and pow (1,2) a double. Compilers at least warn when you mix pow( int, int ) and pow( double, double ).

I'm afraid I couldn't follow you with that.
Let: double d; int i; and char c;
The expression c * pow (c, d); does return a double [which is somehow strange (hackish?); this really ought to fail to compile].

Why is it unusual that pow (d, d); is legal? I should expect that a^b is correct for all a, b in R.

Why compiler warnings for a^b for all a, b in N? Aren't those exactly the correct uses? Perhaps with a^b for all a in R and b in N being even more common?



[cling]$ double d; int i; char c
[cling]$ #include <cmath>

[cling]$ pow(c, d)
(double) 1.00000

[cling]$ pow(d, d)
(double) 1.00000

[cling]$ pow(i, i)
(double) 1.00000

[cling]$ pow("", 0)
input_line_15:2:2: error: no matching
      function for call to 'pow'


by the way, a funny one:

[cling]$ pow(0,-1)
(double) inf

Lovely: double precision infinity!


[cling]$ pow(0,0)
(double) 1.00000

Ouch.

Double ouch, this time for being thick, asking maths stackexchange:
http://math.stackexchange.com/questions/11150/zero-to-the-zero-power-is-00-1

jamespetts

A good way of testing small fragments of code, I find, is to use the visual debugger in Visual Studio, sometimes with special test variables the value of which can be observed at various times. The variables can be removed when the code has been tested.

Ters

Quote from: sdog on November 25, 2016, 11:51:16 PM
I'm afraid I couldn't follow you with that.
Let: double d; int i; and char c;
The expression c * pow (c, d); does return a double [which is somehow strange (hackish?); this really ought to fail to compile].

That makes perfect sense by the low-level rules C follows. The only obfuscating part is the name of the data type char. It is not inherently a text data type, just another integer data type with a range of typically [-128, 127]. And integers can automatically be converted to floating point types. So your example makes just as much sense as 4 * pow(4, 3.1415);. In real life applications, it would perhaps be more usual for the exponent to be the integer and the base a floating point number. And since exponents usually are smaller than 100, being able to store them in an 8-bit data type might have been important back when available memory was counted in kilobytes.
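
For illustration, a minimal sketch of that conversion chain:

#include <cmath>
#include <cstdio>

int main()
{
    char c = 4;                        // just a small integer
    double r = std::pow(c, 3.1415);    // c is converted to double, like any integer
    std::printf("%f\n", r);
    return 0;
}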

Quote from: sdog on November 25, 2016, 11:51:16 PM
Why is it unusual that pow (d, d); is legal?

Where did you get that idea from? There is nothing particular about that in C or in math in general. I can't think of any real life application for such a formula, but math doesn't seem to care for such things.

Quote from: sdog on November 25, 2016, 11:51:16 PM
Lovely: double precision infinity!

Double precision positive infinity! At least it's comparable to numbers and to the other infinity, and equal to itself. NaN is one of those funny values that aren't equal to anything, not even itself, and it isn't comparable either. By not comparable, I mean that the comparison always yields false, not that you get an error. (C also has positive and negative zero due to the way it is implemented. All of this is according to IEEE 754, which is not unique to C. In fact, I don't know of any floating point implementation that doesn't, for the most part, follow that standard.)

prissi

About return types:

double pow( double a, double b )
{
  return exp( b * log(a) );
}

char *pow( char *s, double b )
{
  static char ss[1024];
  sprintf( ss, "%s^%f", s, b );
  return ss;
}

in which the pow(0,0) returns a double and pow( "s", 8 ) returns a string.

sdog

Thanks, you gave me good pointers!

It wasn't clear to me what you wrote before; cheers for the clarification, Prissi. Now the only thing that is not clear to me is what it means; that will take a bit until I've caught up.

Thinking about this: do you (pl) think the approach of teaching C++ while avoiding C is reasonable? The text book I chose (Koenig and Moo, Accelerated C++) made a point of not teaching C and not teaching pointers, arguing that those would lead to bad practice in OO C++. But when standard library definitions of functions cannot be understood by the student, that seems a somewhat dangerous approach.

Combuijs

Quote from: sdog on November 26, 2016, 10:36:40 PM

Thinking about this: do you (pl) think the approach of teaching C++ while avoiding C is reasonable? The text book I chose (Koenig and Moo, Accelerated C++) made a point of not teaching C and not teaching pointers, arguing that those would lead to bad practice in OO C++. But when standard library definitions of functions cannot be understood by the student, that seems a somewhat dangerous approach.

For writing new code or learning C++ that approach is not unreasonable, I feel, although you can't leave that much of C out. For understanding existing code written by someone else, you should have knowledge of all kinds of C specifics, and especially pointers, as C and C++ are often mixed. And while, for example, C# does not have pointers, it is still very useful to know how they behave. It makes it much easier, for example, to understand the difference between a shallow and a deep copy.

prissi

C++ is just like a preprocessor to C (indeed the very first C++ compiler went that route) and internally makes heavy use of pointers. The class operation c->member (when c is a pointer to a class) is extremely common, and passing class variables as references is less common than passing pointers (at least in Simutrans). In principle, even C code could access C++ members (at least when on the same compiler, given similar structs, and taking care of the virtual function table pointer in classes).

Moreover, even C++ new is a pointer operation. Given that the reason for using C++ is usually speed (and second, that only C libraries are available for a certain device/function), the misuse of pointers can strongly impair performance or make programs very unstable. If you want to avoid pointers, then do not use C++; use Java or some scripting language.

Ters

What Java calls references is much more like what C++ calls pointers than what C++ calls references. Java references can be pointed elsewhere and they can be null; C++ references can not (unless you do evil things). The only difference is that Java references never point to uninitialized memory or are otherwise led astray. You also neither can nor need to call delete on them to free memory, but that doesn't mean you don't have to think about freeing memory, because it is still very much possible to leak memory.
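
A minimal sketch of the difference on the C++ side:

int main()
{
    int x = 1, y = 2;

    int *p = &x;     // pointer: can be reseated and can be null
    p = &y;
    p = nullptr;

    int &r = x;      // reference: bound once, never null
    r = y;           // does NOT rebind r; it assigns y's value to x

    return x;        // x is now 2
}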

And how you're supposed to do anything useful in C++ without using, or even knowing about, pointers is beyond me. Sure, you might use wrapper classes such as std::vector and the various smart pointers to such a degree that you can perhaps avoid using them directly, but good luck debugging if you don't know how they work. And pointers are part of every API I have ever seen, so interacting with anything without using pointers will likely be difficult. I have seen a library which used references rather than pointers, but it treated the references like pointers anyway, making the entire thing just more confusing.

sdog

Understanding C is of course vastly more useful.

I've been very reluctant to accept the things Koenig and Moo found most important, though I wasn't so certain about that, as I suspected it was bias, caused by my strong dislike (or lack of understanding) of OO. Perhaps the book's scope is as it is because it predates the wide adoption of Java (2001). I think they try to get the students to first use and understand abstractions, as they fear the students would just hack about in C otherwise. (That the first type the book introduces is the string says much too; if I were to do string processing today, why choose C++?)

I think it is not a good idea for me to invest time into C now. A few weeks of learning these things will get me to a point where I could write nice 'hello worlds' or a sorting algorithm of limited use. Without learning C, which requires a very good understanding of microprocessors, I cannot claim to know it. It seems much more sensible to focus on fruit that is easier to pick, such as CUDA or MPI, to open up posts that require HPC skills. (All the sweet Fortran jobs need that, of course.)

Ters

Quote from: sdog on November 27, 2016, 12:40:41 AM
Caused by my strong dislike (or lack of understanding) of OO.

Object-oriented programming is so 1990s anyway. Trendy programmers do functional programming these days. However, now that functional programming is supported by C++ and Java, I guess it won't be long until functional programming is passé.

sdog

Quote from: Ters on November 27, 2016, 08:44:42 AM
Object-oriented programming is so 1990s anyway. Trendy programmers do functional programming these days. However, now that functional programming is supported by C++ and Java, I guess it won't be long until functional programming is passé.

I fear I'm not connected very much to the hip CS people. Functional is appealing to me as it is much easier to grasp from a conceptual point of view. Take this example of a power series for the exponential function: e^x = ∑_n x^n / n!

Prelude> (sum . take 18 . (\x ->  [ x^n / fact n | n <- [0..] ] )) 1
2.7182818284590455
Prelude> exp 1
2.718281828459045

That's Haskell (with a bit of syntactic sugar). The 'take 18' corresponds to an upper limit of n=18 on the ∑ in the otherwise infinite sum. The function fact is the factorial, defined elsewhere. Apart from readability, the other advantage, given a strict type system, is that one can ensure correct code (in small pieces). I don't even need a prover: I made some mistakes with the type of n and simply did not produce valid code.

What I cannot accept in OO is that what my programme does depends on the internal state of objects, which is also entirely hidden from me. Given the complete nightmare that side effects are, I just don't get the advantage of doing this outside fringe areas like I/O or GUI. Another problem I see with OO is that one has to test these objects in all possible states for all extremes of acceptable input. That might easily lead to a combinatorial explosion.

Not that this cannot be done in imperative programming (or functional) as well. I am so afraid of this exactly because I know 1000-line opaque Fortran subroutines that juggle data from global variables and may change stuff on any line.


Quote
However, now that functional programming is supported by C++ and Java, I guess it won't be long until functional programming is passé.
Well, OO was also a fad, and the hip crowd complained about imperative programming back then. And of course they weren't entirely wrong. It seems to me that today there's much more of a realisation of where to use OO and where it is a hindrance, and likewise for the strengths of imperative and functional approaches. Today most devs are multi-language and multi-paradigm. Those hordes of hardly trained VB and Java corporate code-slaves are apparently slowly disappearing (or are they simply outsourced to India?).

Ters

The reason for hiding the state in the objects is to avoid having lots of dependencies on the state scattered about the code. And I don't see how you avoid multiple possible states by not using OO. The states are part of the real-life problem domain.

prissi

Again, CUDA is largely a subset of C with a few extensions, since it matches the low-level code very well. If you want to write efficient code (and that is the point of CUDA, isn't it?), using a perl/python etc. wrapper will not give you peak performance. And there is also C++ for CUDA, so you can use complex numbers and the exponent operator (I think those are even part of the latest implementation).

sdog

Ters,
I read my previous mail and have to apologise to you. I got carried away a bit with showing what I find so useful about functional, and with showing my bias against OOP. I did not mean to argue in favour of one and against the other, yet reading my text again, it is very argumentative. This is out of place, since I came to ask for advice.

If you enjoy a discussion on this topic however, I should love to reply to your last reply on state -- as a discussion in its own right, decoupled from my request for advice.

Ters

You clearly haven't seen many programming discussions. I wouldn't be surprised if it was programmers that invented the flame war.

I should perhaps have noted more clearly, although it is hinted at, that the "functional programming" that is hip now (at least in my field), is built on top of OO. It looks nothing like that Haskell example.

sdog

Thanks to all for the advice given in this thread so far!


@prissi: I meant use of CUDA in Fortran. But that was just a first thought, on a second thought, there's not much benefit from that. You are absolutely right in that regard.



I've decided now I'll go with Koenig and Moo, Accelerated C++, focusing on the OO aspects of C++ and on the standard library, then going on to simple C after that, leaving a deeper understanding of C to the future. One reason is simply to put something on my CV.

I've already had one negative reply on a job application, without even an interview. It's a bit shameful to mention that publicly, in particular as none of my peers seem to have failed that miserably. Apart from some mistakes in my cover letter and CV, I think that lacking C++ knowledge is a reason. Notably, the same company posted a job three weeks later that would fit very much what I requested in my 'initiative application'.

Now that I have fired off a real barrage of applications (four) and the job market is slow before the holidays, I've got time for learning. My plan is to work for three weeks learning C++ and one on C, get to a point where I can claim proficiency in a CV without lying, and try again with them. (They have by far the most intriguing projects, develop for Linux, no Java(!), no .net(!!), they offer lunch, have showers, and started as a university spin-off. On top of that they are in a city in one of the German states with a working school system, Nazis are rare, and there's decent public transport and cycling infrastructure. Oh, and did I mention, interesting projects.)

jamespetts

Very best wishes with your job hunt and your C++ learning. Have you started a Github account yet? Apparently, that's a good way of boosting a coding CV.

sdog

For the sake of oversharing, I post a status update. Read on only if you don't mind a waste of time.

Not much progress so far. I've been fiddling for an umpteen amount of time to get a half-decent syntax completion engine for C running, assuming it is better not to start by plain typing, and to get into good habits from the outset.

Coming from Fortran on one side and garbage collected, immutable state languages on the other, I have incredible trouble wrapping my head around memory management in C languages. It is effortless in the former and no concern of the programmer in the latter. All the more I am afraid that, if I am not diligent now, I might at some point cause a memory leak or other unforgivable mistakes.

I thought as a starting point I had better get to understand what the heap is, how it is structured, etc. It occurred to me only after a while that it might not have anything to do with the CS concept 'heap (data structure)' at all.

http://stackoverflow.com/questions/1699057/why-are-two-different-concepts-both-called-heap

Having only a passing acquaintance with graph theory and data structures, I had to catch up on that a bit first. *face-to-desk* *grin*


Well, compiling of some ncurses stuff is done, so I may continue fiddling with the syntax completer (YouCompleteMe) again. Since the checker is multi-language, and the defaults in my configuration script are to have everything turned on, it is quite some work.
For example, there are stupendously many dependencies for C#. I can see now why there is so much fondness for the JRE.
(I also want to avoid having to install mono and JS stuff just to build it. Can't those people developing stuff on Mac do anything without half a dozen obscure frameworks that need to be installed and cleaned up for every bloody build job? Oh, it also requires, for unknown reasons, the rather disreputable Boost library.)


ps.: I noticed that my posts here might appear aggressive, whiny, or complaining. That is a partial misconception, I hope. Firstly, I cannot deny that I am German, hence the complaining. For another, I am somewhat frustration-tolerant (Fortran 77, wink, wink) and need to build up a bit of frustration to actually get somewhere. If there are no tough nuts to crack, it cannot keep my interest.

jamespetts

The interesting thing about learning C++ (and presumably also C) is that it teaches one rather more about how computers work than does learning a higher level language.

After learning the difference between the heap (i.e. system memory) and the stack (i.e. processor cache memory), the next step is learning about memory management of arrays (including when to use the delete[] command) and pointer arithmetic.
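
For instance, a minimal sketch (modern style would usually prefer std::vector, but it shows new[], delete[] and pointer arithmetic):

int main()
{
    int *a = new int[10];       // array allocated on the heap
    for (int i = 0; i < 10; ++i) {
        *(a + i) = i;           // pointer arithmetic: same as a[i] = i
    }
    int second = a[1];
    delete[] a;                 // array form of delete, matching new[]
    return second;
}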

HarrierST

Quote from: sdog on December 14, 2016, 09:49:28 PM
Notably, the same company posted a job three weeks later that would fit very much what I requested in my 'initiative application'.

Send in another application. The advert could be from a different part of the company; departments do not share info. And human resources/personnel, or whatever they are called now, do not either.

HarrierST

Quote from: jamespetts on December 17, 2016, 11:26:26 PM
The interesting thing about learning C++ (and presumably also C) is that it teaches one rather more about how computers work than does learning a higher level language.

After learning the difference between the heap (i.e. system memory) and the stack (i.e. processor cache memory), the next step is learning about memory management of arrays (including when to use the delete[] command) and pointer arithmetic.

I never learned how to program C++ effectively, but did read up about it - by then I had moved out of programming.

But your last paragraph brought back old memories. I started as a machine code programmer in 1970, moved on to assembly, then PL/1 and COBOL. In those days, to get the best performance you had to know how the machine and its language worked.

Because machines were less powerful, two programmers working on the same problem could get very different results.

Today the attitude seems to be: code as quickly as possible, not as efficiently as possible. Hey, the speed of the computer will hide any poor coding.

DrSuperGood

Quote
After learning the difference between the heap (i.e. system memory) and the stack (i.e. processor cache memory),
Heap is an area of memory where dynamic memory allocation and other memory usage occurs. Stack is a memory structure that is unique to each processor thread.

Neither have anything to do with system memory or processor cache memory directly. Both can be backed by system memory and both can take advantage of processor cache memory for better memory I/O performance.

In some processor architectures the heap is physically separated from the read only processor code. In some processor architectures the stack is a unique register based data structure inside the processor (not memory backed).

Quote
Today the attitude seems to be: code as quickly as possible, not as efficiently as possible. Hey, the speed of the computer will hide any poor coding.
Compilers optimize code a lot better now. It is far better to write something that is clean and easy to read than low-level and efficient, as chances are the compiler will take the resulting code there by itself anyway.

What is still important is to attack problems efficiently. Using an O(log2(n)) algorithm instead of an O(n) one makes a huge difference, more than all the micro-optimizing one can ever do to a piece of code.
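
A minimal sketch of that difference, using the standard library on sorted data:

#include <algorithm>
#include <cassert>
#include <vector>

int main()
{
    std::vector<int> v(1000000);
    for (int i = 0; i < 1000000; ++i) {
        v[i] = 2 * i;    // sorted data
    }

    // O(n): may inspect every element one by one.
    const bool linear = std::find(v.begin(), v.end(), 999998) != v.end();

    // O(log2(n)): about 20 comparisons on a million elements.
    const bool binary = std::binary_search(v.begin(), v.end(), 999998);

    assert(linear && binary);
    return 0;
}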

Ters

Quote from: jamespetts on December 17, 2016, 11:26:26 PM
The interesting thing about learning C++ (and presumably also C) is that it teaches one rather more about how computers work than does learning a higher level language.

I've read that one of the main principles in C is that it should be obvious to the programmer how the C code maps to machine instructions. So while other languages might be based more directly on mathematical concepts, C is simply an abstraction over machine code. C++ is another layer above that. The basic keywords, except exceptions, still map to assembly in the same predictable way they do in C. New features like member functions, inheritance and virtual functions also have a simple and relatively well known translation to machine code. Exceptions are more complex, with at least two rather different implementations. Templates and operator overloading can obfuscate the mapping between C++ code and machine code. Each successive C++ standard seems to increase the distance between C++ and the machine code, probably indicating that C++ developers focus more on higher level concepts than how their code maps to machine code (although it still gives them the ability to write low-level code by sticking to the "pure" C stuff). Optimizing compilers also obfuscate the mapping between C/C++ and machine code, but the developer must still ask for such optimizations as far as I know (IDEs might ask for you by default for non-debug builds, though).

Quote from: sdog on December 17, 2016, 10:57:32 PM
Coming from Fortran on one side and garbage collected, immutable state languages on the other, I have incredible trouble wrapping my head around memory management in C languages. It is effortless in the former and no concern of the programmer in the latter. All the more I am afraid that, if I am not diligent now, I might at some point cause a memory leak or other unforgivable mistakes.

To think that memory management is not a concern in garbage collected languages is a big mistake. Just because there is no keyword or function to free memory, doesn't mean that you don't have to tell the runtime that you are done with a piece of memory. While doing so will often be so trivial that you don't think about it, there are as far as I know cases in all GC-ed languages where you have to be more explicit. Immutable state languages might have less of this than mutable state languages, though. I only have experience with the latter, but I can't imagine the other being completely fool-proof.

sdog

@Ters
Quote
I've read that one of the main principles in C is that it should be obvious to the programmer how the C code maps to machine instructions. So while other languages might be based more directly on mathematical concepts, C is simply an abstraction over machine code. C++ is another layer above that. ...

That's quite interesting.

I've also unearthed the old Dijkstra quote I mentioned before; it goes in the same direction. It says something along the lines of: to use a level n of abstraction, one has to understand level n-1 and have more than a vague idea of n-2. He also said that learning C is not about learning programming but about learning how machines work in detail.

[He also said something along the lines that Fortran ought to be forbidden, as it spoils young minds :-( (but that was in the days when people didn't mind using conditional, numbered goto statements.)]


QuoteTo think that memory management is not a concern in garbage collected languages is a big mistake. Just because there is no keyword or function to free memory, doesn't mean that you don't have to tell the runtime that you are done with a piece of memory. ...
Memory management is much more difficult for functional languages since, as state isn't mutated, memory is allocated, for example, at every recursion step. However, it is also much easier for the runtime/compiler to determine when it can be freed.

In most GC languages it was relatively effortless, and most of the work seems to be done by following good coding practices. However, I've not used them in such depth that I had to bother very much.


Lastly Fortran: memory management is rather easy and straightforward there. Firstly, in most cases one statically declares fixed-size arrays. It is more important that the memory footprint is predictable and fits the amount one has available for each instance of the programme or each thread; typically memory is a much less scarce resource than CPU time. In new Fortran one can dynamically allocate arrays, and also deallocate them directly. That can be very useful in the most memory-intense problems. It is also much easier than in C, since one does it much more abstractly: one simply allocates an array of the required size, and the compiler takes care of getting a, preferably contiguous, range of addresses from free memory. So no questions of stack or the ominous heap.

No pointers either. If one knows a few rules, one can simply read and write multi-dimensional arrays sequentially by iterating through them. Subroutine calls transfer not a memory address but the value of the data. In old Fortran one would initialise an array of fitting, fixed size. In modern Fortran the compiler can infer that itself for static-size arrays. (Never checked for dynamically sized arrays, as they are a rare fringe case.) At the close of the subroutine the memory is freed automatically again. In consequence, and much unlike C/C++, if one is not careless, memory management happens without having to think about it. Typically a mistake that would cause it to fail would have had more catastrophic consequences before that point.

Since there are no global variables, if one wants to write to a data structure directly, it has to be included in a common block. These blocks have to be consistent in every subroutine. Thus they are dreaded: the target of some hackish solutions from the olden days, a source of frequent obscure errors, and a great hindrance to modular code. I dislike them with a passion. There are two consequences. In order to reduce the memory footprint there is a lot of 'impure' meddling with stuff in common blocks happening in subroutines, such that often a routine gets only some unimportant stuff passed as arguments, while all relevant input comes, and output goes, by directly writing to common block variables.

In new Fortran one may pass a reference to a data structure. There are also user-defined, more complicated data structures, somewhat similar to structs.

@DrSuperGood, HarrierST
QuoteWhat is still important is to attack problems efficiently. Using an O(log2(n)) algorithm instead of an O(n) one makes a huge difference, more than all the micro-optimizing one can ever do to a piece of code.
Well, that's the much easier part, and the stuff people, at least proper CS students, love to learn. After all, you're in a branch of applied mathematics. I remember looking at the stuff my wife learned for tests: theoretical informatics, graph theory, number theory. That was a real joy. The other stuff was so dreary: UML, data warehousing, etc.

I might just be parroting others: to me it seems micro-optimisations tend to lead to code that is not very future-safe, error prone, and typically unreadable. Computing time is in most applications much cheaper than developer time. Good code is read much more often than written. There are good reasons for high levels of abstraction.

It is arguably also easier to find staff. Anyone with a bachelor's in something even remotely technical understands the mathematical formalism. Few people understand how computational machines work (me included). It is a difficult and specialised topic. (I have to admit I've recently applied for a job as an FPGA developer; I need a better narrative here, in case I get a chance for an interview...)

QuoteSend in another application. The advert could be from a different part of the company; departments do not share info. And human resources/personnel, or whatever they are called now, do not either.
Thanks a lot for the advice. I shall do it. Unfortunately the position was posted by the same HR employee (the company only has three) and is for the same dev team. Nonetheless, I'm going to ask them again. Firstly, it is difficult for them to get people; they tend to have those ads open for months. Secondly, with some work on C and a better CV my chances might be better. And thirdly, even for a small chance it's worth it. No other job offer made me as eager as them. Not even a technical job at a heavy ion accelerator (they are applying the type of collisions I've researched).


@James, Harrier, Ters, DSG
Thanks a lot for the feedback. I have many new questions in my mind, caused from this discussion. It was quite fruitful.

Ters

Um, Fortran is always/was originally pass by reference, not pass by value. That is the way one could/can redefine 0 to be 1 (same for any other pair of values).

sdog

Quote from: Ters on December 18, 2016, 06:20:49 PM
Um, Fortran is always/was originally pass by reference, not pass by value. That is the way one could/can redefine 0 to be 1 (same for any other pair of values).

Fortran 95, possibly earlier Fortran, does indeed allow passing by value. I think one can find that a lot in C wrappers around Fortran functions, eg, in linear algebra libraries like BLAS.

The trouble with (old) Fortran when passing by reference is that one needs to replicate the data structure. For instance, if you pass an NxM matrix, the routine needs an NxM matrix explicitly declared (a dummy variable). In new Fortran there are also allocatable dummy arrays that infer their size from the passed variable.

In other words, Fortran abstracts over a variable in the routine call and the corresponding dummy variable in the subroutine itself. It is left to the compiler how to associate the two. That usually means the dummy variable is associated with the same memory address. But, as mentioned before, that can go wrong. Or it can be (ab)used: for example, variables holding multi-dimensional matrices in routine calls may often be used as one-dimensional arrays in dummy variables. As one assumes a contiguous range in some abstract memory (somehow mapped to physical memory) and knows the way arrays are laid out, this is possible. The only thing we can take for granted is that the dummy variable is associated with the beginning of the memory in which the data is provided. Depending on the 'intent' (in/out/inout) there is a mapping from the dummy variable to the variable.

Combuijs

Quote[He also said something along the lines that Fortran ought to be forbidden, as it spoils young minds :-( (but that was in the days when people didn't mind using conditional, numbered goto statements.)]

You refer to the famous "go to statement considered harmful" article, I suppose?

sdog

Quote from: Combuijs on December 18, 2016, 07:39:44 PM
You refer to the famous "go to statement considered harmful" article, I suppose?
Probably. I think I briefly read it a few years back.

In F77 computed GO TO statements were deprecated. These had the form GO TO (L_1, ..., L_M) N, for a jump to label L_N with 1 <= N <= M.

We had a library routine in our code that used these and GOTO as its only control structures. That thing was super dense: one or two hundred lines of code for a stiff ordinary differential equation (ODE) solver. Certainly not spaghetti code, but state of the art: correct, reliable, and efficient. Written a year or two before I was born. It didn't matter that more than 30 years had passed, I still felt effin spoilt by it. The best comes last, it was still good enough to compete with new ODE solvers, some written around 2000 or so.

DrSuperGood

The problem with goto statements in general is that they are as good as redundant. In a language like C or C++ there is as good as nothing a while/do while/for/if/switch with break cannot do that goto can. A lot of the code optimizations involving goto are automatically applied by modern compilers.
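
For illustration, the classic remaining C use of goto, breaking out of nested loops, written goto-free with a flag (a sketch):

#include <cstdio>

int main()
{
    bool found = false;
    for (int j = 1; j < 10 && !found; ++j) {
        for (int i = 1; i < 10; ++i) {
            if (i * j == 42) {    // where old code might say "goto done;"
                found = true;
                break;
            }
        }
    }
    std::printf(found ? "found\n" : "not found\n");
    return 0;
}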

Quote
The best comes last, it was still good enough to compete with new ODE solvers, some written around 2000 or so.
Pre-2000, pretty much the only thing that was improving with computers was speed and quantity: more memory, more storage, more clocks per second, etc. As such, old algorithms developed a decade or so earlier still scaled well, as they could take advantage of all the 'more' available. Post-2000 things have got a lot trickier, as 'more' is no longer really an option: processor clock speed growth has nearly flatlined, while memory and storage have reached such volumes that they are no longer much of a concern. To get around this, post-2000 hardware has introduced a lot of complexities, such as multiple cores, new instructions to compute many values at once, more registers, and in some cases even access to dedicated hardware like standardised GPUs to outsource certain types of massively parallel calculations.

If the code was very high level then some of the features, such as new instructions, might be used automatically. However, if not, then it will not be able to take advantage of such available features and so will perform similarly to how it did around 2005 or so. The API used might not be suited to taking full advantage of modern hardware, limiting the maximum obtainable performance. An old API might take in only 1 value, process it linearly, and produce a single output, whereas a more modern API might support multiple value inputs, process them in parallel, and produce multiple outputs. Both API and implementation are almost certainly a lot more complex than something from 1990; however, the performance might end up being orders of magnitude faster today.

However you probably already knew that. I am just putting this here in case anyone wondered why such things might no longer be good enough to compete in this day and age.

Ters

Quote from: DrSuperGood on December 18, 2016, 09:27:08 PM
The problem with goto statements in general is that they are as good as redundant. In a language like C or C++ there is as good as nothing a while/do while/for/if/switch with break cannot do that goto can. A lot of the code optimizations involving goto are automatically applied by modern compilers.

I'm not sure that's the "popular" problem with goto statements. There are certainly those who would argue that break statements are at least almost as evil (except probably in switch blocks, where maybe not using them is seen as evil), and that the same is true for multiple return statements in a function.

sdog

QuoteIf the code was very high level then some of the features, such as new instructions, might be used automatically. ...

With scientific Fortran code that is very much a question of the compiler. I.e., such things tend to run quite a bit faster when compiled with the Intel Fortran compiler, assuming Intel hardware. 2000ish is already something pretty recent. Speed never bothered me much in my stuff, so I kept to my local cluster (only gcc) and not the remote machines with Intel compilers. A friend tested it: compiled with Intel, it was about 5 or 10 percent faster. For test problems he could get larger differences. In contrast, a smart choice of model parameters made several magnitudes of difference in runtime. (In one case, reducing the size of the vector space, I went from about 1 Gs to half a megasecond, calculated based on scaling the time for one operation with the increased complexity.)

Solving ODEs is an old problem; once hardware limitations didn't prevent the best algorithms from being used, it seems there wasn't much left to develop with regard to performance on benign problems. The advance came with treating stiff problems and tackling more difficult problems.

These algorithms look deceptively like a simple Adams-Bashforth, predictor-corrector, and such things from a basic course in numerical methods, with the addition of 8 pages of equations for some weird cases, referencing a dozen other articles dealing with other special cases.

The thing is, one might just rewrite that old code in modern Fortran, or perhaps even C. But there's nothing to be gained. On the flipside, one loses reliability: that stuff was scrutinised by perhaps 8 generations of grad students and postdocs. It is archaic and awful to read (today), but as long as it compiles with state-of-the-art compilers, it's as fast as it can be.

@Ters
Quote
I'm not sure that's the "popular" problem with goto statements. There are certainly those who would argue that break statements are at least almost as evil (except probably in switch blocks, where maybe not using them is seen as evil), and that the same is true for multiple return statements in a function.
I understood that the biggest problem is the tendency toward spaghetti code. I mean, that anti-goto article was from the early 70s? The debate over structured programming with semantic control structures like loops and conditionals has long been fought out.

Are multiple return statements really that bad?

This (py) code:

def heavyside(a):
    if a > 0:
        return 1
    else:
        return 0

looks clearer to me than that one:

def heavyside_prime(a):
    if a > 0:
        b = 1
    else:
        b = 0
    return b


Perhaps I look at it a bit biased, since in languages with pattern matching that is actually the standard. A different example:

g :: Integral b => b -> b -> b
g 0 1 = 1
g 1 0 = 2
g 1 1 = 3
g _ _ = 0


The heaviside function from above:

f :: (Ord a, Num a, Integral b) => a -> b
f a
   | a > 0     = 1
   | otherwise = 0


Since this is a C/C++ thread, let's try it in C++. I suppose the following is bad form then:

int heavyside (double a)
{
    if ( a > 0 )
    {
        return 1;
    }
    return 0;
}



int heavyside_prime (double a)
{
    int b; // init var without asg. value, bad?
    if ( a > 0 )
    {
        b = 1;
    }
    else
    {
        b = 0;
    }
    return b;
}


The break statement is nothing but a goto anyway. Regard this F77 loop over a matrix, doing something pointless:

        b=0
        DO 10 j=1, n
        DO 10 i=1, n
          b = b + A(i,j)
          IF (b.LT.0) GO TO 10
10    CONTINUE




Follow up on my C++ example. It is indeed bad what I did. Confer:
http://stackoverflow.com/questions/1597405/what-happens-to-a-declared-uninitialized-variable-in-c-does-it-have-a-value


int heavyside_twoprime (double a)
{
    int b = 0; // now with determined value!
    if ( a > 0 )
    {
        b = 1;
    } // but not as clear since there's no more else.
    return b;
}


edit: added missing `Ord a` to type declaration in a Haskell example

DrSuperGood

Quote from: sdog on December 18, 2016, 10:58:39 PM
Since this is a C/C++ thread, lets try it in it. I suppose the following is bad form then:

int heavyside (double a)
{
    if ( a > 0 )
    {
        return 1;
    }
    return 0;
}



int heavyside_prime (double a)
{
    int b; // init var without asg. value, bad?
    if ( a > 0 )
    {
        b = 1;
    }
    else
    {
        b = 0;
    }
    return b;
}


Except one can write it like...

int heavyside (double a)
{
    return a > 0 ? 1 : 0;
}

Which is 1 return, 1 line, functionally the same (as far as I am aware) and still very readable.

Quote
Follow up on my C++ example. It is indeed bad what I did. Confer:
You initialized the variable before use, which is completely fine to do even if it was not initialized at declaration. The compiler will automatically produce initial-value setup or branch code in such a way that the variable always has the desired value.

The only thing one has to be careful with is that a value is assigned before the variable is used. Simutrans suffered from such a problem a while ago, resulting in MSVC and GCC builds being incompatible due to different or no assignment of an uninitialized field used to calculate pakset hashes.

In Java one can even initialize final (Java's equivalent of const) variables with such conditional statements. As long as the variable is assigned a value in all cases, it is fine. This does not apply to C++, though, where a const can only be assigned at its declaration.

I personally have no problems with multiple returns. My general aim when programming is toward mostly flat, linear code, where any given line is within as few flow control statements as possible and the function executes downwards, broken apart into separate related steps. Often one can be tempted to place more and more code inside conditional statements. Doing so results in code where I personally find it hard to understand what is happening, and which is difficult to read due to the excessively deep indentation required. As such, if I have an early exit clause (e.g. nothing to do, detected at the start), then I test for that and return, thus keeping the rest of the function code flat and linear instead of having to place it in an almost function-long else block, which itself might branch further into other blocks. It just makes so much sense, as one can read from the start of the function and see immediately that the function exits when there is nothing to do, instead of having to scroll all the way down to a return statement and try to see where that value came from.
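
A minimal sketch of that early-exit style:

// Guard clauses first: the trivial case leaves immediately, so the main body
// stays flat instead of nesting inside an almost function-long else block.
int sum(const int *data, int count)
{
    if (data == nullptr || count <= 0) {
        return 0;    // nothing to do, and the reader sees the exit right away
    }

    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += data[i];
    }
    return total;
}

int main()
{
    const int values[] = { 1, 2, 3 };
    return sum(values, 3) - 6;    // exits with 0 if the sum is correct
}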

Ters

Quote from: sdog on December 18, 2016, 10:58:39 PM
2000ish is already something pretty recent.

That is not clear, because quite radical changes took place between 2000 and 2010. At the beginning of the decade, most people only had computers with a single processor with a single core. Vector instructions were however probably already common. At the end, multi-core CPUs were common, plus one, or even two, GPUs. The idea of using GPUs for things other than graphics had also taken hold. There were also the mixed-paradigm Cell processors, containing a traditional core plus several "GPU-style" cores. But I guess the people using Fortran probably already had workstations with multiple CPUs and/or clusters.

Quote from: sdog on December 18, 2016, 10:58:39 PM
Are multiple return statements really that bad?

This (py) code:

def heavyside(a):
    if a > 0:
        return 1
    else:
        return 0

looks clearer to me than that one:

def heavyside_prime(a):
    if a > 0:
        b = 1
    else:
        b = 0
    return b


I've been asking myself the same question. The code in question is however more complicated than this, with multiple nested loops and/or if-blocks, which might be the actual problem.

Vladki

Quote from: DrSuperGood on December 19, 2016, 01:11:53 AM
Except one can write it like...

int heavyside (double a)
{
    return a > 0 ? 1 : 0;
}

Which is 1 return, 1 line, functionally the same (as far as I am aware) and still very readable.

And you can simplify it further to:

int heavyside (double a)
{
    return (a > 0);
}


DrSuperGood

Quote
Vector instructions were however probably already common.
They were very basic compared with all the vector instructions released post-2000. Modern vector instruction extensions released within the last year or so are capable of manipulating 16 different 32-bit numbers (e.g. floats) at the same time in a 512-bit-long register. They contain multiple such 512-bit registers as well. Of course, at the end of the day the bottleneck becomes how many separate functional units exist to do the required calculations (a vector instruction need not be implemented in a way that all calculations run in parallel at the same time; it might be limited by the number of free appropriate functional units), however I am pretty sure modern processors have a lot more of those as well.

sdog

Quote from: Vladki on December 19, 2016, 11:18:19 AM
And you can simplify it further to:

int heavyside (double a)
{
    return (a > 0);
}



Ouch! That hurts! It seems old C didn't have boolean types and abused integers for that. Thanks Vladki, I expect I would not have learned that for a while had you not mentioned it. Here's some experimenting with it:

[cling]$ (0<1)
(bool) true

[cling]$ 5 - (0<1)
(int) 4

[cling]$ 5 - (0>1)
(int) 5


Quote from: DrSuperGood
Except one can write it like...

int heavyside (double a)
{
    return a > 0 ? 1 : 0;
}

Which is 1 return, 1 line, functionally the same (as far as I am aware) and still very readable.
I never managed to remember that syntax, though I used it often in gnuplot. It looks almost Perlish; is it discouraged today?

I see the trouble with using such a minimal example is that it is so easy to find much simpler solutions. I ought to have stayed with a proper definition of heavyside, and made it real :-)

By the way, to define this Heavyside to work independently of the type of input, as long as it is numeric and comparable, I would have to overload it? Speaking of typing, I botched the type declaration in the Heaviside Haskell example: the input must be of a type that can be ordered (corrected now).

@Ters

I meant using return in the ways DrSuperGood mentioned, for example to check for trivial cases at the outset and return right away, rather than having nested ifs. But also for simple things, returns seem to be good practice in some languages. A function that is built similarly to the pattern-matched examples, i.e. a number of condition checks that each return right away: is that typically considered bad form in C/C++?

Quote
That is not clear, because quite radical changes took place between 2000 and 2010. At the beginning of the decade, most people only had computers with a single processor with a single core. Vector instructions were however probably already common. At the end, multi-core CPUs were common, plus one, or even two, GPUs.
Well, CUDA is really quite different and new. I suppose that might have changed things for some problems; it takes quite different algorithms. For the type of problems I know, there's not much gained from parallelisation of solving DEs; usually one has a large number of parallel threads, each solving an ODE. That can be achieved, for example, by substituting or modelling a multi-variable DE as coupled single-variable ODEs. When this is done, there's not much difference from 2000: whether you run your stuff on 128 CPUs with one thread and core each or on 16 octo-core CPUs doesn't change your approach to parallelisation very much. True, communication between cores is faster, but it's still massively punished.

You may also consider that an algorithm that ran on 1975 hardware might be not that unsuitable to adapt to the very limited resources of a single CUDA thread. If the problem is embarrassingly parallel it might be tempting to make these solvers even simpler, drop all support for stiff problems, and just give up and pass the hard cases on to a proper CPU with more sophisticated algorithms. However, that is just wild ad-hoc speculation.

Quote
Vector instructions were however probably already common.

Yes, they were a great incentive to use the Intel Fortran compiler over gcc gfortran. I cannot recall if SSE was available in gfortran when I started in 2005; there was no way for me to get ifort back then anyway. Anyhow, that's the job of the compiler; it wouldn't change the way the code was written in any way.

There is one more aspect to it: in the newer Fortran standards one can write vector and matrix operations per element or use built-in functions on whole matrices. The new standards were not supported by gfortran for quite a while, and the compiler was also not smart enough to identify something as a simple vector operation that could benefit from vector instructions.

Ters

Quote from: sdog on December 19, 2016, 05:14:47 PM
Ouch! That hurts! It seems old C didn't have boolean types and abused integers for that.

It didn't abuse integers for that. They were integers already when C came around. CPUs in general have integer and floating point registers, and maybe also special address registers. Some had BCD-registers in the old days, and now we have vector registers. But I have never heard of boolean registers, beyond status registers, which aren't available in the same way. Not that I'm familiar with a great deal of CPU architectures, and only x86 is fresh in my mind.

And booleans are still just integers in the newest C standard from what I can gather. The "new" native bool type is just another integer type, like char, short, int and long.

Vladki

AFAIK most languages will happily convert boolean to int as false=0 and true=1 and vice versa. Only exception I know is bash (and other unix shells), where true=0 ;) 

Also, that example function heavyside() would make more sense returning bool instead of int.

DrSuperGood

The bool type is there to make clear that the API will output either true (not 0) or false (0). The problem with int as a logic type is that it implies that any numeric value in the range of the type could be output and might need to be dealt with (it is not clear that it is meant for logical use).

For example one could change the previous code to the following in which case the output clearly is not a boolean.

int heavyside (double a)
{
    return a > 0 ? 1 : -1;
}

As such I would disagree with the following solution...

int heavyside (double a)
{
    return (a > 0);
}

Unless the function declaration was changed to make it clear that the output is logic and not a number.

bool heavyside (double a)
{
    return a > 0;
}

Although C/C++ probably do strictly define the values of logic operation results now, in theory the logical true produced need not be defined as 1; it could be -1 or any non-zero value. This is because logic instructions generally assume that 0 is false and anything not 0 is true, and as such any non-0 value could be used to represent true.

Simutrans abuses this a lot to test for NULL pointers. Instead of checking pointer != NULL it just tests for pointer since NULL is defined as 0 in the case of Simutrans so anything not NULL is logical true. Personally I am against this since a pointer is not a logical value, despite being able to be used as one.

Ters

Quote from: DrSuperGood on December 19, 2016, 08:03:29 PM
Simutrans abuses this a lot to test for NULL pointers. Instead of checking pointer != NULL it just tests for pointer since NULL is defined as 0 in the case of Simutrans so anything not NULL is logical true. Personally I am against this since a pointer is not a logical value, despite being able to be used as one.

Preferring to write as little code as possible, but with meaningful names, the ability to test the validity of a reference by just writing the name of the reference is something I actually miss in Java. (A non-null pointer in C/C++ isn't necessarily valid, but that is a different problem.) Then again, NULL references have been declared (one of) the biggest mistakes in software engineering, by the very man who invented them. I don't think I have ever tested that an actual number is 0 by casting it to a boolean, just pointers. And perhaps some C++ classes that have a bool conversion operator for whatever reason.

sdog

It pays not to rely only on an interactive shell when learning a language:


Here's a trivial example that fails horribly when compiled without specifying any flags to g++.

#include <iostream>

int main()
{
    uint a;    // uint is not standard C++; glibc provides this typedef
    a = 1;
    for (int i=0; i<=2; i++)
    {
        a = a-i;
        std::cout << a << std::endl;
    }

    return 0;
}

// vim: et ts=4 sw=4 sts=4


Now, unsurprisingly, that oughtn't to work, and it does not:

➜ make       
g++     test.cc   -o test
➜ ./test
1
0
4294967294

I have yet to find the correct compiler flags (trap) to throw an exception when this happens. That ought to apply equally to integer overruns, or do these automatically become NaN, like with floating types such as double?
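
A candidate I have not yet verified: GCC and Clang are supposed to ship runtime sanitizers for the signed case. A sketch:

// Hypothetical compile line (recent GCC or Clang):
//   g++ -fsanitize=signed-integer-overflow -g test.cc -o test
// Signed overflow is then reported at runtime instead of wrapping silently.
// Unsigned wrap-around is defined behaviour in the standard, so it is not
// trapped; Clang offers -fsanitize=unsigned-integer-overflow for that case.
#include <climits>

int main()
{
    int big = INT_MAX;
    big += 1; // flagged at runtime by the sanitizer
    return big != 0;
}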

That is the most severe case so far of how C++, while it looks rather high level, is deep down still C, and as such more concerned with machine comprehension than human comprehension. (For fun I've tried to google "mathematical definition of C data types" but didn't find anything. It would be quite interesting to see how uint is defined, since it is very clearly not ℕ0.) As a learner I take from this: C++ really can be a minefield, and great care must be taken to avoid unforgivable mistakes.

There is one more aspect: it's the first time in a long while that I have used a for loop rather than a while loop. I had found for rather superfluous; however, in C/C++ this is different. Apparently while accepts only a declaration or a condition as its argument, while for allows building loops with a variable that is to be iterated.

Quote
Quote
Simutrans abuses this a lot to test for NULL pointers. Instead of checking pointer != NULL it just tests for pointer since NULL is defined as 0 in the case of Simutrans so anything not NULL is logical true. Personally I am against this since a pointer is not a logical value, despite being able to be used as one.

Preferring to write as little code as possible, but with meaningful names, the ability to test the validity of a reference by just writing the name of the reference is something I actually miss in Java.

That seems quite hackish. Could this have been avoided by some syntactic sugar, like, overloading the 'if' builtin function to test existence?

@Ters
Quote
It didn't abuse integers for that. They were integers already when C came around. CPUs in general have integer and floating point registers, and maybe also special address registers.
Yet we rely on higher abstraction than just manipulating registers. In essence all data types in any language may be reduced like that. Since integer registers are different in different machine architectures, isn't int in itself already as much an abstraction as boolean?

In one case we don't want type mismatches, and we are quick to criticise obsolete coding practices that employed them on purpose (eg in all-caps languages like BASIC or FORTRAN). On the other hand, a type mismatch between such extremely different types as boolean and integer may be acceptable when convenient? Perhaps that's just my inexperience, but that seems somewhat hackish to me.


ps.: I thought it would be a spiffy idea to write a simple parser that reads scope from syntactic whitespace indents and inserts `{...}` and `;` pairs as needed. Then pipe the source first to the parser and then to the compiler. Voila, C would be nearly as readable and writable as python.

prissi

C happily allows integer under- and overflows and will not throw an exception when bits are lost. You may find flags that warn you against it, though.

C was never made for such checks, and testing for overflow is very time consuming. Maybe you can find some of that in floating point, which goes to ±inf resp. to zero. But throwing bits away in arithmetic is extremely common when driving hardware, and thus one of the areas where C(++) is used most (at least in the beginning).

DrSuperGood

Quote
Now, unsurprisingly, that oughtn't to work, and it does not:
What part of it does not work? It looks fine to me at a glance.

1 - 0 = 1
1 - 1 = 0
0 - 2 = 4,294,967,294

My manual trace gets the same results as was output....

Quote
I've to find the correct compiler flags (trap) to throw an exception when this happens. That ought to apply equally to integer overruns, or are these automatically NaN, like with floating types such as double?
The entire way signed types work is by under/overflow. In fact unsigned arithmetic and signed arithmetic are the same... This is due to two's complement. Multiplication and comparisons on the other hand are a completely different story.

Quote
There is one more aspect, it's the first time in a long while that I use a for loop rather than a while loop. I've found the for rather superfluous, however, in C/C++ this is different. Apparently the while only accepts either a declaration or a condition in its argument. While the for allows to make loops with an variable that is to be iterated.
A for loop is nothing more than a neater way of writing a while loop.


// for loop
for (int i = 0 ; i < x ; i+= 1) {
    something(i);
}

// while loop
int i = 0;
while (i < x) {
    something(i);
    i+= 1;
}

Both are functionally the same; one just puts all the syntax on a single line. I recommend using a for loop when iterating something sequentially, and a while loop when testing for something that is not sequential. The exception is Java, where for loops have a special hard-coded mechanic for types that implement Iterable, but the syntax of the for loop is then different and often such loops are not that useful due to a lack of counting (you only get an element, not which number the element is).
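
C++11 got something similar with the range-based for; a quick sketch (the std::vector is just for illustration):

#include <iostream>
#include <vector>

int main()
{
    std::vector<int> values = {10, 20, 30};

    // Range-based for: like Java's for-each, you get the element only,
    // no counter unless you keep one yourself.
    for (int v : values) {
        std::cout << v << std::endl;
    }
    return 0;
}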

QuoteThat seems quite hackish. Could this have been avoided by some syntactic sugar, like, overloading the 'if' builtin function to test existence?
As far as I am aware "if" is not a function but rather a programming primitive. As such I do not see how it could be overloaded, but I admit I am still a novice at C++.

Sure, testing the pointer is shortest code-wise; however a pointer is not really a boolean logical value but rather a memory address. Firstly it couples to NULL being address 0 (which is mostly standard now, but not necessarily the case). Secondly it uses the fact that logical false is 0 while logical true is not 0, which is not really pointer logic, as in theory a pointer to address 0 could exist and could be meaningful (I do not know of any real examples where it is). NULL is a special pointer value specifically designed to represent an invalid pointer, hence comparing against it does make sense.

Quote
Yet we rely on higher abstraction than just manipulating registers. In essence all data types in any language may be reduced like that. Since integer registers are different in different machine architectures, isn't int in itself already as much an abstraction as boolean?
Registers have little to do with it; they are more a part of what resources you have available to compute with. Memory is slow compared with registers, which is why modern instruction set extensions keep adding them. In fact the newest ones are a massive 512 bits long and can compute 16 different 32 bit values (floats or ints) in a single instruction.

Quote
In one case we don't want type mismatches, and we are quick at criticising obsolete coding practices that employed those on purpose (eg in all caps languages like BASIC or FORTRAN). On the other hand a type mismatch from so extremely different types like boolean and integer may be acceptable when convenient? Perhaps that's just my inexperience, but that seems somewhat hackish to me.
It is hackish and violates type safety, hence why Java only allows boolean values in its tests. C and C++ are still low level enough to get away with the legacy assembly-level behaviour of logical false being 0 and logical true being non-0. Almost every instruction set I know follows that low-level behaviour; however it is not type safe, as a pointer is not a logical value, although the test might work in such a situation.

The joke is that compilers output the same code: test whether a pointer is null and you will get the same code as simply testing the pointer as if it were a logical value.

Quote
C happily allows integer under and overflows and will not throw an exception when bits are lost. You may find flags that warn you against it, though.
Relying on such flags in C/C++ seems only a minor mitigation for what is a huge source of errors. I wonder why they did not add primitive language features to do this...

sdog

It would have been a smart idea then not to permit subtraction for uint.
Oh, never mind, there appears to be no error message when adding int and uint either. (I thought C were strongly typed!) It seems to implicitly cast int to uint(!) edit: On some machines the hardware only needs to be ignorant of which type of integer it is, since with two's complement the same bit pattern serves both; only the interpretation of the sign bit differs.


[cling]$ const int a = -10
(const int) -10
[cling]$ const uint b = 1
(const unsigned int) 1
[cling]$ a + b
(unsigned int) 4294967287


I think I can draw two conclusions: (a) avoid uint at all costs, or (b) check before or after each arithmetic operation whether negative values are present or might occur.

(b) would be feasible if only subtraction were dangerous; see the sketch below.
(a) seems problematic as well, as uint is apparently frequently used. There might be a reason; even if not, it might still be found in existing code. (There might not be much use for uint outside of peripheral cases like streaming IO; with 64 bit registers the gain of 1 bit is marginal.)
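
A minimal sketch of approach (b), with a hypothetical safe_sub helper (name and interface are my invention):

#include <iostream>

// Only subtract when no wrap-around below zero can occur.
bool safe_sub (unsigned int a, unsigned int b, unsigned int &result)
{
    if (b > a) {
        return false; // a - b would wrap around
    }
    result = a - b;
    return true;
}

int main()
{
    unsigned int r;
    if (!safe_sub(1, 2, r)) {
        std::cout << "subtraction would underflow" << std::endl;
    }
    return 0;
}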

Quote
C was never made for such checks, and testing for overflow is very time consuming.
Trapping 'NaN' with gcc in Fortran was already noticeably slower. I see the difficulty here.


Cheers for this discussion; while it is more than scary, it is good to know about such pitfalls.



DrSuperGood:
Quote
As far as I am aware "if" is not a function but rather a programming primitive. As such I do not see how it could be overloaded, but I do admit I am still novice to C++.
I'd heard 'primitive', and thought it only an idiosyncratic expression for a built-in function. It would be quite interesting to learn what a primitive actually is in C.


Quote
Joke is compilers output the same code. Test if a pointer is null and you will get the same code out as simply testing the pointer as if it is a logical value.
That's no joke at all. Being one layer of abstraction closer to the machine means one has to realise the lacking abstraction oneself. This is more tedious and dangerous, hence the popularity of high level languages. In my novice opinion, that something happens to 'work' the same way doesn't free one from doing a proper abstraction, ie abstracting to something that can be expressed in mathematical formalism and following good practices such as type safety.

Ters

Integer wrap-around is such a useful thing that even languages that otherwise add lots of other expensive checks (like array bounds checking) do not add checks for integer overflow. In fact, I'm not sure I have ever actually used a programming language that does, but then there are quite a lot of languages whose behaviour here I have never explored. Division by zero will however crash, since such a check is typically built into the hardware. It is at least on x86, but x86 is supposedly one of the weirder popular architectures. I don't think you will get any C++ exception for this, because C++ exceptions are a different concept from hardware exceptions (Windows might have some way of treating them similarly, though).

Floating point will however give you overflow and underflow errors, because wrap-around there is not useful for anything, and floating point is more complex anyway (which is why Simutrans doesn't use it). You can reach floating point infinity through addition of the largest finite values, though the classic way to get there is to divide by zero. This means that while integers invariably fail for division-by-zero and let overflow pass, for floating point it is the other way around, except that you can configure whether division-by-zero should fail (again, x86, or rather x87).

uint is strictly speaking not a more dangerous data type than any other. Since what you get internally when subtracting 1 from 0 is exactly the same, you will run into problems either way, maybe even exactly the same problem depending on what you do next (such as using the value as an array index). You should check the inputs before doing the operation. How long before depends on how predictable things are and how paranoid you can afford to be.

DrSuperGood

Signed and unsigned addition and subtraction are the same: the same instructions are used and the same result is produced. This is because of two's complement mechanics.

Eg 0 - 1 requires underflow to work.

// unsigned example.
uint8_t zero = 0; // 0x00
uint8_t one = 1; // 0x01
uint8_t out = zero - one; // 0xFF = 255
// underflow occurred!

// signed example.
int8_t zero = 0; // 0x00
int8_t one = 1; // 0x01
int8_t out = zero - one; // 0xFF = -1
// underflow occurred! Or did it? With signed semantics this should not be reported as underflow

Now division and multiplication are an entirely different story. How they are implemented depends on the platform. Some platforms offer different instructions for signed and unsigned; other platforms, usually microprocessors of sorts, require one to emulate a signed multiplication using some tests and an unsigned multiply.

Most architectures do set flags when overflow or underflow occurs. These flags can be tested immediately afterwards, and appropriate corrective or informative action can be taken. The problem is that testing such flags is probably more expensive than performing the operation which produces them, so by default you do not want to test them. For signed addition and subtraction they are pretty meaningless, as overflow and underflow occur all the time; however I do think some special signed instructions might exist which set them meaningfully (taking into account the magnitude of the numbers).

Languages like C/C++ could offer some form of overflow and underflow detection. A set of special addition, subtraction, multiplication, division, etc. functions would be needed which meaningfully test the overflow/underflow bits after performing the operation, and run specified code when such a condition is detected (see the sketch below).
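
As it happens, GCC and Clang grew exactly such functions as builtins; a sketch, assuming a GCC 5 or newer toolchain:

#include <iostream>

int main()
{
    int a = 2000000000, b = 2000000000, sum;

    // Performs the addition and reports whether the result wrapped.
    if (__builtin_add_overflow(a, b, &sum)) {
        std::cout << "overflow detected" << std::endl;
    } else {
        std::cout << sum << std::endl;
    }
    return 0;
}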

Floating point is handled slightly differently. It has an interrupt vector to catch certain bad conditions and take corrective action or adjust results, similar to what divide has for division by 0. Many features of floating point units are abstracted away from some languages such as Java.

Ters

Quote from: DrSuperGood on December 22, 2016, 05:00:34 PM
Most architectures do set flags when overflow or underflow occur. These flags can be tested immediately afterwards and appropriate corrective or informative actions can be taken. Problem is that testing such flags is probably more expensive than performing the operation which produces them, as such by default you do not want to test them. For signed addition and subtraction they are pretty meaningless as overflow and underflow occur all the time however I do think some special signed instructions might exist which do set them meaningfully (take into account the magnitude of the numbers).

x86 has two overflow flags (the term underflow is not used for integer operations), one of which indicates signed wrap-around (for signed char, between -128 and 127, either way) and one which indicates unsigned wrap-around (for unsigned char, between 0 and 255, also either way). The comparison operators (such as > and <) are implemented as a subtraction operation (which discards the result) followed by an instruction checking one of these flags (plus some other flags depending on which exact operator).

prissi

While most architectures have flags, it is very hard to see whether a signed or unsigned wrap-around is meant, especially when number constants like '3' are involved, which could be either signed or unsigned.

On type sizes, my favorite was a TI compiler for a DSP which had size of char = float = int = 40 bit, and sizeof(char[4])==sizeof(char) ...

Ada had ranged numbers and overflow checks. (That was the reason why some satellite was lost some years ago: the communication failed between the Ada rocket controller code and the newer C code, because negative number offsets had never been used in their tests ...)

sdog

Oh dear, int8_t a = 127; a + 1; wraps around to negative as well.

With testing and trapping so costly, the only safe way seems to be to avoid any plain integer type for general calculations, and resort to libraries providing, for example, arbitrary precision integers or other forms of safe integers. On the other hand, this brings the huge drawback of introducing dependencies beyond the standard library, which in turn limits the modularity of functions written that way.
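
Short of a library, one can at least compute in a wider type and check the range afterwards; a minimal sketch (the helper name is my invention):

#include <cstdint>
#include <iostream>

// Add two int8_t values, detecting wrap-around via a wider intermediate.
bool add_int8 (int8_t a, int8_t b, int8_t &out)
{
    int wide = int(a) + int(b); // int is wider than int8_t
    if (wide < INT8_MIN || wide > INT8_MAX) {
        return false; // result would wrap around
    }
    out = int8_t(wide);
    return true;
}

int main()
{
    int8_t r;
    if (!add_int8(127, 1, r)) {
        std::cout << "would overflow" << std::endl;
    }
    return 0;
}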

That is exacerbated by machine dependent definitions of int:

#include <stdint.h>
[cling]$ sizeof(int)
(unsigned long) 4
[cling]$ sizeof(int64_t)
(unsigned long) 8
[cling]$ sizeof(int32_t)
(unsigned long) 4

Cling is clang; for gcc, int is also a 4 byte type.
But it doesn't really matter what actual compilers do: the C standard seems to define (I've not checked it) that int merely has at least the size of short int.

[cling]$ sizeof(short int)
(unsigned long) 2

That way one has to assume that int and uint might be only 16 bit integers.

[cling]$ #include <cmath>
[cling]$ std::pow(2,(2*8-1))-1
(double) 32767.0
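
The fixed-width types from <stdint.h> seem to be the portable answer: state the width instead of hoping. A sketch, assuming C++11 for static_assert:

#include <cstdint>

// Say explicitly that 32 bits are needed:
int32_t counter = 0;

// Or document the assumption and fail at compile time if it is violated:
static_assert(sizeof(int) >= 4, "this code assumes int is at least 32 bit");

int main() { return 0; }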


Concluding musing: overall, quite interesting. C++ learning is indeed quite different from other languages, and seems pointless without also learning more about the machine. I wonder how experienced one has to be to actually write productive code, ie without risking causing a bug.

By the way, why #include <stdint.h>? The std prefix seems to stipulate it is part of the standard library, but the way it is included doesn't. A brief search based on the assumption that <stdint.h> is obsolete didn't get me any further either.



Quote from: DrSuperGood
For loop is nothing more than a neater way of showing a while loop.

[...]
// while loop
int i = 0;
while (i < x) {
    something(i);
    i+= 1;
}


I suspected as much, but thought that both, for and while, might be syntactic sugar for something more fundamental.

I see two problems with this loop construction. The first is that the counter int i is not restricted in scope to the actual loop; that can be dangerous, as the next loop might start at an advanced point unless re-initialised (see the sketch below). The other is that the iteration is an explicit statement that could easily be overlooked or forgotten. The question is: is the while loop deprecated for such use, and ought it only be used when looping through a construct that does not need a counter variable?
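
A quick sketch of the scope point, if I understand it correctly:

#include <iostream>

int main()
{
    for (int i = 0; i < 3; i++) { // i exists only inside this loop
        std::cout << i << std::endl;
    }

    // i is gone here; the second loop gets a fresh, re-initialised counter.
    for (int i = 0; i < 3; i++) {
        std::cout << i << std::endl;
    }
    return 0;
}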

However, the first loop example in the textbook I'm using does use a while loop exactly for that, with a global variable. (And even with a very unsafe termination condition, (i != n), rather than (i < n); on second thought, with integer wrap-around, even that doesn't seem safe any more.)

ps.: nice, there is a much more sensible looking iteration i += 1; in this.



@prissi
It really sounds like inviting mistakes to happen. At least at compile time, the compiler could check whether uint and int appear in the same arithmetic calculation and exit with an error. I wonder why such a check is not part of the standard. I mean, when it is needed, one can always cast from int to uint explicitly, and then be aware of the danger.
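
It seems the check does exist, just off by default; an assumption to verify against the compiler version:

// Hypothetical compile line:
//   g++ -Wconversion -Wsign-conversion test.cc -o test
// should warn about the implicit int -> unsigned int conversion below.

int main()
{
    int a = -10;
    unsigned int b = 1;
    unsigned int c = a + b; // warning: conversion may change the sign
    return (int) c;
}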

Ters

Quote from: sdog on December 22, 2016, 09:07:59 PM
By the way, why #include <stdint.h>? The std seems to stipulate it would be a part of standard library, but the way it is included doesn't. A brief search based on the assumption that <stdint.h> is obsolete didn't get me any further either.

#include <stdint.h> is a perfectly normal way of including stuff in C. C++ on the other hand likes to take the standard headers inherited from C and wrap them in another header without the extension but with a c prefix, which also puts the stuff inside into the std namespace. However, since C++ is (almost) a superset of C, (almost) everything that is valid C must also be valid C++, including including stuff just like in C. stdint.h is not obsolete as far as I can tell, but C++ prefers that you refer to it as cstdint.
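
So the C++-preferred spelling would be something like:

#include <cstdint>   // the C++ wrapper for stdint.h

std::int32_t x = 0;  // the same type stdint.h calls int32_t

int main() { return 0; }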

Quote from: sdog on December 22, 2016, 09:07:59 PM
With testing and trapping so dangerous, the only safe way seems to be to avoid any integer type for general calculations, and resort to libraries providing, for example, arbitrary precision integers, or other forms of safe integers. On the other hand, this brings the huge drawback of introducing dependencies, beyond standard library. Which in turn limits the modularity of functions written that way.

I think the general idea when using the C language is that if you unintentionally overflow a datatype, you were screwed long before that point. Trying to handle it there and then at runtime is too late anyway.

With C++, you can at least create classes that, thanks to operator overloading, behave just like built-in data types. And this idea extends to making the data type include the unit (seconds, meters, kilograms, Newtons, etc.). This can be done either by having a data type for each unit, or by having an extra field next to the value which somehow describes the unit. Either way, you can avoid nasty bugs where units are mixed, such as adding an m/s value to an m/s² value. It can also avoid mixing SI and Imperial units, but I think that problem is more likely to occur in the communication between components that are made by different teams using different tools, including potentially programming language and almost certainly supporting libraries, meaning this kind of metadata gets lost.
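
A minimal sketch of the one-type-per-unit idea (type names are hypothetical):

#include <iostream>

// One struct per unit; mixing units then fails to compile.
struct Meters  { double value; };
struct Seconds { double value; };

Meters operator+ (Meters a, Meters b) { return Meters{a.value + b.value}; }

int main()
{
    Meters d1{3.0}, d2{4.0};
    Meters total = d1 + d2;   // fine
    // Seconds t{1.0};
    // Meters bad = d1 + t;   // compile error: no operator+ for this mix
    std::cout << total.value << std::endl;
    return 0;
}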

But the more important thing to have in mind is using the right tool for the job. If arbitrary precision arithmetic and extremely low fault tolerance are critical, C is not the right tool. C++ can be somewhat better, mostly due to operator overloading. There is a reason we have kept on inventing programming languages after C: it is not perfect for everything. It was after all made for writing UNIX. When writing an operating system, integer overflow is rather irrelevant: apart from 0, the value you don't want to pass is most likely not related to the size of the data type. Nor does an OS care for arbitrary precision.

However, being the language of the mother of all major modern operating systems, it is also a language, perhaps the only one beyond vendor-specific assembly, that is almost guaranteed to exist on every platform, which probably has contributed a lot to its popularity.

DrSuperGood

Quote
On type sizes, my favorite was a TI compiler for a DSP which had size of char = float = int = 40 bit, and sizeof(char[4])==sizeof(char) ...
Either the minimum word size was 40 bit and packing was used (so all types were at least 40 bit in size), or sizeof was bugged (which I have had in a compiler for an 8 bit PIC processor)...

Apparently systems do exist where a byte is not defined as an octet. Honestly I have no idea how one would even go about writing portable code for such systems... As such I just ignore their existence when programming, as I doubt it's worth the time or effort caring.

Ters

Quote from: DrSuperGood on December 23, 2016, 01:37:47 AM
Urg good point that I completely overlooked... I guess there are separate signed addition/subtraction instructions just one can emulate them efficiently with unsigned addition/subtraction as long as one is careful (know for sure what values are positive or negative) or occasionally tests the MSB (set for negative numbers).

Huh?

Combuijs

It is of course dependent on personal or company programming style, but I use a for loop when I expect to run the loop for all the elements in its domain and a while loop when I expect this is not the case.

For example, say k is an array of n integers then I would use a for loop for getting the sum of all integers:


int sum = 0;
for (int i = 0; i < n; i++)
{
   sum = sum + k[i];
}


On the other hand, looking for a value m in that array I would do


bool found = FALSE;
int i = 0;
while (i < n && !found)
{
  if (k[i] == m)
  {
      found = TRUE;
   }
   else
   {
      i++;
   }
}


Note that after the while loop you still have access to the i variable to use for example the index.

But as said, that is all personal style, you can do the first one in a while loop and the second one in a for loop.

As for 0, 1, TRUE and FALSE, I can never remember which one is which, so I am always in doubt when I see


if (pointer)
{
}


I really would prefer


if (pointer == NULL)
{
}


or


if (pointer != NULL)
{
}


for readability.

In C# I am very happy I can do things like


if (found == false)
{
}


instead of


if (!found)
{
}


as you tend to overlook the ! character when the expression is a bit larger.

And I know that


return a > 0 ? 1 : 0;


is much more concise than


if  (a > 0)
{
   return 1;
}
return 0;


but I find the latter more readable as I always forget which one comes first in the ? operation. Yes, I know, first true then false, but 0 (false) < 1 (true), which always gets me confused.
Bob Marley: No woman, no cry

Programmer: No user, no bugs



sdog

@prissi, DSG
Quote
QuoteOn type sizes, my favorite was a TI compiler for a DSP which had size of char = foat = int = 40 bit, and sizeof(char[4])==sizeof(char) ...
Either the minimum word size was 40bit and packing was used (so all types were at least 40 bit in size), or sizeof was bugged (which I have had in a compiler for an 8 bit pic processor)...
There used to be an old 40 bit improved precision float, with 32 bit mantissa and 7 bit exponent. I remember seeing references or traces of it in some older numerical library routines. One cannot do much with single precision. Machine epsilon = 2**(1-p), where p is the number of mantissa bits. For single precision p = 24, and thus ε_32 = 2**-23 ≈ 1.2e-7; for the 40 bit format p = 32, giving ε_40 = 2**-31 ≈ 5e-10; whereas double has a 53 bit mantissa (and 11 bit exponent), leaving ε_64 = 2**-52 ≈ 2.2e-16.
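
These values can be checked against the implementation; a sketch using std::numeric_limits:

#include <iostream>
#include <limits>

int main()
{
    // epsilon() is the gap between 1.0 and the next representable value,
    // ie 2**(1-p) for a p bit mantissa.
    std::cout << std::numeric_limits<float>::epsilon()  << std::endl; // ~1.19e-07
    std::cout << std::numeric_limits<double>::epsilon() << std::endl; // ~2.22e-16
    return 0;
}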

Combining regular 32 bit and 8 bit signed integer registers seems not that absurd; most operations can be done on exponent and mantissa independently. However, did they even have 32 bit wide registers in the sixties or seventies when this was done? I suppose ALUs could be chained up to the desired width.


Quote from: Combuijs on December 23, 2016, 10:21:19 AM
It is of course dependent on personal or company programming style, ...

[...]


return a > 0 ? 1 : 0;



Thank you for the coding style examples. I find them much more readable and clear indeed.

For me the ternary operator comes close to `tar`. I've used it many dozens of times, and always had to look up the syntax.

prissi

The int size is the size of the natural int of a CPU. Hence on the C64 C compiler (at least one of them) sizeof(int)=1

And sorry, the TI DSP had indeed sizeof(long)=sizeof(int)=sizeof(double)=4 and sizeof(float)=sizeof(char)=2, but with a byte size of 10 bit. As long as you use the predefined portab.h constants like MAXINT you are safe with any architecture (ok, some magic with shifts would not work as expected). On the other hand that DSP had 40 kB (in units of 10 bit for a byte) RAM and was mainly geared towards 32/8 floating points. Since it was a DSP, it came obviously without any stdlib.h support apart from printf to the debug serial port ...

Also, C compilers had for a very long time their own interpretations of the standard. Until MSVC6 the following code was illegal

for(int i=1; i<10;i++) { do something }
for(int i=1; i<100;i++) { do another thing }

because it expanded internally to this while construct

int i=1;
while( i<10 ) { something; i++; }
int i=1;
while( i<100 ) { another thing; i++; }

but two declarations of i are not allowed in the same block ...

Ters

Quote from: prissi on December 23, 2016, 10:40:17 PM
The int size is the size of the natural int of a CPU.

Except for x86-64. When x86 went from 16-bit to 32-bit, int grew from 16-bit to 32-bit, although the definition of a word stayed at 16-bit. When 64-bit x86 came along, int stayed 32-bit. Whether any data types changed size at all depended on the OS, if I remember correctly.

sdog

Some good reading I stumbled upon:

On the question: When to pass by value and when to pass by reference?

value semantics vs reference semantics:
https://akrzemi1.wordpress.com/2012/02/03/value-semantics/

when to pass values, for speed
http://web.archive.org/web/20140116234633/http://cpp-next.com/archive/2009/08/want-speed-pass-by-value/

Conclusion from a brief overview of advice on typical sites: even experienced developers seem uncertain about the best practices, with a tendency of: when in doubt, pass references.

Related to this question: why is it often not left to the compiler to decide whether a data structure has to be copied or referenced, which compilers are in principle able to do? It seems to be implied that they were not capable of doing so, but then wouldn't it be better practice to structure one's functions in a way that enables the compiler to do its job?

Another related question: why, in code I've read, are there so many passes by mutable rather than constant reference, even when it seems not to be necessary? (pointless without examples)

The textbook and many tutorials I'm reading are from the distant past, ie before C++11, and also seem not to take parallel computing very seriously. It seems a rule of thumb of 'when in doubt, pass by value' might be more advisable these days: if it goes wrong it only costs performance and memory, but is safer.

Q: What are the white-space rules for reference ampersands, let a be any integer, int& b = a; or int &c = a;? No reading here, this didn't google well.



prissi

Quote(call by value/reference) Related to this question is why often it is not left to the compiler to decide if a data structure has to be copied or referenced, which the compilers are in principle able to do. It seems it is implicated that these were not capable of doing so, but then, wouldn't it be better practice to structure ones functions in a way that enables the compiler to do its job?

I am not sure a compiler is capable of doing this. Assume the structure is then passed to a library: should the change then be reflected on the caller? I would have difficulty guessing an answer, but I am only human ...

Ters

Quote from: sdog on December 30, 2016, 07:10:55 PM
Related to this question is why often it is not left to the compiler to decide if a data structure has to be copied or referenced, which the compilers are in principle able to do. It seems it is implicated that these were not capable of doing so, but then, wouldn't it be better practice to structure ones functions in a way that enables the compiler to do its job?

In C, there is no such concept as pass-by-reference. Pointers are a data type of their own. Since C is also meant to be a thin layer above machine code, the compiler isn't really supposed to do things behind the programmer's back. It would also be rather disastrous if what the programmer thought was a local copy was the same instance as in the caller, and ended up corrupting it.

C++ has true references, but as far as I know, it doesn't automatically turn things into pass-by-reference; it turns pass-by-const-reference into pass-by-value when more suitable. Generally, this is when the parameter is a const reference to a primitive. Why would one create parameters that are const references to primitives when that is obviously inefficient? Because it might be a templated function that doesn't know what type the parameter is when it is written, so the parameter is written as a const reference to avoid expensive copying if the type turns out to be a big struct. The compiler will optimize this to pass-by-value when that is cheaper.
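
For illustration, the kind of templated function meant here (names are hypothetical):

#include <iostream>
#include <string>

// T is unknown when this is written, so const& avoids copying big types.
template <typename T>
void print_twice (const T &x)
{
    std::cout << x << " " << x << std::endl;
}

int main()
{
    print_twice(42);                     // T = int: by-value would be cheaper
    print_twice(std::string("hello"));  // T = std::string: const& avoids a copy
    return 0;
}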

When the optimizer is enabled, stuff like inlining can eliminate pass-by-value completely, since nothing is passed at all. It might even do crazy stuff like turn a pass-by-value into pass-by-const-reference for all I know, but it can only do so if you do not modify the value.

And any such tinkering requires that the caller and callee are in the same translation unit. Otherwise, you might end up with the caller passing by value and the callee thinking it was passed by reference, or vice versa. Remember, it might not be the same compiler (at least for C). They might have been compiled at different times by different people.

sdog

QuoteWhen the optimizer is enabled, stuff like inlining can eliminate pass-by-value completely, since nothing is passed at all. I might even do crazy stuff like turn a pass-by-value into pass-by-const-reference for all I know, but it can only do so if you do not modify the value

That is what I referred to. Inlining is done already at -O1 in GCC. That, in conjunction with various copy propagation methods, ought to do the job.

The other optimisation is copy elision, which is on by default in GCC.

Isn't the const& just an instruction to the compiler that the value isn't changed in this scope, ie it doesn't matter whether it's a constant reference or any other pointer if the compiler can establish that it is not meddled with? Which it was certain of in the first place in order to do copy elision. Why then change specifically into a pass-by-const-reference, as you mentioned?

Ters

Quote from: sdog on December 31, 2016, 01:21:44 AM
Isn't the const& just an instruction to the compiler that the value isn't changed in this scope, ie it doesn't matter if it's a constant reference or any other pointer if the compiler can establish that it is not meddled with? Which it was certain of in the first place to do copy elision. Why then change specifically into a pass-by-const-reference, as you mentioned?

I think it is more of an instruction to the compiler to forbid you from changing the value. (const also predates const&, which affects how the latter works.) Of course, that means that the compiler can make some assumptions. However, the programmer can pull the rug out from underneath the compiler by casting away the const-ness and changing the value nonetheless. Even without casting away the const-ness, the value can change, so I'm not sure how many assumptions the compiler can make.

int a = 1;
const int &b = a;
a = 2;
// b has now changed value

However, values changing outside the "regular" program flow is what the volatile keyword is for. There are also some aliasing rules, which I know little about, having only seen the compiler complain about them being violated a few times, and I think that has been when compiling code I have nothing to do with.

I do not know of any cases where pass-by-value is turned into pass-by-const-reference, I just can't rule it out. I do however know of one case that could be considered pass-by-value being turned into a pass-by-reference (note no const), and that is for the return value. If the return value is too big to fit in a register, the caller will allocate memory for it and pass a reference to the callee. So that

big_struct func() {
  big_struct retval;
  ...
  return retval;
}

is implemented more or less identically as

void func(big_struct &retval) {
  ...
}

in machine code. This happens even when not optimizing. (Optimizing may modify it further, maybe even putting it back into a register if the optimizer figures out that big_struct can be turned into a vector value, but that seems unlikely.)

DrSuperGood

The const keyword tells the compiler that the value cannot change. This has technical implications with regard to the compiled code.

Quote
However, the programmer can pull the rug out from underneath the compiler by casting away the const-ness and changing the value nonetheless. Even without casting away the const-ness, the value can change, so I'm not sure how many assumptions the compiler can make.
No, the programmer cannot disregard const, at least not sanely. Doing so can cause undefined and compiler-specific behaviour. In a lot of cases it will even cause a crash.

The reason const_cast exists is as a hacky workaround for improperly designed APIs, for when you are absolutely certain that a const value is just a constant view of a dynamic value.

In some compilers non-dynamic const values go to ROM storage, which does not even share the same address range as the dynamic RAM storage. No instructions exist to modify such constant data (as it is ROM...), and using a const_cast will either throw a compile time warning or might result in some arbitrary address inside RAM being corrupted. On systems with virtual memory, such as most modern operating systems, compile time const values might be placed in read-only pages, so attempting to modify a value will throw a security exception and crash the application.

When something is marked as const, it is not a suggestion that it should not be modified; it is closer to a statement that it cannot be modified. Disregarding this is not recommended.

Ters

When has pulling the rug out from underneath anything ever been a sane thing to do? I thought that was implicitly evil.

sdog

I think some clarification with regard to the const matter would be good. It seems that intended and perceived meaning do not converge yet.

Initially we were talking specifically about constant references. For example, let N be the set of all valid values for int, x, y in N, and int a = x; const int &b = a;. I understand this as a reference where the value of a is mutable, but the value can be accessed read-only through b; ie a = y; is possible while b = y; is not, for all y in N\x.
Now, I understand the const attribute in the context of a & reference is there to inform the compiler that the latter kind of operation, b = y; or b += y;, is forbidden. However, in the binary there is no difference between a pointer to the address where the value of a is stored, a const reference, a non-const reference, or a itself. Is this so far correct?


I should like to discuss this quote in that context, I number sentences for later reference:
Quote
(1) I think it is more of an instruction to the compiler to forbid you from changing the value. (2)(const also predates const&, which affects how the latter works.) (3) However, the programmer can pull the rug out from underneath the compiler by casting away the const-ness and changing the value nonetheless. (4) Even without casting away the const-ness, the value can change, so I'm not sure how many assumptions the compiler can make.
[...]
(5) I do not know of any cases where pass-by-value is turned into pass-by-const-reference, I just can't rule it out.
(1) appears to be consistent with what I think. However, (5) is slightly contradictory: why would the compiler, which creates the machine code, bother with instructions to itself? That gets me to the second point of (5): when the compiler can establish that there are no writes to the data structure that would be copied due to a pass-by-value, it will not duplicate the data structure but read from the original address (as if it were passed by const reference). That seems to be the default behaviour of most compilers for easy cases, eg one has to turn it off in gcc with several -fno switches. A similar example is the return value optimization you mentioned.

Is (2) due to some sloppy naming? const int c = 0; appears to be an entirely different concept from const int& ...: the former is what is commonly understood as a constant, while the latter could be more aptly named a read-only reference.

(3)(4) The example that follows doesn't pull the rug out; that seems to be exactly the way a reference ought to work, doesn't it? If the original value changes, the reference ought to reflect this new state; if one needs the original value one would have to copy it, whether as a variable or a constant, in any case without the &. I cannot quite follow why this would be pulling the rug. However, I do not know the const_cast you mentioned, and seemed to have in mind. Pulling rugs is bad, and nothing I have to be concerned about much at this stage, I suppose.

Example: let arb1, arb2 be arbitrary data structure:

arb2 f (arb1 x)
{
    arb2 y;
    ...   
    //instructions where the state of x is not changed
    ...
    return y;
}
arb2 a = f(b);
[...]


Even if f were not inlined, the compiler would not need to duplicate any data structures; a and b would be the only ones addressed.

void f (arb1 &x, arb2 *y)
{
  ...
  // same as above
}
[...]
f( b, &a );

would be no more efficient, while it is much less readable, cannot be made pure, offers less chance for optimisation, and is at risk of breaking parallelisation*. Unless, that is, the compiler cannot establish that x remains unperturbed, and does indeed copy.

*I've stumbled over 'aliasing', which seems to mean something in this context; I've not read up on it. C++ has a tendency to give concepts very confusing names that mean entirely different things in any non-C/C++ context. Vector is a particularly nasty example: it is very much unlike a mathematical vector (elements can be removed and added) but more like an (efficient implementation of a) list. One can only guess that these terms were used because the apt terms were already taken by other stuff, like inefficient implementations of lists (std::list).



Someone's loud snoring woke me up (05:25 now) and it occurred to me that you might have meant: what if (a) a concurrent process changes the value while the function is executed, or (b) the function itself changes that value somehow? I'll return to that later today.

Ters

First of all, const int &a is a reference to a constant integer. In C++, the references themselves are always constant, in the sense that you can never change the reference to reference something else. This is unlike pointers, which can be pointed elsewhere unless the pointer itself is constant (int * const a), and unlike references in Java (unless final).

(5) The compiler does not instruct itself (nor any other compilers); I don't know where you got that from. The compiler instance compiling the callee can know that the function does not modify the value, but the compiler instance compiling the caller does not necessarily know that, nor can the compiler compiling the callee know whether there are other callers elsewhere (unless the function is static). Therefore, the parameters of the function cannot be changed from pass-by-value to pass-by-reference. If they were, callers that know nothing better than to pass by value would crash.

As for pulling the rug, that had to do with doing naughty things like

const int c = 1;
int &d = const_cast<int &>(c);
d = 2;
// c may or may not equal 2 now, depending on how the compiler implemented this in machine code

not anything like what you show in (3) and (4).

sdog

Quote from: Ters on January 01, 2017, 10:42:07 AM
First of all const int &a is a reference to a constrant integer. In C++, the references themselves are always constant, in the sense that you can never change the reference to reference something else. This is unlike pointers, which can be pointed elsewhere, unless the pointer itself is constant (int * const a), as well as references in Java (unless final).
The highlighted text would mean I missed the point entirely. Let's test if I confused the syntax and int const& is different from const int&:


[cling]$ int a = 0;
[cling]$ const int& b = a;
[cling]$ int const& c = a;
[cling]$ b
(const int) 0
[cling]$ c
(const int) 0
[cling]$ a = 1
(int) 1
[cling]$ b
(const int) 1
[cling]$ c
(const int) 1

Both b and c seem to reference a, where a is a mutable integer. Checking the actual addresses:

[cling]$ &a
(int *) 0x7f3614297000
[cling]$ &c
(const int *) 0x7f3614297000


They all go to the same memory address. I suppose a may be changed without pulling the rug? Is this just a misunderstanding, a lack of understanding of the terminology, or do I get something fundamentally wrong?

The second part of the quote is clear, here's a test:


// reference without const attribute
[cling]$ int &d =a
(int) 5
[cling]$ &d
(int *) 0x7f3614297000

// and a pointer
[cling]$ int * p = &a
(int *) 0x7f3614297000

[cling]$ int other_a
(int) 0
[cling]$ p = &other_a
(int *) 0x7f3614297030

// attempt to re-assign non-const reference
[cling]$ &d = &a;
input_line_90:2:5: error: expression is not assignable
&d = &a;

// and const reference
[cling]$ &c = &a
input_line_92:2:5: error: expression is not assignable
&c = &a



Quote
(5) The compiler does not instruct itself (nor any other compilers). I don't know where you get that from.
Well, that seemed implicit from what you wrote above, therefore I asked again, and it has been cleared up as a misunderstanding. However, as the above shows, there is more to it.


Quote
The compiler instance compiling the callee can know that the function does not modify the value, but the compiler instance compiling the caller does not necessarily know that, nor can the compiler compiling the callee know if there are other callers elsewhere (unless the function is static).
I think here we are touching on what I thought of last night and added to the previous message. Let f(int x){...} be a function and int a=0; [...] int b = f(a); the function call. Off the top of my head I can see two cases where the value of a changes while the function is processed: (i) a concurrent process changes a while f(int x) is running; (ii) the function has side effects and changes the global variable a while working with its copy x. Are there more conceivable cases (that leave the rug in place)?

In case (i) the compiler must duplicate a into x. If one were to pass a reference or a pointer, the outcome of the function f would be unpredictable, as its arguments might change at any time during its runtime.

Case (ii) seems like absolutely terrible coding, and I think that is not just my preference for purity and a functional approach. It would also break other basic optimisations like copy propagation and inlining.

In both cases, (i) and (ii) pass by value is safer, while pass by reference might cause havoc. The compiler would err on the safer side and duplicate data structures.

I've not thought about it earlier, as I simply assumed that this would happen. That was an oversight.

The conclusion seems to be similar though:

Quote
Therefore, the parameters of the function can not be modified from pass-by-value to pass-by-reference. If it did callers that know nothing better than to pass by value, would crash.

As for pulling the rug, that had to do with doing naughty things like

const int c = 1;
int &d = const_cast<int &>(c);
d = 2;
// c may or may not equal 2 now, depending on how the compiler implemented this in machine code

not anything like what you show in (3) and (4).

const_cast from const to mutable (volatile?) seems to be a mindbogglingly awful idea. Why are such things in the standard? It cannot seriously be just to enable the hackish workarounds to broken APIs Dr Super Good mentioned?


This discussion brought me to two new questions:

If the compiler cannot properly optimise code, it seems preferable to change the code in such a fashion that the compiler can optimise it, rather than optimising it manually.

How can I determine (without too much effort) whether the compiler optimises the code I'm currently working on? Profiling? For something that specific it seems rather tedious. Purposefully obstructing the compiler's optimisation attempts and then profiling that as a control sample?

Ters

Quote from: sdog on January 01, 2017, 01:34:41 PM
The highlighted text would mean I missed the point entirely. Let's test if I confused the syntax and int const& is different from const int&:

Both are references to constant integers. With pointers you can have pointers to constants (const int * and, I think, int const *), constant pointers to non-constants (int * const) and constant pointers to constants (const int * const). (This can get really confusing once you deal with pointers to pointers.) References are always constant, that is they can never be changed to refer to something else (except through dirty hacks), but can refer to values that are either constant or not.
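
Spelled out as a crib sheet, a quick sketch:

int x = 0, y = 0;

const int *p1       = &x; // pointer to constant: *p1 = 1 is an error, p1 = &y is fine
int const *p2       = &x; // same as above, alternative spelling
int *const p3       = &x; // constant pointer: *p3 = 1 is fine, p3 = &y is an error
const int *const p4 = &x; // constant pointer to constant: neither is allowed

int main() { return 0; }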

Quote from: sdog on January 01, 2017, 01:34:41 PM
They all go to the same memory address. I suppose a may be changed without pulling the rug?

It can be a bit of rug-pulling. It might be mostly on yourself (and possibly co-workers), though.

Quote from: sdog on January 01, 2017, 01:34:41 PM
The compiler would err on the safer side and duplicate data structures.

Duplicating data structures is not necessarily safer (this goes for all programming languages with mutable data). In C++, it might even be disallowed if the coder has disabled the copy constructor and assignment operator for the data structure.

Quote from: sdog on January 01, 2017, 01:34:41 PM
const_cast from const to mutable (volatile?) seems to be a mindbogglingly awful idea. Why are such things in a standard? It cannot seriously be to enable people to do hackish workarounds to broken APIs Dr Super Good mentioned?

Yes they are. In fact, they might even be there just for the hacks, broken APIs or not. C is so low level that programmers could get around constness in multiple ways anyway (I can think of two off the bat, three if simple C casts are considered non-standard), so they might have made a standard way of doing it just so that it is easy to find the places where such naughty things are done. C is itself not quite const-correct; maybe it didn't have the concept originally. strchr is an example in pure C (C++ gets it right, since it supports overloading).
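
The strchr case spelled out, with the C++ side shown (the comments state the C prototype):

#include <cstring>

int main()
{
    const char text[] = "hello";

    // C++'s overloaded strchr keeps the const: the result is const char *.
    const char *p = std::strchr(text, 'e');

    // Plain C has only: char *strchr(const char *s, int c);
    // there the const is silently dropped, and writing through the result
    // would be undefined behaviour.
    return p == nullptr;
}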

volatile is for various kinds of concurrency (between threads, or between software and hardware), and not strictly speaking the opposite of constant. I can imagine something being both const and volatile, if it is a memory-mapped read-only input device, but I do not know if C actually allows it.

Quote from: sdog on January 01, 2017, 01:34:41 PM
How to determine (without too much effort) if the compiler optimises the code I'm currently working on?

The compiler will only optimize if you tell it to, and it will only do the kinds of optimizations you tell it to. Whether it actually decides to apply a particular optimization can only be found out by looking at the assembly, as far as I know.
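
A low-effort workflow for that, assuming GCC (the flags are long-standing):

// Emit assembly instead of object files, at two optimisation levels:
//   g++ -O0 -S -o test_O0.s test.cc
//   g++ -O2 -S -o test_O2.s test.cc
// Diffing the two files around the function in question shows whether a
// given optimisation actually fired; objdump -d works on linked binaries.

int main() { return 0; }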