Started by ceeac, February 24, 2020, 03:35:34 PM
Quote from: prissi on February 25, 2020, 01:42:55 AM
remove all HJ Mathaner copyright messages (which are not appropriate anyway, since the code has been open source since 2007).
Quote from: Ters on February 24, 2020, 06:29:49 PM
Do the source files need to be anything beyond ASCII in the first place?
Quote from: Freahk on February 24, 2020, 06:48:35 PM
Even further, some old comments are still written in German, a few containing umlauts.
Quote from: DrSuperGood on February 25, 2020, 05:18:50 PM
Unicode supports German characters, so why remove them?
Quote from: prissi on February 26, 2020, 12:29:49 PM
Actually, C keywords and everything else apart from strings and comments must be ASCII, as far as I know. Strings being UTF-8 (or any other codepage) might not work on a random system. (However, since most "random systems" use GCC to compile, this issue has been solved by the GCC monopoly.)
Quote from: Ters on February 26, 2020, 06:15:11 PM
I don't know how GCC works on a system without UTF-8 locales, but one clearly cannot just expect UTF-8 to work. Pure ASCII will work whether your system is set up for ASCII, ISO-8859-x, Windows-125x, and (for the most part) Shift JIS and possibly other Asian encodings. Unless you really need to encode a lot of non-ASCII characters in the code, it is probably safest to use just ASCII. For files read by our applications, we have control over how the contents are interpreted, so they can (and should) be UTF-8.
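If one wanted to check whether a source file is pure ASCII, as suggested above, a minimal sketch could look like the following. The helper name `find_first_non_ascii` is hypothetical and not part of the Simutrans code base; it assumes the file contents have already been read into a `std::string`.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the offset of the first byte outside the
// 7-bit ASCII range, or std::string::npos if the buffer is pure ASCII.
std::size_t find_first_non_ascii(const std::string &bytes)
{
	for (std::size_t i = 0; i < bytes.size(); ++i) {
		// Cast to unsigned char first: plain char may be signed.
		if (static_cast<unsigned char>(bytes[i]) > 0x7F) {
			return i;
		}
	}
	return std::string::npos;
}
```

A file for which this returns `npos` reads the same under ASCII, ISO-8859-x, Windows-125x and UTF-8, which is the property the post relies on.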
Quote from: DrSuperGood on February 26, 2020, 07:31:20 PM
One could argue such systems are incorrectly configured to begin with. Possibly for legacy reasons.
Quote from: ceeac on February 27, 2020, 07:38:49 AM
the scrolltext
Quote from: prissi on February 27, 2020, 06:42:45 AM
So UTF-8 is by definition not ASCII
Quote
It was designed for backward compatibility with ASCII. Code points with lower numerical values, which tend to occur more frequently, are encoded using fewer bytes. The first 128 characters of Unicode, which correspond one-to-one with ASCII, are encoded using a single byte with the same binary value as ASCII, so that valid ASCII text is valid UTF-8-encoded Unicode as well. Since ASCII bytes do not occur when encoding non-ASCII code points into UTF-8, UTF-8 is safe to use within most programming and document languages that interpret certain ASCII characters in a special way, such as "/" (slash) in filenames, "\" (backslash) in escape sequences, and "%" in printf.
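The property described in the quote can be seen directly in the encoding rules. The following is only an illustrative sketch (the function name `encode_utf8` is made up here, and it does not validate its input): code points below 0x80 are emitted as a single unchanged byte, while every byte of a multi-byte sequence has its high bit set and therefore can never be mistaken for an ASCII character.

```cpp
#include <string>

// Illustrative sketch: encode one Unicode code point as UTF-8.
// Assumes cp <= 0x10FFFF; surrogates and out-of-range values are not rejected.
std::string encode_utf8(char32_t cp)
{
	std::string out;
	if (cp < 0x80) {
		// ASCII range: one byte, identical to the ASCII value.
		out += static_cast<char>(cp);
	}
	else if (cp < 0x800) {
		out += static_cast<char>(0xC0 | (cp >> 6));
		out += static_cast<char>(0x80 | (cp & 0x3F));
	}
	else if (cp < 0x10000) {
		out += static_cast<char>(0xE0 | (cp >> 12));
		out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
		out += static_cast<char>(0x80 | (cp & 0x3F));
	}
	else {
		out += static_cast<char>(0xF0 | (cp >> 18));
		out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
		out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
		out += static_cast<char>(0x80 | (cp & 0x3F));
	}
	return out;
}
```

For example, `encode_utf8(0x41)` yields the single byte 0x41 ("A"), while `encode_utf8(0xF6)` (the German "ö") yields the two bytes 0xC3 0xB6, neither of which is in the ASCII range.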
Quote from: DrSuperGood on February 26, 2020, 07:31:20 PM
UTF-8 is ASCII compatible.
Quote from: DrSuperGood on February 27, 2020, 06:51:18 PM
UTF-8 was designed to be backward compatible with ASCII. In the worst case some characters would appear as nonsense glyphs.
Quote from: Ters on February 27, 2020, 07:16:48 PM
A program made for ASCII might use a 128 element array to look up glyphs. Once fed UTF-8 containing non-ASCII characters, it would end up with out-of-bounds access. (And these programs would likely not do error checking. Every byte counts when you only have 256 kB RAM.)
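To make the failure mode concrete, here is a hedged sketch of such a lookup with the missing bounds check added. The table, the names `GLYPH_COUNT`, `FALLBACK_WIDTH` and `glyph_width_for` are all invented for illustration. Note that on platforms where plain `char` is signed, a UTF-8 lead byte such as 0xC3 is negative, so an unchecked `table[c]` would read before the start of the array rather than past its end.

```cpp
#include <array>
#include <cstddef>

// Hypothetical constants for an ASCII-only glyph table.
const std::size_t GLYPH_COUNT = 128;
const unsigned char FALLBACK_WIDTH = 8;

// Look up a glyph width; any byte outside the ASCII range maps to a fallback
// instead of indexing out of bounds.
unsigned char glyph_width_for(const std::array<unsigned char, GLYPH_COUNT> &table, char c)
{
	// Convert to unsigned first so bytes >= 0x80 become 128..255, not negative.
	const unsigned char index = static_cast<unsigned char>(c);
	return index < GLYPH_COUNT ? table[index] : FALLBACK_WIDTH;
}
```

With the check in place, UTF-8 input degrades to fallback glyphs, which matches the "nonsense glyphs" outcome mentioned earlier in the thread.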
Quote from: Ters on February 27, 2020, 07:16:48 PM
I'm just concerned that the actual result of something will be different depending on whether the code is read as UTF-8 or something else. The only things that I can think of at the moment are wide character and string literals, but there may be something else. UTF-8 in comments should probably work fine on an 8-bit extended ASCII system, since the compiler doesn't care about the contents. The other way is more uncertain, although a proper UTF-8 implementation should be able to synchronize back. I'm not sure if UTF-8 works as well with other multi-byte encodings.
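The "synchronize back" property mentioned above comes from the fact that UTF-8 continuation bytes always have the bit pattern 10xxxxxx, while ASCII bytes and lead bytes never do. A minimal sketch of resynchronization, with hypothetical function names and no validation of the sequence itself, might be:

```cpp
#include <cstddef>
#include <string>

// True for UTF-8 continuation bytes (bit pattern 10xxxxxx).
bool is_continuation_byte(char c)
{
	return (static_cast<unsigned char>(c) & 0xC0) == 0x80;
}

// Starting at an arbitrary byte offset, skip forward to the next offset that
// is not a continuation byte, i.e. where a new character can begin.
std::size_t resync_utf8(const std::string &s, std::size_t pos)
{
	while (pos < s.size() && is_continuation_byte(s[pos])) {
		++pos;
	}
	return pos;
}
```

A decoder that lands in the middle of a character therefore loses at most that one character. This sketch says nothing about other multi-byte encodings such as Shift JIS, which do not share the same byte layout.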
Quote from: prissi on February 27, 2020, 12:57:28 PM
Remove the line with the copyright message and the following line with *
Quote from: DrSuperGood on February 27, 2020, 10:50:39 PM
Such programs would be considered poorly made in this day and age, where even Spectre and Meltdown are a concern.
Quote from: DrSuperGood on February 27, 2020, 10:50:39 PM
Currently this is very much looking for problems that might not be there.
Quote from: DrSuperGood on February 27, 2020, 10:50:39 PM
Modern C++ defines the handling of string literals specifically with regard to Unicode. Some time in the future one can swap over to using those once compiler support is mainstream or at least required. Modern GCC builds should support it.
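As an illustration of what "modern C++" offers here: since C++11, a `u8"..."` literal is always encoded as UTF-8 regardless of the execution character set, and `\u` escapes let the source file itself stay pure ASCII. The sketch below uses made-up constant names and only relies on properties that hold both before and after C++20 (where the element type changed from `char` to `char8_t`).

```cpp
#include <cstddef>

// "\u00F6" is the German o-umlaut, written so that this file stays ASCII.
// In UTF-8 it takes two bytes, plus the terminating NUL of the literal.
static_assert(sizeof(u8"\u00F6") == 3, "u8 literals are UTF-8 encoded");

// Number of encoded bytes, excluding the terminating NUL.
constexpr std::size_t o_umlaut_size = sizeof(u8"\u00F6") - 1;

// First encoded byte; the cast works for both char and char8_t elements.
constexpr unsigned char o_umlaut_lead = static_cast<unsigned char>(u8"\u00F6"[0]);
```

Whether the code base can rely on this depends on the minimum compiler versions it has to support, which the thread leaves open.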