Implicit assuptions (was: Re: Some fun with -O2)

classic Classic list List threaded Threaded
22 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Implicit assuptions (was: Re: Some fun with -O2)

Walter von Entferndt-2
At Donnerstag, 14. Januar 2021, 13:00:00 CET, Konstantin Belousov wrote:
> The time_t type is signed, then the loop
>   for (time_t_max = 1; 0 < time_t_max; time_t_max *= 2)
>        continue;
>
> intent is to get signed overflow, which is UB.  Modern compilers prefer to
> shoot into your foot instead of following common sense.
>
The next line in the program is: time_t_max--;

The problem here is one of the most common in programming: to rely on an /
implicit assumption/ (e.g. how the compiler (or the CPU,...) handles some
expression).  Obviously, the above loop condition (0 < time_t_max) is
(mathematically) always true and thus can be optimized right away to an
infinite loop.  Isn't that common sense, too? How should the compiler (or the
CPU) know that this was /not/ intended?

> Workaround is to add -fwrapv compiler switch.
>
And the correct solution is: Do not rely on implicit assumptions.  Instead,
write down /explicitely/ what you intend to do or get as result.  If that's
not possible (or too complicated), at least add a comment explaining your
intention.

Since the behaviour on signed overflow is undefined in C, the above statement
must be rewritten completely.  For the case above, it's obvious (also from the
name of the variable) that the author wants to have time_t_max  = the largest
possible positive number of type time_t (all bits 1 except the sign bit).  OK,
then write that instead:

[add #include <stdlib.h>]
add #include <limits.h>
change static time_t time_t_max;
to static const time_t time_t_max = LONG_MAX;
[because time_t equals long int]
del for (time_t_max = 1; 0 < time_t_max; time_t_max *= 2)
del continue;
del time_t_max--;

You may also
add #include <stdio.h>
add printf ("time_t_max = %ld\n", time_t_max);

in main() to verify that's giving you what you want.
--
=|o) "Stell' Dir vor es geht und keiner kriegt's hin." (Wolfgang Neuss)
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: Implicit assuptions (was: Re: Some fun with -O2)

freebsd-hackers mailing list


On 2021-Jan-14, at 09:44, Walter von Entferndt <[hidden email]> wrote:

> At Donnerstag, 14. Januar 2021, 13:00:00 CET, Konstantin Belousov wrote:
>> The time_t type is signed, then the loop
>>  for (time_t_max = 1; 0 < time_t_max; time_t_max *= 2)
>>       continue;
>>
>> intent is to get signed overflow, which is UB.  Modern compilers prefer to
>> shoot into your foot instead of following common sense.
>>
> The next line in the program is: time_t_max--;
>
> The problem here is one of the most common in programming: to rely on an /
> implicit assumption/ (e.g. how the compiler (or the CPU,...) handles some
> expression).  Obviously, the above loop condition (0 < time_t_max) is
> (mathematically) always true and thus can be optimized right away to an
> infinite loop.  Isn't that common sense, too? How should the compiler (or the
> CPU) know that this was /not/ intended?
>
>> Workaround is to add -fwrapv compiler switch.
>>
> And the correct solution is: Do not rely on implicit assumptions.  Instead,
> write down /explicitely/ what you intend to do or get as result.  If that's
> not possible (or too complicated), at least add a comment explaining your
> intention.
>
> Since the behaviour on signed overflow is undefined in C, the above statement
> must be rewritten completely.  For the case above, it's obvious (also from the
> name of the variable) that the author wants to have time_t_max  = the largest
> possible positive number of type time_t (all bits 1 except the sign bit).  OK,
> then write that instead:
>
> [add #include <stdlib.h>]
> add #include <limits.h>
> change static time_t time_t_max;
> to static const time_t time_t_max = LONG_MAX;
> [because time_t equals long int]

Not on FreeBSD for 32-bit powerpc it is not: time_t
is 64 bits wide there but long int is not:

From /usr/include/machine/_types.h :

typedef __int64_t       __time_t;               /* time()... */

From /usr/include/sys/types.h :

typedef    __time_t        time_t;


> del for (time_t_max = 1; 0 < time_t_max; time_t_max *= 2)
> del continue;
> del time_t_max--;
>
> You may also
> add #include <stdio.h>
> add printf ("time_t_max = %ld\n", time_t_max);
>
> in main() to verify that's giving you what you want.




===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Implicit assumptions (was: Re: Some fun with -O2)

Walter von Entferndt-2
At Donnerstag, 14. Januar 2021, 20:00:31 CET, Mark Millard wrote:

> Not on FreeBSD for 32-bit powerpc it is not: time_t
> is 64 bits wide there but long int is not:
>
> From /usr/include/machine/_types.h :
>
> typedef __int64_t       __time_t;               /* time()... */
>
> From /usr/include/sys/types.h :
>
> typedef    __time_t        time_t;
>
Ha! Thx... Ok, then we have to use the *sizeof* operator and if you thought
there's one holy constant in computer science - that a byte has 8 bits, and we
can use that as an ... /implicit assumption/ ... - you might be surprised that
it's declared explicitely in
sys/param.h:#define     NBBY    8               /* number of bits in a byte */
and you can find the number 8 (and 16, 24, 32, 48,...) all over other header
files without using that definition (NBBY) so these are strictly speaking all
wrong... ;)  So now we have (errorneous!)

[delete #include <limits.h>]
#include <sys/param.h>

time_t time_t_max;
...
for (i = time_t_max = 1; i < NBBY*sizeof time_t_max; i++)
        time_t_max *= 2;
time_t_max--;

which gives the intended result but is strictly speaking wrong, because it you
don't do the decrement, the value is negative, and the decrement of a negative
number should be negative, too.  I.e. this only gives what we want because we
make use of an undefined behaviour.  So let's do better (and portable):

#include <sys/param.h>
signed_t signed_max, t1;

for (int i = signed_max = 1; i < NBBY*sizeof signed_max - 1; i++) {
          signed_max *= 2; // eq: signed_max <<= 1;
          t1 = signed_max - 1;
  }
  signed_max += t1; // eq: signed_max |= t1;
/* QED
 * hopefully the compiler optimizes *=2 and we're using binary numbers ;)
 */
--
=|o) "Stell' Dir vor es geht und keiner kriegt's hin." (Wolfgang Neuss)


_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: Implicit assumptions (was: Re: Some fun with -O2)

Walter von Entferndt-2
In reply to this post by freebsd-hackers mailing list
Eventually we have a compile-time constant expression
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[hidden email]"

check_mktime.c.diff (1001 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Implicit assumptions (was: Re: Some fun with -O2)

freebsd-hackers mailing list
On 2021-Jan-14, at 19:47, Walter von Entferndt <walter.von.entferndt at posteo.net> wrote:

> Eventually we have a compile-time constant expression<check_mktime.c.diff>

FYI: C itself has, in <limits.h> , CHAR_BIT for the number of bits in a
Byte for how sizeof counts Bytes:  sizeof(char), sizeof(signed char),
and sizeof(unsigned char) are each always 1.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: Implicit assumptions (was: Re: Some fun with -O2)

Walter von Entferndt-2
At Freitag, 15. Januar 2021, 08:02:49 CET Mark Millard wrote:
> FYI: C itself has, in <limits.h> , CHAR_BIT for the number of bits in a
> Byte for how sizeof counts Bytes:  sizeof(char), sizeof(signed char),
> and sizeof(unsigned char) are each always 1.
>
No, CHAR_BIT is the #bits in a *char*, which is the (standard) datatype for
the binary representation of a character/letter/symbol in written human
language, and for small integers.  The name also suggests that, as well as the
comment in the header file.  That does not necessarily equal a "byte", which
(by commonly accepted knowledge) is the smallest addressable entity in a
computer's memory.  Of course, e.g. https://code-reference.com/c/datatypes/
char tells a *char* occupies one byte.  Sadly, AFAIK C itself does not define
what a "byte" is, although that term is mentioned many times in reference
manuals (implicit assumption).  So /theoretically/ CHAR_BIT and NBBY can
differ.  In fact, many library funtions operating on characters/letters take
an *int* instead of a *char* for performance reasons.  From https://code-reference.com/c/stdlib.h/sizeof: "the *sizeof* operator returns the number of
bytes to be reserved for a variable or a data type".  Of course, for practical
reasons, we can safely assume that a *char* will take one byte in storage
space for the foreseeable future, since the consequences of changing that
would be disastrous.
--
=|o) "Stell' Dir vor es geht und keiner kriegt's hin." (Wolfgang Neuss)


_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: Implicit assumptions (was: Re: Some fun with -O2)

freebsd-hackers mailing list


On 2021-Jan-15, at 10:11, Walter von Entferndt <[hidden email]> wrote:

> At Freitag, 15. Januar 2021, 08:02:49 CET Mark Millard wrote:
>> FYI: C itself has, in <limits.h> , CHAR_BIT for the number of bits in a
>> Byte for how sizeof counts Bytes:  sizeof(char), sizeof(signed char),
>> and sizeof(unsigned char) are each always 1.
>>
> No, CHAR_BIT is the #bits in a *char*, which is the (standard) datatype for
> the binary representation of a character/letter/symbol in written human
> language, and for small integers.  The name also suggests that, as well as the
> comment in the header file.  That does not necessarily equal a "byte", which
> (by commonly accepted knowledge) is the smallest addressable entity in a
> computer's memory.  Of course, e.g. https://code-reference.com/c/datatypes/
> char tells a *char* occupies one byte.  Sadly, AFAIK C itself does not define
> what a "byte" is, although that term is mentioned many times in reference
> manuals (implicit assumption).  So /theoretically/ CHAR_BIT and NBBY can
> differ.  In fact, many library funtions operating on characters/letters take
> an *int* instead of a *char* for performance reasons.  From https://code-reference.com/c/stdlib.h/sizeof: "the *sizeof* operator returns the number of
> bytes to be reserved for a variable or a data type".  Of course, for practical
> reasons, we can safely assume that a *char* will take one byte in storage
> space for the foreseeable future, since the consequences of changing that
> would be disastrous.

Have you read a (fairly modern) C standard or its officially
published rationle? You might want to.

From the officially published C99 rationale (page labeled 11,
Terms and definitions):

QUOTE
) All objects in C must be representable as a contiguous sequence of bytes,
  each of which is at least 8 bits wide.

) A char whether signed or unsigned, occupies exactly one byte.

(Thus, for instance, on a machine with 36-bit words, a byte can be defined
to consist of 9, 12, 18, or 36 bits, these numbers being all the exact
divisors of 36 which are not less than 8.) These strictures codify the
widespread presumption that any object can be treated an an array of
characters, the size of which is given by the sizeof operator with that
object's type as its operand.
END QUOTE


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: Implicit assumptions (was: Re: Some fun with -O2)

freebsd-hackers mailing list
On 2021-Jan-15, at 10:35, Mark Millard <marklmi at yahoo.com> wrote:

> On 2021-Jan-15, at 10:11, Walter von Entferndt <[hidden email]> wrote:
>
>> At Freitag, 15. Januar 2021, 08:02:49 CET Mark Millard wrote:
>>> FYI: C itself has, in <limits.h> , CHAR_BIT for the number of bits in a
>>> Byte for how sizeof counts Bytes:  sizeof(char), sizeof(signed char),
>>> and sizeof(unsigned char) are each always 1.
>>>
>> No, CHAR_BIT is the #bits in a *char*, which is the (standard) datatype for
>> the binary representation of a character/letter/symbol in written human
>> language, and for small integers.  The name also suggests that, as well as the
>> comment in the header file.  That does not necessarily equal a "byte", which
>> (by commonly accepted knowledge) is the smallest addressable entity in a
>> computer's memory.  Of course, e.g. https://code-reference.com/c/datatypes/
>> char tells a *char* occupies one byte.  Sadly, AFAIK C itself does not define
>> what a "byte" is, although that term is mentioned many times in reference
>> manuals (implicit assumption).  So /theoretically/ CHAR_BIT and NBBY can
>> differ.  In fact, many library funtions operating on characters/letters take
>> an *int* instead of a *char* for performance reasons.  From https://code-reference.com/c/stdlib.h/sizeof: "the *sizeof* operator returns the number of
>> bytes to be reserved for a variable or a data type".  Of course, for practical
>> reasons, we can safely assume that a *char* will take one byte in storage
>> space for the foreseeable future, since the consequences of changing that
>> would be disastrous.
>
> Have you read a (fairly modern) C standard or its officially
> published rationle? You might want to.
>
> From the officially published C99 rationale (page labeled 11,
> Terms and definitions):
>
> QUOTE
> ) All objects in C must be representable as a contiguous sequence of bytes,
>  each of which is at least 8 bits wide.
>
> ) A char whether signed or unsigned, occupies exactly one byte.
>
> (Thus, for instance, on a machine with 36-bit words, a byte can be defined
> to consist of 9, 12, 18, or 36 bits, these numbers being all the exact
> divisors of 36 which are not less than 8.) These strictures codify the
> widespread presumption that any object can be treated an an array of
> characters, the size of which is given by the sizeof operator with that
> object's type as its operand.
> END QUOTE


I should have reported that my rationale copy (.pdf)
is of:

QUOTE
Rationale for
International Standard--
Programming Languages --
C

Revisision 5.10
April-2003
END QUOTE


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: Implicit assumptions (was: Re: Some fun with -O2)

Walter von Entferndt-2
In reply to this post by freebsd-hackers mailing list
Dear Sirs,
the behaviour on signed overflow is /explicitely undefined/ in C...
I'd like to kindly suggest to fix the bug instead of fiddling around with
compiler switches to compile a buggy program:
--- check_mktime.c.patch ---
--- check_mktime.c.orig 2021-01-15 03:19:33.962253000 +0100
+++ check_mktime.c      2021-01-15 04:04:22.291014000 +0100
@@ -1,13 +1,16 @@
 /* Test program from Paul Eggert ([hidden email])
    and Tony Leneis ([hidden email]).  */
 # include <sys/time.h>
+# include <sys/param.h>                /* NBBY: #bits of a byte */
 # include <time.h>
 # include <unistd.h>
+# include <stdlib.h>

 /* Work around redefinition to rpl_putenv by other config tests.  */
 #undef putenv

-static time_t time_t_max;
+const time_t t0 = (time_t) 1 << (NBBY*sizeof t0 - 2); /* unused elsewhere */
+static const time_t time_t_max = t0|(t0 - 1);

 /* Values we'll use to set the TZ environment variable.  */
 static const char *const tz_strings[] = {
@@ -106,9 +109,6 @@
   time_t t, delta;
   int i, j;

-  for (time_t_max = 1; 0 < time_t_max; time_t_max *= 2)
-    continue;
-  time_t_max--;
   delta = time_t_max / 997; /* a suitable prime number */
   for (i = 0; i < N_STRINGS; i++)
     {
--
=|o) "Stell' Dir vor es geht und keiner kriegt's hin." (Wolfgang Neuss)
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[hidden email]"

check_mktime.c.patch (1001 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Implicit assumptions (was: Re: Some fun with -O2)

Walter von Entferndt-2
In reply to this post by freebsd-hackers mailing list
At Freitag, 15. Januar 2021, 19:35:40 CET Mark Millard wrote:
> Have you read a (fairly modern) C standard or its officially
> published rationle? You might want to.
>
Honestly, no.  The price to download the official standard PDF from https://
www.iso.org/standard/74528.html is ~200 CHF.  If you can send me a link to
download a copy, I'd be thankful.  Any other good reference manual either in
PDF or HTML (tarball or thelike) would also be fine.  I showed my source
(https://code-reference.com/c/) which I quickly looked up on the net.

> From the officially published C99 rationale (page labeled 11,
> Terms and definitions):
>
> QUOTE
> ) All objects in C must be representable as a contiguous sequence of bytes,
>   each of which is at least 8 bits wide.
>
> ) A char whether signed or unsigned, occupies exactly one byte.
>
That means it does not make any difference to use either NBBY or CHAR_BIT?  
Maybe CHAR_BIT is preferable, because it is C standard (guaranteed to exist on
all platforms), whereas NBBY is not since it's in include/sys?

Beside that, can you affirm the fix I suggested is correct & portable?
--
=|o) "Stell' Dir vor es geht und keiner kriegt's hin." (Wolfgang Neuss)


_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: Implicit assumptions (was: Re: Some fun with -O2)

freebsd-hackers mailing list


On 2021-Jan-15, at 11:22, Walter von Entferndt <[hidden email]> wrote:

> At Freitag, 15. Januar 2021, 19:35:40 CET Mark Millard wrote:
>> Have you read a (fairly modern) C standard or its officially
>> published rationle? You might want to.
>>
> Honestly, no.  The price to download the official standard PDF from https://
> www.iso.org/standard/74528.html is ~200 CHF.  If you can send me a link to
> download a copy, I'd be thankful.

The rationale vintage that I have a copy of is available at:

http://www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.10.pdf

The rationale is normally a better starting point than the standard
for guiding understanding of the standard --but is not normative.

What makes the rationale a matter starting point for interpreting
the standard is that, while the standard should have the more
overall properties documented in the rationale, one has to infer
various properties from more detailed rules that were designed to
have the properties (and may be better at covering limiting
conditions than the more overall rationale material).

But the rationale can not be used to answer all questions.

As for drafts of vintages of the standard:

http://www.open-std.org/jtc1/sc22/wg14/www/projects#9899

has links (the first 4 N???? links) to final drafts (or
very-early next-standard drafts) of the standards for C2x,
C17, C11, and C99.

My understanding is that C17 mostly dealt with clarification
requests, other than the ATOMIC_VAR_INIT one leading to
depreciating ATOMIC_VAR_INIT instead.

As I understand, FreeBSD basically targets C99 (possibly without
things made optional or depreciated in C11 or later: future
compatibility biased).


> Any other good reference manual either in
> PDF or HTML (tarball or thelike) would also be fine.  I showed my source
> (https://code-reference.com/c/) which I quickly looked up on the net.
>
>> From the officially published C99 rationale (page labeled 11,
>> Terms and definitions):
>>
>> QUOTE
>> ) All objects in C must be representable as a contiguous sequence of bytes,
>>  each of which is at least 8 bits wide.
>>
>> ) A char whether signed or unsigned, occupies exactly one byte.
>>
> That means it does not make any difference to use either NBBY or CHAR_BIT?  
> Maybe CHAR_BIT is preferable, because it is C standard (guaranteed to exist on
> all platforms), whereas NBBY is not since it's in include/sys?

I'd say that using C's CHAR_BIT means less variation in the code from system
to system and less knowledge of system specifics required. So in code not
intended to be system specific, CHAR_BIT would be preferred. That does not
cover code intended to be system specific.

> Beside that, can you affirm the fix I suggested is correct & portable?

Before I could make any grand overall claim about the code, I'll
need to take time to think about all of it, instead of just reporting
things that I happen to quickly notice while scanning the exchange.

I may get to this, but I'm not sure when.


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: Implicit assumptions (was: Re: Some fun with -O2)

Walter von Entferndt-2
At Freitag, 15. Januar 2021, 21:57:14 CET, Mark Millard wrote:
> The rationale vintage that I have a copy of is available at:
>
> http://www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.10.pdf
>
Thank you very much.  Now I found that "The result of shifting by a bit count
greater than or equal to the word's size is undefined behavior in C and C++"
(https://en.wikipedia.org/wiki/Bitwise_operation#C-family ; likewise http://
www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf).  So we'll have to go back
to the loop-solution with the multiply-by-2:
--- check_mktime.c.patch ---
--- check_mktime.c.orig 2021-01-15 03:19:33.962253000 +0100
+++ check_mktime.c      2021-01-15 23:22:01.951385000 +0100
@@ -3,6 +3,10 @@
 # include <sys/time.h>
 # include <time.h>
 # include <unistd.h>
+# include <stdlib.h>
+# include <stdio.h>    /* printf() */
+# include <inttypes.h> /* format spec PRIX64: ll/l + X on 32/64-bit arch */
+# include <limits.h>   /* CHAR_BIT */

 /* Work around redefinition to rpl_putenv by other config tests.  */
 #undef putenv
@@ -106,9 +110,15 @@
   time_t t, delta;
   int i, j;

-  for (time_t_max = 1; 0 < time_t_max; time_t_max *= 2)
-    continue;
-  time_t_max--;
+  /* portably compute the maximum of a signed type:
+   * NOTE: << is undefined if the shift width >= word length
+   * i.e. shifting a 64-bit type by 62 on a 32-bit machine: undef
+   */
+  for (i = t = 1; i < CHAR_BIT*sizeof t - 1; i++)
+    t *= 2;
+  time_t_max = t|(t - 1);
+  printf ("time_t_max = 0x%"PRIX64"\n", time_t_max);
+
   delta = time_t_max / 997; /* a suitable prime number */
   for (i = 0; i < N_STRINGS; i++)
     {
--
=|o) "Stell' Dir vor es geht und keiner kriegt's hin." (Wolfgang Neuss)
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[hidden email]"

check_mktime.c.patch (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Implicit assumptions (was: Re: Some fun with -O2)

freebsd-hackers mailing list


On 2021-Jan-15, at 14:25, Walter von Entferndt <walter.von.entferndt at posteo.net> wrote:

> At Freitag, 15. Januar 2021, 21:57:14 CET, Mark Millard wrote:
>> The rationale vintage that I have a copy of is available at:
>>
>> http://www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.10.pdf
>>
> Thank you very much.  Now I found that "The result of shifting by a bit count
> greater than or equal to the word's size is undefined behavior in C and C++"
> (https://en.wikipedia.org/wiki/Bitwise_operation#C-family ; likewise http://
> www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf).

The "word size" wording on that wikipedia page is
unfortunate and inaccurate.

N1570 talks of "negative or is greater than or equal
to the width of the promoted left operand". That
phrase is not tied to something like 32 on a 32-bit
machine. long long is 64 bits wide (or potentially
more) and so would allow 0 through 63 based on just
this much of the criteria. (But see what follows.)

The language standard (various vintages) also tend to
talk about "is representable in the result type" vs. not
for "<<", as an example. unsigned defines the "not" cases
in terms of modulo arithmetic. signed types only define
non-negative value left-hand-side cases, and those only
when the result "is representable in the result type".
signed types otherwise get a "behavior is undefined"
classification, or some such wording. The "<<" operation
is described via the mathematical multiplication by the
power of 2 (the right hand side) as far as what may or
may not be "representable".

For ">>" and positive values on the left, division by
the power of 2 is used. But a negative signed value on
the left is described via something like "is implementation
defined".

For both ">>" and "<<", negative right hand sides result
in a "behavior is undefined" classification.

> So we'll have to go back
> to the loop-solution with the multiply-by-2:
> --- check_mktime.c.patch ---
> --- check_mktime.c.orig 2021-01-15 03:19:33.962253000 +0100
> +++ check_mktime.c      2021-01-15 23:22:01.951385000 +0100
> @@ -3,6 +3,10 @@
> # include <sys/time.h>
> # include <time.h>
> # include <unistd.h>
> +# include <stdlib.h>
> +# include <stdio.h>    /* printf() */
> +# include <inttypes.h> /* format spec PRIX64: ll/l + X on 32/64-bit arch */
> +# include <limits.h>   /* CHAR_BIT */
>
> /* Work around redefinition to rpl_putenv by other config tests.  */
> #undef putenv
> @@ -106,9 +110,15 @@
>   time_t t, delta;
>   int i, j;
>
> -  for (time_t_max = 1; 0 < time_t_max; time_t_max *= 2)
> -    continue;
> -  time_t_max--;
> +  /* portably compute the maximum of a signed type:
> +   * NOTE: << is undefined if the shift width >= word length
> +   * i.e. shifting a 64-bit type by 62 on a 32-bit machine: undef
> +   */
> +  for (i = t = 1; i < CHAR_BIT*sizeof t - 1; i++)
> +    t *= 2;
> +  time_t_max = t|(t - 1);
> +  printf ("time_t_max = 0x%"PRIX64"\n", time_t_max);
> +
>   delta = time_t_max / 997; /* a suitable prime number */
>   for (i = 0; i < N_STRINGS; i++)
>     {



===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: Implicit assumptions (was: Re: Some fun with -O2)

freebsd-hackers mailing list
Walter, you had written:

> Beside that, can you affirm the fix I suggested is correct & portable?

I'm going to suggest some language generality that you
might not want to deal with. N1570 documents such
material, as an example.

Modern C defines that Integer types other than char,
unsigned char, and signed char are made up of:

signed types: value bits, pad bits (zero or more), and a sign bit

unsigned types: value bits, pad bits (zero or more).

There are also things called "trap representations", that
includes pad bits possibly being something like a parity
bit but there will be some other allowed examples listed
below.

(unsigned char only has value bits and signed char has
one sign bit and the rest being value bits. No form of
char has any pad bits. char is like one of the two
that are explicit about signed vs. unsigned.)

In modern C, signed Integers are limited to:

A) two's complement (by far the typical context)
B) sign and magnitude
C) one's complement

For (A) the sign bit being 1 and the rest of the value
bits being 0 is either a trap representation or a normal
value. (Normal value is by far the typical context.)

For (B) the sign bit being 1 and the rest of the value
bits being 0 is either a trap representation or a normal
value. As a normal value, it is the value called "negative
zero".

For (C) the sign bit being 1 and the rest of the value
bits being 1 is either a trap representation or a normal
value. As a normal value, it is the value called "negative
zero".

I'll note that modern C++ has only (A), with zero pad bits
and no trap representations.

You might want to target (A) with zero pad bits and no
trap representations (all bit partterns being normal
values) but still avoiding other undefined behavior and
possibly avoiding implementation-defined behavior.


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: Implicit assumptions (was: Re: Some fun with -O2)

Walter von Entferndt-2
In reply to this post by freebsd-hackers mailing list
At Samstag, 16. Januar 2021, 00:13:51 CET, Mark Millard wrote:

> > Thank you very much.  Now I found that "The result of shifting by a bit
> > count greater than or equal to the word's size is undefined behavior in C
> > and C++" (https://en.wikipedia.org/wiki/Bitwise_operation#C-family ;
> > likewise http:// www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf).
>
> The "word size" wording on that wikipedia page is
> unfortunate and inaccurate.
>
> N1570 talks of "negative or is greater than or equal
> to the width of the promoted left operand".
>
Thx for the clarification.  That means the compute-at-compile-time solution
would be ok, if there were not...

At Samstag, 16. Januar 2021, 01:26:00 CET, Mark Millard wrote:
[... about pad bits, trap representations, and 1's-complement ...]

Hallelujah.  I did not know that integer representations other than 2's-
complement are allowed in C, and did not take into account these other
pitfalls.  I've heard of that long ago, but forgot about it.  1's-complement
is obviously no problem as long we're dealing with non-zero positives.  And I
did not find anything about the order of any padding bits and the sign bit,
but I'd strongly assume the sign bit is always adjacent to the value bits?  
At least, no arithmetic operation can alter padding bits (not even a shift)?
https://duckduckgo.com/?q=SEI+CERT+C+Coding provides a simple /popcount()/
routine to portably detect padding bits for any integer datatype when given
it's maximum value.  That could be useful.  If INTMAX_MAX has N padding bits,
do all other integer types (except *char*) have the same #padding bits?

Can we safely assume this: a *time_t* is either a *long long* iff *long long*
is supported and it's a 32-bit arch, else *long*?  Or can it be of any of the
minimum- or least-width integer types?  Or is a *time_t* always of *intmax_t*?
In the latter case, this gives us a very simple solution:

#include <stdint.h>
static const time_t time_t_max = INTMAX_MAX;

Or do we have to come up with adjacent probing à la:

#include <limits.h>
#include <stdlib.h>
#include <stdio.h>

if (sizeof(intmax_t) == sizeof(time_t))
    time_t_max = INTMAX_MAX;
else
#ifdef __LONG_LONG_SUPPORTED /* is this macro portable? */
if (sizeof(long long) == sizeof(time_t))
    time_t_max = LLONG_MAX;
else
#endif
if (sizeof(long) == sizeof(time_t))
        time_t_max = LONG_MAX;
else if (sizeof(int) == sizeof(time_t))
        time_t_max = INT_MAX;
else {
        fprintf (stderr, "What the heck is going on?\n");
        exit (EXIT_FAILURE);
}

But reading the standard: time.h declares *clock_t* and *time_t* wich are /
real types/ ... and: "The integer and real floating types are collectively
called /real types/."  Now I'm out.  Bye.  I tried to be helpful, but this is
too much...
--
=|o) "Stell' Dir vor es geht und keiner kriegt's hin." (Wolfgang Neuss)


_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: Implicit assumptions (was: Re: Some fun with -O2)

freebsd-hackers mailing list


On 2021-Jan-16, at 01:04, Walter von Entferndt <walter.von.entferndt at posteo.net> wrote:

> At Samstag, 16. Januar 2021, 00:13:51 CET, Mark Millard wrote:
>>> Thank you very much.  Now I found that "The result of shifting by a bit
>>> count greater than or equal to the word's size is undefined behavior in C
>>> and C++" (https://en.wikipedia.org/wiki/Bitwise_operation#C-family ;
>>> likewise http:// www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf).
>>
>> The "word size" wording on that wikipedia page is
>> unfortunate and inaccurate.
>>
>> N1570 talks of "negative or is greater than or equal
>> to the width of the promoted left operand".
>>
> Thx for the clarification.  That means the compute-at-compile-time solution
> would be ok, if there were not...
>
> At Samstag, 16. Januar 2021, 01:26:00 CET, Mark Millard wrote:
> [... about pad bits, trap representations, and 1's-complement ...]
>
> Hallelujah.  I did not know that integer representations other than 2's-
> complement are allowed in C, and did not take into account these other
> pitfalls.  I've heard of that long ago, but forgot about it.  1's-complement
> is obviously no problem as long we're dealing with non-zero positives.  And I
> did not find anything about the order of any padding bits and the sign bit,
> but I'd strongly assume the sign bit is always adjacent to the value bits?  
> At least, no arithmetic operation can alter padding bits (not even a shift)?
> https://duckduckgo.com/?q=SEI+CERT+C+Coding provides a simple /popcount()/
> routine to portably detect padding bits for any integer datatype when given
> it's maximum value.  That could be useful.  If INTMAX_MAX has N padding bits,
> do all other integer types (except *char*) have the same #padding bits?
>
> Can we safely assume this: a *time_t* is either a *long long* iff *long long*
> is supported and it's a 32-bit arch, else *long*?  Or can it be of any of the
> minimum- or least-width integer types?  Or is a *time_t* always of *intmax_t*?
> In the latter case, this gives us a very simple solution:
>
> #include <stdint.h>
> static const time_t time_t_max = INTMAX_MAX;
>
> Or do we have to come up with adjacent probing à la:
>
> #include <limits.h>
> #include <stdlib.h>
> #include <stdio.h>
>
> if (sizeof(intmax_t) == sizeof(time_t))
>    time_t_max = INTMAX_MAX;
> else
> #ifdef __LONG_LONG_SUPPORTED /* is this macro portable? */
> if (sizeof(long long) == sizeof(time_t))
>    time_t_max = LLONG_MAX;
> else
> #endif
> if (sizeof(long) == sizeof(time_t))
> time_t_max = LONG_MAX;
> else if (sizeof(int) == sizeof(time_t))
> time_t_max = INT_MAX;
> else {
> fprintf (stderr, "What the heck is going on?\n");
> exit (EXIT_FAILURE);
> }
>
> But reading the standard: time.h declares *clock_t* and *time_t* wich are /
> real types/ ... and: "The integer and real floating types are collectively
> called /real types/."  Now I'm out.  Bye.  I tried to be helpful, but this is
> too much...

This is the sort of thing where FreeBSD and Linux use a C subset
for the specific issue, as defined by POSIX.1-2017/"The Open Group
Base Specifications" Issue 7, 2018 edition/IEEE STd 1003-2017/... .
For example:

https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_types.h.html

reports (in its own terminology that may not be a full match to C's):

QUOTE
clock_t shall be an integer or real-floating type. time_t shall be an integer type.
END QUOTE

Limiting to such a time_t is easier to cover.

(Note: POSIX in parts is a superset of C as well.)

One of the things about standards: there are so many to choose
from or to mix and match. Or as some of the pages put it:

QUOTE
The functionality described on this reference page is aligned with the ISO C standard. Any conflict between the requirements described here and the ISO C standard is unintentional. This volume of POSIX.1-2017 defers to the ISO C standard.
END QUOTE

Restricting time_t to integer types leaves a compatible context.
So: no "conflict" in that respect.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: Implicit assumptions (was: Re: Some fun with -O2)

Walter von Entferndt-2
At Samstag, 16. Januar 2021, 11:23:03 CET, Mark Millard wrote:

> This is the sort of thing where FreeBSD and Linux use a C subset
> for the specific issue, as defined by POSIX.1-2017/"The Open Group
> Base Specifications" Issue 7, 2018 edition/IEEE STd 1003-2017/... .
> For example:
>
> https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_types.h.html
>
> reports (in its own terminology that may not be a full match to C's):
>
> QUOTE
> clock_t shall be an integer or real-floating type. time_t shall be an
> integer type. END QUOTE
>
> Limiting to such a time_t is easier to cover. [...]
> Restricting time_t to integer types leaves a compatible context.
> So: no "conflict" in that respect.
>
My understanding is that check_mktime.c is a test program called by autoconf
or such, designed to run on various UNIX platforms?  I mean it does not only
need to run on BSD & Linux.
Maybe something like this will cover the most cases in practice:
--- check_mktime.c.patch ---
--- check_mktime.c.orig 2021-01-15 03:19:33.962253000 +0100
+++ check_mktime.c 2021-01-16 21:00:36.160616000 +0100
@@ -3,6 +3,13 @@
 # include <sys/time.h>
 # include <time.h>
 # include <unistd.h>
+# include <stdlib.h>
+#include <stdint.h>
+# include <stdio.h> /* printf() */
+# include <limits.h> /* CHAR_BIT */
+#if 0
+# include <inttypes.h> /* format spec PRIX64: ll/l + X on 32/64-bit arch */
+#endif
 
 /* Work around redefinition to rpl_putenv by other config tests.  */
 #undef putenv
@@ -16,6 +23,78 @@
 };
 #define N_STRINGS (sizeof (tz_strings) / sizeof (tz_strings[0]))
 
+/* Count the bits set in any unsigned integer type.
+ * Returns the precision (width - padding bits - sign bit) iff given
+ * the xxx_MAX value of any integer type, signed or unsigned.
+ * From SEI CERT C Coding Standard:
+ * Rules for Developing Safe, Reliable, and Secure Systems (2016)
+ */
+size_t
+popcount (num)
+uintmax_t num;
+{
+  size_t cnt = 0;
+    
+  while (num != 0)
+    {
+      if (num % 2 == 1)
+      cnt++;
+      num >>= 1;
+    }
+  return cnt;
+}
+#define PRECISION(max_value) popcount(max_value)
+
+/* Guess the maximum value of a time_t from it's storage width.
+ * ASSERT time_t is not a floating point, or of any arcane width, or
unsigned.
+ * Only 4...8 byte width of a time_t are tested.
+ * On error: returns (time_t)(-1)
+ */
+time_t
+guess_time_t_max ()
+{
+  time_t t0, t1 = (time_t)(-1);
+  size_t size, prec;
+
+  switch ((size = sizeof(time_t)))
+    {
+      case 4:
+ prec = PRECISION((time_t) 0xFFFFFFFF);
+ break;
+      case 5:
+ prec = PRECISION((time_t) 0xFFFFFFFFFF);
+ break;
+      case 6:
+ prec = PRECISION((time_t) 0xFFFFFFFFFFFF);
+ break;
+      case 7:
+ prec = PRECISION((time_t) 0xFFFFFFFFFFFFFF);
+ break;
+      case 8:
+ prec = PRECISION((time_t) 0xFFFFFFFFFFFFFFFF);
+ break;
+      default:
+ prec = 1;
+ break;
+    }
+  prec--; /* assumption: time_t is signed */
+  if (prec)
+    {
+      t0 = (time_t) 1 << (prec - 1);
+      t1 = t0|(t0 - 1);
+    }
+
+  /* FIXME not portable: time_t can be floating point type,
+   * or another integer type other than long or long long.
+   *
+  fprintf (stderr, "time_t_max\t= 0x%"PRIX64"\n", t1);*/
+  fprintf (stderr, "sizeof(time_t)\t= %2zd byte\n", size);
+  fprintf (stderr, "precision\t= %2zd bit\n", prec);
+  fprintf (stderr, "padding\t\t= %2zd bit\n", size*CHAR_BIT - prec - 1 /*
sign */);
+
+  return t1;
+}
+
 /* Fail if mktime fails to convert a date in the spring-forward gap.
    Based on a problem report from Andreas Jaeger.  */
 static void
@@ -106,9 +185,7 @@
   time_t t, delta;
   int i, j;
 
-  for (time_t_max = 1; 0 < time_t_max; time_t_max *= 2)
-    continue;
-  time_t_max--;
+  time_t_max = guess_time_t_max ();
   delta = time_t_max / 997; /* a suitable prime number */
   for (i = 0; i < N_STRINGS; i++)
     {
@@ -128,3 +205,4 @@
   spring_forward_gap ();
   exit (0);
 }
+/*! vi: set ai tabstop=8 shiftwidth=2: */

---
--
=|o) "Stell' Dir vor es geht und keiner kriegt's hin." (Wolfgang Neuss)
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[hidden email]"

check_mktime.c.patch (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Implicit assumptions (was: Re: Some fun with -O2)

freebsd-hackers mailing list

On 2021-Jan-16, at 12:19, Walter von Entferndt <walter.von.entferndt at posteo.net> wrote:

> At Samstag, 16. Januar 2021, 11:23:03 CET, Mark Millard wrote:
>> This is the sort of thing where FreeBSD and Linux use a C subset
>> for the specific issue, as defined by POSIX.1-2017/"The Open Group
>> Base Specifications" Issue 7, 2018 edition/IEEE STd 1003-2017/... .
>> For example:
>>
>> https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_types.h.html
>>
>> reports (in its own terminology that may not be a full match to C's):
>>
>> QUOTE
>> clock_t shall be an integer or real-floating type. time_t shall be an
>> integer type. END QUOTE
>>
>> Limiting to such a time_t is easier to cover. [...]
>> Restricting time_t to integer types leaves a compatible context.
>> So: no "conflict" in that respect.
>>
> My understanding is that check_mktime.c is a test program called by autoconf
> or such, designed to run on various UNIX platforms?  I mean it does not only
> need to run on BSD & Linux.

QUOTE ( from https://en.wikipedia.org/wiki/POSIX )
The Portable Operating System Interface (POSIX) is a family of standards specified by the IEEE Computer Society for maintaining compatibility between operating systems.[1] POSIX defines the application programming interface (API), along with command line shells and utility interfaces, for software compatibility with variants of Unix and other operating systems.
END QUOTE

The section "POSIX-oriented operating systems" lists many OSs
that are either certified, mostly-compliant, have means of
having a POSIX environment (includes Microsoft Windows, OS/2,
DOS), or have a (optional) POSIX compatibility layer. I'll
not make a full list here.

You can use that to help judge if only supporting time_t as an
integer type is worthwhile in your view.

> Maybe something like this will cover the most cases in practice:

There are more details here that I'd need to think about
that I've not dealt with yet.

> --- check_mktime.c.patch ---
> --- check_mktime.c.orig 2021-01-15 03:19:33.962253000 +0100
> +++ check_mktime.c 2021-01-16 21:00:36.160616000 +0100
> @@ -3,6 +3,13 @@
> # include <sys/time.h>
> # include <time.h>
> # include <unistd.h>
> +# include <stdlib.h>
> +#include <stdint.h>
> +# include <stdio.h> /* printf() */
> +# include <limits.h> /* CHAR_BIT */
> +#if 0
> +# include <inttypes.h> /* format spec PRIX64: ll/l + X on 32/64-bit arch */
> +#endif
>
> /* Work around redefinition to rpl_putenv by other config tests.  */
> #undef putenv
> @@ -16,6 +23,78 @@
> };
> #define N_STRINGS (sizeof (tz_strings) / sizeof (tz_strings[0]))
>
> +/* Count the bits set in any unsigned integer type.
> + * Returns the precision (width - padding bits - sign bit) iff given
> + * the xxx_MAX value of any integer type, signed or unsigned.
> + * From SEI CERT C Coding Standard:
> + * Rules for Developing Safe, Reliable, and Secure Systems (2016)
> + */
> +size_t
> +popcount (num)
> +uintmax_t num;
> +{
> +  size_t cnt = 0;
> +    
> +  while (num != 0)
> +    {
> +      if (num % 2 == 1)
> +      cnt++;
> +      num >>= 1;
> +    }
> +  return cnt;
> +}
> +#define PRECISION(max_value) popcount(max_value)
> +
> +/* Guess the maximum value of a time_t from it's storage width.
> + * ASSERT time_t is not a floating point, or of any arcane width, or
> unsigned.
> + * Only 4...8 byte width of a time_t are tested.
> + * On error: returns (time_t)(-1)
> + */
> +time_t
> +guess_time_t_max ()
> +{
> +  time_t t0, t1 = (time_t)(-1);
> +  size_t size, prec;
> +
> +  switch ((size = sizeof(time_t)))
> +    {
> +      case 4:
> + prec = PRECISION((time_t) 0xFFFFFFFF);
> + break;
> +      case 5:
> + prec = PRECISION((time_t) 0xFFFFFFFFFF);
> + break;
> +      case 6:
> + prec = PRECISION((time_t) 0xFFFFFFFFFFFF);
> + break;
> +      case 7:
> + prec = PRECISION((time_t) 0xFFFFFFFFFFFFFF);
> + break;
> +      case 8:
> + prec = PRECISION((time_t) 0xFFFFFFFFFFFFFFFF);
> + break;
> +      default:
> + prec = 1;
> + break;
> +    }
> +  prec--; /* assumption: time_t is signed */
> +  if (prec)
> +    {
> +      t0 = (time_t) 1 << (prec - 1);
> +      t1 = t0|(t0 - 1);
> +    }
> +
> +  /* FIXME not portable: time_t can be floating point type,
> +   * or another integer type other than long or long long.
> +   *
> +  fprintf (stderr, "time_t_max\t= 0x%"PRIX64"\n", t1);*/
> +  fprintf (stderr, "sizeof(time_t)\t= %2zd byte\n", size);
> +  fprintf (stderr, "precision\t= %2zd bit\n", prec);
> +  fprintf (stderr, "padding\t\t= %2zd bit\n", size*CHAR_BIT - prec - 1 /*
> sign */);
> +
> +  return t1;
> +}
> +
> /* Fail if mktime fails to convert a date in the spring-forward gap.
>    Based on a problem report from Andreas Jaeger.  */
> static void
> @@ -106,9 +185,7 @@
>   time_t t, delta;
>   int i, j;
>
> -  for (time_t_max = 1; 0 < time_t_max; time_t_max *= 2)
> -    continue;
> -  time_t_max--;
> +  time_t_max = guess_time_t_max ();
>   delta = time_t_max / 997; /* a suitable prime number */
>   for (i = 0; i < N_STRINGS; i++)
>     {
> @@ -128,3 +205,4 @@
>   spring_forward_gap ();
>   exit (0);
> }
> +/*! vi: set ai tabstop=8 shiftwidth=2: */
>


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[hidden email]"
Reply | Threaded
Open this post in threaded view
|

Re: Implicit assumptions (was: Re: Some fun with -O2)

Walter von Entferndt-2
At Samstag, 16. Januar 2021, 23:08:30 CET, Mark Millard wrote:
> [... on POSIX ...]
> You can use that to help judge if only supporting time_t as an
> integer type is worthwhile in your view.
>
Yes, I think it's worth it.  I'll try to find out if there's some
implementation around with a floating point *time_t*, or with a *time_t*
longer than 8 bytes or shorter than 4.  I'd strongly assume there's
(currently) not, because < 32 bit or > 8 byte *time_t* support an unreasonably
short (or long, resp.) time frame.  Special applications will very likely use
their own data format and not *time_t*.  With an 8-byte *time_t* you could
even represent the time of big bang with a 1-second precision, right?

This is a version with explicit assignments (and thus type conversions) in the
routine /guess_time_t_max ()/ and another loop susceptible for overflow fixed,
and the formatting done according to the genuine file.  I hope this is maximal
portable, /but/ I guess it will not compile on old compilers due to the use of
the (new?) *uint_max_t* and *uint64_t* types.
--- check_mktime.c.patch ---
--- check_mktime.c.orig 2021-01-15 03:19:33.962253000 +0100
+++ check_mktime.c 2021-01-17 19:33:55.695872000 +0100
@@ -3,6 +3,11 @@
 # include <sys/time.h>
 # include <time.h>
 # include <unistd.h>
+# include <stdlib.h> /* putenv(), exit(), EXIT_xxx */
+# include <stdint.h> /* uintmax_t */
+# include <stdio.h> /* printf() */
+# include <limits.h> /* CHAR_BIT, INT_MAX */
+# include <inttypes.h> /* format spec PRIX64: ll/l + X on 32/64-bit arch */
 
 /* Work around redefinition to rpl_putenv by other config tests.  */
 #undef putenv
@@ -16,6 +21,80 @@
 };
 #define N_STRINGS (sizeof (tz_strings) / sizeof (tz_strings[0]))
 
+/* The following two routines (should) work for integer representation
+   in either 2's or 1's complement (else it's a compiler bug).  */
+
+/* Count the bits set in any integer type.
+   Returns the precision (width - padding bits - sign bit) iff given
+   the xxx_MAX value of any integer type, signed or unsigned.
+   From SEI CERT C Coding Standard:
+   Rules for Developing Safe, Reliable, and Secure Systems (2016)  */
+size_t
+popcount (num)
+uintmax_t num;
+{
+  size_t cnt = 0;
+    
+  while (num != 0)
+    {
+      if (num % 2 == 1)
+ cnt++;
+      num >>= 1;
+    }
+  return cnt;
+}
+#define PRECISION(max_value) popcount(max_value)
+#define SIGN_BIT (1)
+
+/* Guess the maximum value of a time_t from it's storage width.
+   ASSERT time_t is not a floating point, or of any arcane width,
+   or unsigned (i.e. the bit pattern of (-1) treated special).
+   Only 4...8 byte width of a time_t are tested.
+   On error: returns (time_t)(-1)  */
+time_t
+guess_time_t_max ()
+{
+  time_t t0, t1 = (time_t)(-1);
+  size_t size, prec = 1;
+  uint64_t u64;
+
+  switch ((size = sizeof(time_t)))
+    {
+      case 4:
+ t0 = (time_t) 0xFFFFFFFF;
+ break;
+      case 5:
+ t0 = (time_t) 0xFFFFFFFFFF;
+ break;
+      case 6:
+ t0 = (time_t) 0xFFFFFFFFFFFF;
+ break;
+      case 7:
+ t0 = (time_t) 0xFFFFFFFFFFFFFF;
+ break;
+      case 8:
+ t0 = (time_t) 0xFFFFFFFFFFFFFFFF;
+ break;
+      default:
+ prec = 0;
+ break;
+    }
+  if (prec)
+    {
+      prec = PRECISION(t0) - SIGN_BIT;
+      t0 = (time_t) 1 << (prec - 1);
+      t1 = t0|(t0 - 1);
+    }
+#ifdef DEBUG
+  fprintf (stderr, "time_t_max\t= 0x%"PRIX64"\n", (u64 = t1));
+  fprintf (stderr, "sizeof(time_t)\t= %2zd byte\n", size);
+  fprintf (stderr, "precision\t= %2zd bit\n", prec);
+  fprintf (stderr, "padding\t\t= %2zd bit\n", size*CHAR_BIT - prec -
SIGN_BIT);
+#endif /* DEBUG */
+  return t1;
+}
+#undef SIGN_BIT
+
 /* Fail if mktime fails to convert a date in the spring-forward gap.
    Based on a problem report from Andreas Jaeger.  */
 static void
@@ -37,8 +116,8 @@
   tm.tm_min = 0;
   tm.tm_sec = 0;
   tm.tm_isdst = -1;
-  if (mktime (&tm) == (time_t)-1)
-    exit (1);
+  if (mktime (&tm) == (time_t)(-1))
+    exit (EXIT_FAILURE);
 }
 
 static void
@@ -47,10 +126,10 @@
 {
   struct tm *lt;
   if ((lt = localtime (&now)) && mktime (lt) != now)
-    exit (1);
+    exit (EXIT_FAILURE);
   now = time_t_max - now;
   if ((lt = localtime (&now)) && mktime (lt) != now)
-    exit (1);
+    exit (EXIT_FAILURE);
 }
 
 static void
@@ -67,7 +146,7 @@
   tm.tm_isdst = -1;
   mktime (&tm);
   if (tm.tm_mon != 2 || tm.tm_mday != 31)
-    exit (1);
+    exit (EXIT_FAILURE);
 }
 
 static void
@@ -82,7 +161,7 @@
   alarm (10);
   now = mktime (&tm);
   alarm (0);
-  if (now != (time_t) -1)
+  if (now != (time_t)(-1))
     {
       struct tm *lt = localtime (&now);
       if (! (lt
@@ -96,7 +175,7 @@
      && lt->tm_wday == tm.tm_wday
      && ((lt->tm_isdst < 0 ? -1 : 0 < lt->tm_isdst)
   == (tm.tm_isdst < 0 ? -1 : 0 < tm.tm_isdst))))
- exit (1);
+ exit (EXIT_FAILURE);
     }
 }
 
@@ -106,9 +185,10 @@
   time_t t, delta;
   int i, j;
 
-  for (time_t_max = 1; 0 < time_t_max; time_t_max *= 2)
-    continue;
-  time_t_max--;
+  time_t_max = guess_time_t_max ();
+  if (time_t_max == (time_t)(-1))
+    exit (EXIT_FAILURE);
+
   delta = time_t_max / 997; /* a suitable prime number */
   for (i = 0; i < N_STRINGS; i++)
     {
@@ -120,11 +200,15 @@
       mktime_test ((time_t) 60 * 60);
       mktime_test ((time_t) 60 * 60 * 24);
 
-      for (j = 1; 0 < j; j *= 2)
+      /* NOTE This does not reach very large values > time_t_max/2
+         neither the (susceptible for overflow) version
+         for (j = 1; 0 < j; j *= 2) does  */
+      for (i = j = 1; i < PRECISION(INT_MAX); i++, j *= 2)
         bigtime_test (j);
       bigtime_test (j - 1);
     }
   irix_6_4_bug ();
   spring_forward_gap ();
-  exit (0);
+  exit (EXIT_SUCCESS);
 }
+/*! vi: set ai tabstop=8 shiftwidth=2: */

---
--
=|o) "Stell' Dir vor es geht und keiner kriegt's hin." (Wolfgang Neuss)
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[hidden email]"

check_mktime.c.patch (4K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Implicit assumptions (was: Re: Some fun with -O2)

Walter von Entferndt-2
In reply to this post by freebsd-hackers mailing list
My previous naiive attempt silently assumed CHAR_BIT = 8...  Isn't that funny?  
Now this is hopefully my last try to fix the issue, it should be agnostic to
any padding and machine word length.  It does not cover *time_t* beeing a
floating point type, though.  Compile witch 'cc -DDEBUG' to get the
diagnostics to *stderr* .

Regards

--- check_mktime.c.patch ---
--- check_mktime.c.orig 2021-01-15 03:19:33.962253000 +0100
+++ check_mktime.c 2021-01-17 22:01:38.157124000 +0100
@@ -3,6 +3,11 @@
 # include <sys/time.h>
 # include <time.h>
 # include <unistd.h>
+# include <stdlib.h> /* putenv(), exit(), EXIT_xxx */
+# include <stdint.h> /* intmax_t */
+# include <stdio.h> /* printf() */
+# include <limits.h> /* CHAR_BIT, INT_MAX */
+# include <inttypes.h> /* format spec PRIX64: ll/l + X on 32/64-bit arch */
 
 /* Work around redefinition to rpl_putenv by other config tests.  */
 #undef putenv
@@ -16,6 +21,74 @@
 };
 #define N_STRINGS (sizeof (tz_strings) / sizeof (tz_strings[0]))
 
+/* The following two routines (should) work for integer representation
+   in either 2's or 1's complement and for any #bits per byte.  */
+
+/* Count the bits set in any integer type.
+   Returns the precision (width - padding bits - sign bit) iff given
+   the xxx_MAX value of any integer type, signed or unsigned.
+   From SEI CERT C Coding Standard:
+   Rules for Developing Safe, Reliable, and Secure Systems (2016)  */
+size_t
+popcount (num)
+uintmax_t num;
+{
+  size_t cnt = 0;
+    
+  while (num != 0)
+    {
+      if (num % 2 == 1)
+ cnt++;
+      num >>= 1;
+    }
+  return cnt;
+}
+#define PRECISION(max_value) popcount(max_value)
+#define SIGN_BIT (1)
+#ifndef MIN
+#define MIN(a, b) ((a) < (b)? (a): (b))
+#endif
+/* Guess the maximum value of a time_t from it's storage width.
+   On error: returns (time_t)(-1) iff time_t is longer than an intmax_t.
+   ASSERTION: time_t is a signed integer type,
+   i.e. not (unsigned, but the bit pattern of (-1) treated special).  */
+time_t
+guess_time_t_max ()
+{
+  time_t t0, t1 = (time_t)(-1);
+  size_t size = sizeof(time_t);
+  size_t prec;
+  uintmax_t max;
+  uint64_t u64;
+
+  if (sizeof(time_t) > sizeof(uintmax_t))
+    return t1;
+  
+  /* Get an uintmax_t with all bits set that could be in a time_t.
+     We can not do this calculation with a time_t as long we do not
+     know its precision (overflow could occur).  */
+  prec = MIN (PRECISION (UINTMAX_MAX), CHAR_BIT * sizeof(time_t));
+  max = (uintmax_t) 1 << (prec - 1);
+  max = max|(max - 1);
+
+  t0 = max; /* maybe trucation happens here */
+  
+  /* Now account for any padding */
+  prec = PRECISION(t0) - SIGN_BIT;
+  t0 = (time_t) 1 << (prec - 1);
+  t1 = t0|(t0 - 1);
+  
+#ifdef DEBUG
+  fprintf (stderr, "time_t_max\t= 0x%"PRIX64"\n", (u64 = t1));
+  fprintf (stderr, "sizeof(time_t)\t= %2zd byte\n", size);
+  fprintf (stderr, "precision\t= %2zd bit\n", prec);
+  fprintf (stderr, "padding\t\t= %2zd bit\n", CHAR_BIT*size - prec -
SIGN_BIT);
+  fprintf (stderr, "bits per byte\t= %2d bit\n", CHAR_BIT);
+#endif /* DEBUG */
+  return t1;
+}
+#undef SIGN_BIT
+
 /* Fail if mktime fails to convert a date in the spring-forward gap.
    Based on a problem report from Andreas Jaeger.  */
 static void
@@ -37,8 +110,8 @@
   tm.tm_min = 0;
   tm.tm_sec = 0;
   tm.tm_isdst = -1;
-  if (mktime (&tm) == (time_t)-1)
-    exit (1);
+  if (mktime (&tm) == (time_t)(-1))
+    exit (EXIT_FAILURE);
 }
 
 static void
@@ -47,10 +120,10 @@
 {
   struct tm *lt;
   if ((lt = localtime (&now)) && mktime (lt) != now)
-    exit (1);
+    exit (EXIT_FAILURE);
   now = time_t_max - now;
   if ((lt = localtime (&now)) && mktime (lt) != now)
-    exit (1);
+    exit (EXIT_FAILURE);
 }
 
 static void
@@ -67,7 +140,7 @@
   tm.tm_isdst = -1;
   mktime (&tm);
   if (tm.tm_mon != 2 || tm.tm_mday != 31)
-    exit (1);
+    exit (EXIT_FAILURE);
 }
 
 static void
@@ -82,7 +155,7 @@
   alarm (10);
   now = mktime (&tm);
   alarm (0);
-  if (now != (time_t) -1)
+  if (now != (time_t)(-1))
     {
       struct tm *lt = localtime (&now);
       if (! (lt
@@ -96,7 +169,7 @@
      && lt->tm_wday == tm.tm_wday
      && ((lt->tm_isdst < 0 ? -1 : 0 < lt->tm_isdst)
   == (tm.tm_isdst < 0 ? -1 : 0 < tm.tm_isdst))))
- exit (1);
+ exit (EXIT_FAILURE);
     }
 }
 
@@ -106,9 +179,10 @@
   time_t t, delta;
   int i, j;
 
-  for (time_t_max = 1; 0 < time_t_max; time_t_max *= 2)
-    continue;
-  time_t_max--;
+  time_t_max = guess_time_t_max ();
+  if (time_t_max == (time_t)(-1))
+    exit (EXIT_FAILURE);
+
   delta = time_t_max / 997; /* a suitable prime number */
   for (i = 0; i < N_STRINGS; i++)
     {
@@ -120,11 +194,15 @@
       mktime_test ((time_t) 60 * 60);
       mktime_test ((time_t) 60 * 60 * 24);
 
-      for (j = 1; 0 < j; j *= 2)
+      /* NOTE This does not reach very large values > time_t_max/2
+         neither the (susceptible for overflow) version
+         for (j = 1; 0 < j; j *= 2) does  */
+      for (i = j = 1; i < PRECISION(INT_MAX); i++, j *= 2)
         bigtime_test (j);
       bigtime_test (j - 1);
     }
   irix_6_4_bug ();
   spring_forward_gap ();
-  exit (0);
+  exit (EXIT_SUCCESS);
 }
+/*! vi: set ai tabstop=8 shiftwidth=2: */

--
=|o) "Stell' Dir vor es geht und keiner kriegt's hin." (Wolfgang Neuss)
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[hidden email]"

check_mktime.c.patch (4K) Download Attachment
12