Optimization bug with floating-point?

Optimization bug with floating-point?

Steve Kargl
All,

There seems to be an optimization bug with clang on

% uname -a
FreeBSD mobile 13.0-CURRENT FreeBSD 13.0-CURRENT r344653 MOBILE  i386

IOW, if you do numerical work on i386, you may want to check your
results.

The program demonstrating the issue is at the end of this email.

gcc8 --version
gcc8 (FreeBSD Ports Collection) 8.3.0

gcc8 -fno-builtin -o z a.c -lm && ./z
gcc8 -O -fno-builtin -o z a.c -lm && ./z
gcc8 -O2 -fno-builtin -o z a.c -lm && ./z
gcc8 -O3 -fno-builtin -o z a.c -lm && ./z

Max ULP: 2.297073
Count: 0           (# of ULP that exceed 21)


cc --version
FreeBSD clang version 7.0.1 (tags/RELEASE_701/final 349250)
(based on LLVM 7.0.1)
Target: i386-unknown-freebsd13.0

cc -fno-builtin -o z a.c -lm && ./z
Max ULP: 2.297073
Count: 0

cc -O -fno-builtin -o z a.c -lm && ./z
cc -O2 -fno-builtin -o z a.c -lm && ./z
cc -O3 -fno-builtin -o z a.c -lm && ./z

   ur ui: 21.588761 7.006300
     x y: 9.5623927 1.4993777
  csinhf: 5.07348328e+02 7.09178613e+03
dp_csinh: 5.07348986e+02 7.09178955e+03
   sinhf: 7.10991113e+03
    cosf: 7.13578984e-02

Max ULP: 23.061242
Count: 39          (# of ULP that exceed 21)

Things are much worse than this toy program shows.
The test program I use for libm development gives

Restrict x < 10

./testf -u -X 10
Max ULP Re: 136628.340239
Max ULP Im: 1891176.003955

Restrict x < 50
./testf -u -X 10
Max ULP Re: 3615923.332529
Max ULP Im: 13677733.591783

/*
 * Compute 10 million values of csinhf() and then compute the ULP
 * for the real and imaginary parts.
 */
#include <complex.h>
#include <float.h>
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Return 0 <= x < 1. */
double
ranged(void)
{

   union {
      double x;
      struct {
         uint32_t lo;
         uint32_t hi;
      } u;
   } v;
   v.u.hi = (uint32_t)random();
   v.u.hi = ((v.u.hi << 11) >> 11) | 0x3ff00000;
   v.u.lo = (uint32_t)random();
   return (v.x - 1);
}

float
rangef(void)
{

        float s;
        s = (float)ranged();
        return (s);
}

/* Double precision csinh() without using C's double complex. */
void
dp_csinh(double x, double y, double *re, double *im)
{
   double c, s;
   sincos(y, &s, &c);
   *re = sinh(x) * c;
   *im = cosh(x) * s;
}

/* ULP estimate. */
double
ulpfd(float app, double acc)
{
        int n;
        double f;
        f = frexp(acc, &n);
        f = fabs(acc - app);
        f = ldexp(f, FLT_MANT_DIG - n);
        return (f);    
}

int
main(void)
{
   double re, im, u, ur, ui;
   float complex f;
   float x, y;
   int cnt, i;

   srandom(19632019);

   ur = ui = 0;

   for (cnt = 0, i = 0; i < 10000000; i++) {
      x = rangef() + 9;
      y = rangef() + 0.5;
      f = csinhf(CMPLXF(x,y));
      dp_csinh((double)x, (double)y, &re, &im);
      ur = ulpfd(crealf(f), re);
      if (ur > u) u = ur;
      ui = ulpfd(cimagf(f), im);
      if (ui > u) u = ui;
      if (ur > 21 || ui > 21) {
         printf("   ur ui: %f %f\n", ur, ui);
         printf("     x y: %.7f %.7f\n", x, y);
         printf("  csinhf: %.8e %.8e\n", crealf(f), cimagf(f));
         printf("dp_csinh: %.8le %.8le\n", re, im);
         printf("   sinhf: %.8e\n", sinhf(x));
         printf("    cosf: %.8e\n\n", cosf(y));
         cnt++;
      }
   }
   printf("Max ULP: %f\n", u);
   printf("Count: %d\n", cnt);
   return (0);
}
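
As a sanity check on ulpfd(), here is a small worked case. It is a sketch that repeats the function from the program above; the helper name ulp_check_one is made up for illustration. For acc = 1.0, frexp() returns n = 1, so the error is scaled by 2^(FLT_MANT_DIG - 1) = 2^23, and an approximation that is off by k * FLT_EPSILON should come out as exactly k ULP.

```c
#include <float.h>
#include <math.h>

/* Same ULP estimate as ulpfd() in the test program. */
double
ulpfd(float app, double acc)
{
   int n;
   double f;

   (void)frexp(acc, &n);                /* only the exponent n is needed */
   f = fabs(acc - app);                 /* absolute error */
   return (ldexp(f, FLT_MANT_DIG - n)); /* scale to units of float ULP */
}

/* Hypothetical helper: error, in ULP, of a float k epsilons above 1.0. */
double
ulp_check_one(int k)
{
   float app = 1.0f + (float)k * FLT_EPSILON;

   return (ulpfd(app, 1.0));
}
```

ulp_check_one(1) evaluates to 1 and ulp_check_one(3) to 3, so a reported maximum of about 2.3 means the worst result was a little over two float steps away from the double-precision reference.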


--
Steve
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-toolchain
To unsubscribe, send any mail to "[hidden email]"

Re: Optimization bug with floating-point?

Steve Kargl
On Tue, Mar 12, 2019 at 07:45:41PM -0700, Steve Kargl wrote:

> All,
>
> There seems to an optimization bug with clang on
>
> % uname -a
> FreeBSD mobile 13.0-CURRENT FreeBSD 13.0-CURRENT r344653 MOBILE  i386
>
> IOW, if you do numerica work on i386, you may want to check your
> results.
>
> The program demonstrating the issue is at the end of this email.
>
> gcc8 --version
> gcc8 (FreeBSD Ports Collection) 8.3.0
>
> gcc8 -fno-builtin -o z a.c -lm && ./z
> gcc8 -O -fno-builtin -o z a.c -lm && ./z
> gcc8 -O2 -fno-builtin -o z a.c -lm && ./z
> gcc8 -O3 -fno-builtin -o z a.c -lm && ./z
>
> Max ULP: 2.297073
> Count: 0           (# of ULP that exceed 21)
>

The above results do not change if one adds -ffloat-store to
the command line.


> cc -O -fno-builtin -o z a.c -lm && ./z
> cc -O2 -fno-builtin -o z a.c -lm && ./z
> cc -O3 -fno-builtin -o z a.c -lm && ./z
>
> Max ULP: 23.061242
> Count: 39          (# of ULP that exceeds 21)

Clang doesn't support -ffloat-store, so the above does not change.

--
Steve

Re: Optimization bug with floating-point?

Steve Kargl
In reply to this post by Steve Kargl
On Tue, Mar 12, 2019 at 07:45:41PM -0700, Steve Kargl wrote:
>
> cc -O -fno-builtin -o z a.c -lm && ./z
> cc -O2 -fno-builtin -o z a.c -lm && ./z
> cc -O3 -fno-builtin -o z a.c -lm && ./z
>
>
> Max ULP: 23.061242
> Count: 39          (# of ULP that exceeds 21)
>

These results do not change if one uses /usr/local/bin/clang60.

--
Steve

Re: Optimization bug with floating-point?

Steve Kargl
In reply to this post by Steve Kargl
On Tue, Mar 12, 2019 at 07:45:41PM -0700, Steve Kargl wrote:

>
> gcc8 --version
> gcc8 (FreeBSD Ports Collection) 8.3.0
>
> gcc8 -fno-builtin -o z a.c -lm && ./z
> gcc8 -O -fno-builtin -o z a.c -lm && ./z
> gcc8 -O2 -fno-builtin -o z a.c -lm && ./z
> gcc8 -O3 -fno-builtin -o z a.c -lm && ./z
>
> Max ULP: 2.297073
> Count: 0           (# of ULP that exceed 21)
>

clang agrees with gcc8 if one changes ...

> int
> main(void)
> {
>    double re, im, u, ur, ui;
>    float complex f;
>    float x, y;

this line to "volatile float x, y".
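
Why volatile might matter here: a volatile float has to be stored to and reloaded from a 4-byte memory slot, so any extra precision the value carried in a register is dropped. A minimal sketch of that idiom follows; the helper name round_to_float is an assumption for illustration and is not part of a.c.

```c
/*
 * Hypothetical helper: force a value through a float-sized memory slot.
 * The volatile qualifier obliges the compiler to perform the store and
 * the reload, so the result carries exactly float precision.
 */
float
round_to_float(double v)
{
   volatile float t = (float)v;

   return (t);
}
```

For example, round_to_float(9.5 + 0x1p-24) returns 9.5f: the extra 2^-24 does not fit in a float near 9.5, where adjacent floats are 2^-20 apart.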

--
Steve

Re: Optimization bug with floating-point?

Hans Petter Selasky-6
On 3/13/19 4:16 PM, Steve Kargl wrote:

> On Tue, Mar 12, 2019 at 07:45:41PM -0700, Steve Kargl wrote:
>>
>> gcc8 --version
>> gcc8 (FreeBSD Ports Collection) 8.3.0
>>
>> gcc8 -fno-builtin -o z a.c -lm && ./z
>> gcc8 -O -fno-builtin -o z a.c -lm && ./z
>> gcc8 -O2 -fno-builtin -o z a.c -lm && ./z
>> gcc8 -O3 -fno-builtin -o z a.c -lm && ./z
>>
>> Max ULP: 2.297073
>> Count: 0           (# of ULP that exceed 21)
>>
>
> clang agrees with gcc8 if one changes ...
>
>> int
>> main(void)
>> {
>>     double re, im, u, ur, ui;
>>     float complex f;
>>     float x, y;
>
> this line to "volatile float x, y".
>

Can you try to use:

#define sincos(x,p,q) do { \
         *(p) = sin(x); \
         *(q) = cos(x); \
} while (0)


Instead of libm's sincos(). Might be a bug in there.

--HPS

Re: Optimization bug with floating-point?

Steve Kargl
On Wed, Mar 13, 2019 at 04:41:51PM +0100, Hans Petter Selasky wrote:

> On 3/13/19 4:16 PM, Steve Kargl wrote:
> > On Tue, Mar 12, 2019 at 07:45:41PM -0700, Steve Kargl wrote:
> >>
> >> gcc8 --version
> >> gcc8 (FreeBSD Ports Collection) 8.3.0
> >>
> >> gcc8 -fno-builtin -o z a.c -lm && ./z
> >> gcc8 -O -fno-builtin -o z a.c -lm && ./z
> >> gcc8 -O2 -fno-builtin -o z a.c -lm && ./z
> >> gcc8 -O3 -fno-builtin -o z a.c -lm && ./z
> >>
> >> Max ULP: 2.297073
> >> Count: 0           (# of ULP that exceed 21)
> >>
> >
> > clang agrees with gcc8 if one changes ...
> >
> >> int
> >> main(void)
> >> {
> >>     double re, im, u, ur, ui;
> >>     float complex f;
> >>     float x, y;
> >
> > this line to "volatile float x, y".
> >
>
> Can you try to use:
>
> #define sincos(x,p,q) do { \
>          *(p) = sin(x); \
>          *(q) = cos(x); \
> } while (0)
>
>
> Instead of libm's sincos(). Might be a bug in there.
>

Using sin() and cos() directly as in

/* Double precision csinh() without using C's double complex. */
void
dp_csinh(double x, double y, double *re, double *im)
{
   double c, s;
   *re = sinh(x) * cos(y);
   *im = cosh(x) * sin(y);
}

does not change the result.  I'll also note that libm
is compiled by clang, and I do not recompile it for the
tests.  Both gcc8 and cc are using the same libm.

I've also tested clang on amd64 with -m32; it fails
as well.

--
Steve

Re: Optimization bug with floating-point?

Hans Petter Selasky-6
On 3/13/19 4:50 PM, Steve Kargl wrote:

> Using sin() and cos() directly as in
>
> /* Double precision csinh() without using C's double complex.s */
> void
> dp_csinh(double x, double y, double *re, double *im)
> {
>     double c, s;
>     *re = sinh(x) * cos(y);
>     *im = cosh(x) * sin(y);
> }
>
> does not change the result.  I'll also note that libm
> is compiled by clang, and I do not recompile it for the
> tests.  Both gcc8 and cc are using the same libm.
>
> I've also tested clang of amd64 with the -m32, it fails
> as well.

Hi,

I cannot see this failing with an 11-stable userland. Can you check with
objdump that clang doesn't optimise it to sincos()?

FreeBSD clang version 3.8.0 (tags/RELEASE_380/final 262564) (based on
LLVM 3.8.0)
Target: x86_64-unknown-freebsd11.0
Thread model: posix
InstalledDir: /usr/bin

cc -lm -O2 -Wall test.c && ./a.out
Max ULP: 2.297073
Count: 0

clang40 -lm -O2 test6.c
 > ./a.out
Max ULP: 2.297073
Count: 0

clang50 -lm -O2 test6.c
 > ./a.out
Max ULP: 2.297073
Count: 0

clang60 -lm -O2 test6.c
 > ./a.out
Max ULP: 2.297073
Count: 0

--HPS

Re: Optimization bug with floating-point?

Steve Kargl
On Wed, Mar 13, 2019 at 04:56:26PM +0100, Hans Petter Selasky wrote:

> On 3/13/19 4:50 PM, Steve Kargl wrote:
> > Using sin() and cos() directly as in
> >
> > /* Double precision csinh() without using C's double complex.s */
> > void
> > dp_csinh(double x, double y, double *re, double *im)
> > {
> >     double c, s;
> >     *re = sinh(x) * cos(y);
> >     *im = cosh(x) * sin(y);
> > }
> >
> > does not change the result.  I'll also note that libm
> > is compiled by clang, and I do not recompile it for the
> > tests.  Both gcc8 and cc are using the same libm.
> >
> > I've also tested clang of amd64 with the -m32, it fails
> > as well.
>
> Hi,
>
> I cannot see this is failing with 11-stable userland. Can you check with
> objdump() that clang doesn't optimise it to sincos() ?

It doesn't.

% nm z | grep sin
         U csinhf
00401360 T dp_csinh
         U sin
         U sinh

> FreeBSD clang version 3.8.0 (tags/RELEASE_380/final 262564) (based on
> LLVM 3.8.0)
> Target: x86_64-unknown-freebsd11.0

The test does not fail on x86_64 unless you add the -m32 option,
which forces i386 behavior.

cc --version
FreeBSD clang version 7.0.1 (tags/RELEASE_701/final 349250)
(based on LLVM 7.0.1)
Target: x86_64-unknown-freebsd13.0

cc -fno-builtin -O2 -o z a.c -lm && ./z
Max u: 2.297073
Count: 0

cc -fno-builtin -O2 -o z a.c -lm -m32 && ./z
Max u: 23.061242
Count: 39
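
A quick way to see which evaluation model a given compiler and flag combination advertises is the C99 macro FLT_EVAL_METHOD from <float.h>. The sketch below only decodes the standard values; the function names are made up, and what the macro reports for a particular FreeBSD/i386 or -m32 build is something to check locally rather than something this example asserts.

```c
#include <float.h>

/* Hypothetical helper: describe a C99 FLT_EVAL_METHOD value. */
const char *
eval_method_name(int m)
{
   switch (m) {
   case 0:
      return ("evaluate each operation in its own type");
   case 1:
      return ("evaluate float and double as double");
   case 2:
      return ("evaluate everything as long double");
   default:
      return ("indeterminable or implementation-defined");
   }
}

/* The model this translation unit was compiled with. */
const char *
current_eval_method(void)
{
   return (eval_method_name(FLT_EVAL_METHOD));
}
```

Printing current_eval_method() from binaries built with and without -m32 shows whether the two builds claim different models.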

> Thread model: posix
> InstalledDir: /usr/bin
>
> cc -lm -O2 -Wall test.c && ./a.out
> Max ULP: 2.297073
> Count: 0

add -m32.

>
> clang40 -lm -O2 test6.c
>  > ./a.out
> Max ULP: 2.297073
> Count: 0
>
> clang50 -lm -O2 test6.c
>  > ./a.out
> Max ULP: 2.297073
> Count: 0
>
> clang60 -lm -O2 test6.c
>  > ./a.out
> Max ULP: 2.297073
> Count: 0

--
Steve

Re: Optimization bug with floating-point?

John Baldwin
In reply to this post by Steve Kargl
On 3/13/19 9:40 AM, Steve Kargl wrote:

> On Wed, Mar 13, 2019 at 09:32:57AM -0700, John Baldwin wrote:
>> On 3/13/19 8:16 AM, Steve Kargl wrote:
>>> On Tue, Mar 12, 2019 at 07:45:41PM -0700, Steve Kargl wrote:
>>>>
>>>> gcc8 --version
>>>> gcc8 (FreeBSD Ports Collection) 8.3.0
>>>>
>>>> gcc8 -fno-builtin -o z a.c -lm && ./z
>>>> gcc8 -O -fno-builtin -o z a.c -lm && ./z
>>>> gcc8 -O2 -fno-builtin -o z a.c -lm && ./z
>>>> gcc8 -O3 -fno-builtin -o z a.c -lm && ./z
>>>>
>>>> Max ULP: 2.297073
>>>> Count: 0           (# of ULP that exceed 21)
>>>>
>>>
>>> clang agrees with gcc8 if one changes ...
>>>
>>>> int
>>>> main(void)
>>>> {
>>>>    double re, im, u, ur, ui;
>>>>    float complex f;
>>>>    float x, y;
>>>
>>> this line to "volatile float x, y".
>>
>> So it seems to be a regression in clang 7 vs clang 6?
>>
>
> /usr/local/bin/clang60 has the same problem.  
>
> % /usr/local/bin/clang60 -o z -O2 a.c -lm && ./z
>   Maximum ULP: 23.061242
> # of ULP > 21: 39
>
> Adding volatile as in the above "fixes" the problem.
>
> AFAICT, this a i386/387 code generation problem.  Perhaps,
> an alignment issue?

Oh, I misread your earlier e-mail to say that clang60 worked.

One issue I'm aware of is that clang does not have any support for the
special arrangement FreeBSD/i386 uses, where registers and memory use
different precision for some of the floating-point types (GCC has a
special hack for this that is used only on FreeBSD and not on any other
OS).  I wonder if that could be a factor?  Volatile probably forces a
round trip through memory, which might explain why it makes a
difference.

I wonder what your test program does on i386 Linux with GCC?

--
John Baldwin

Re: Optimization bug with floating-point?

Conrad Meyer-2
Hi John,

On Wed, Mar 13, 2019 at 10:17 AM John Baldwin <[hidden email]> wrote:
> One issue I'm aware of is that clang does not have any support for the
> special arrangement FreeBSD/i386 uses where it uses different precision
> for registers vs in-memory for some of the floating point types (GCC has
> a special hack that is only used on FreeBSD for this but isn't used on
> any other OS's).  I wonder if that could be a factor?  Volatile probably
> forces a round trip between memory which might explain why this is the
> case.
>
> I wonder what your test program does on i386 Linux with GCC?

$ uname -sr
Linux 4.20.4
$ gcc --version
gcc (GCC) 8.2.1 20181215 (Red Hat 8.2.1-6)
...
$ rpm -qf /usr/lib/libm-2.27.so
glibc-2.27-37.fc28.i686

$ gcc -m32 -fno-builtin -o z kargl.c -lm && ./z
Max ULP: 1.959975
Count: 0
$ gcc -O -m32 -fno-builtin -o z kargl.c -lm && ./z
Max ULP: 1.959975
Count: 0
$ gcc -O1 -m32 -fno-builtin -o z kargl.c -lm && ./z
Max ULP: 1.959975
Count: 0
$ gcc -O2 -m32 -fno-builtin -o z kargl.c -lm && ./z
Max ULP: nan
Count: 0
$ gcc -O3 -m32 -fno-builtin -o z kargl.c -lm && ./z
Max ULP: nan
Count: 0

Uh.

kargl.c: In function ‘main’:
kargl.c:80:10: warning: ‘u’ may be used uninitialized in this function
[-Wmaybe-uninitialized]
       if (ur > u) u = ur;
          ^

If I initialize 'u' (to, e.g., -1e52), I get:
Max ULP: 1.959975
Count: 0

at -O2 and -O3 as well.
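
The nan at -O2 and -O3 appears to come from comparing against the uninitialized u. Here is a sketch of the running-maximum logic with the accumulator initialized; the function name and the array interface are made up, since kargl.c tracks the maximum inline in main().

```c
#include <stddef.h>

/*
 * Hypothetical helper: largest value in v[0..n-1], or 0 when n == 0.
 * ULP errors are never negative, so 0 is a safe starting point.
 */
double
max_ulp(const double *v, size_t n)
{
   double u = 0;   /* initialized, unlike u in the posted program */
   size_t i;

   for (i = 0; i < n; i++)
      if (v[i] > u)
         u = v[i];
   return (u);
}
```

In the posted program the equivalent change is to include u in the existing initialization, for example "u = ur = ui = 0;".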

Best,
Conrad

Re: Optimization bug with floating-point?

Steve Kargl
In reply to this post by John Baldwin
On Wed, Mar 13, 2019 at 10:16:12AM -0700, John Baldwin wrote:

> On 3/13/19 9:40 AM, Steve Kargl wrote:
> > On Wed, Mar 13, 2019 at 09:32:57AM -0700, John Baldwin wrote:
> >> On 3/13/19 8:16 AM, Steve Kargl wrote:
> >>> On Tue, Mar 12, 2019 at 07:45:41PM -0700, Steve Kargl wrote:
> >>>>
> >>>> gcc8 --version
> >>>> gcc8 (FreeBSD Ports Collection) 8.3.0
> >>>>
> >>>> gcc8 -fno-builtin -o z a.c -lm && ./z
> >>>> gcc8 -O -fno-builtin -o z a.c -lm && ./z
> >>>> gcc8 -O2 -fno-builtin -o z a.c -lm && ./z
> >>>> gcc8 -O3 -fno-builtin -o z a.c -lm && ./z
> >>>>
> >>>> Max ULP: 2.297073
> >>>> Count: 0           (# of ULP that exceed 21)
> >>>>
> >>>
> >>> clang agrees with gcc8 if one changes ...
> >>>
> >>>> int
> >>>> main(void)
> >>>> {
> >>>>    double re, im, u, ur, ui;
> >>>>    float complex f;
> >>>>    float x, y;
> >>>
> >>> this line to "volatile float x, y".
> >>
> >> So it seems to be a regression in clang 7 vs clang 6?
> >>
> >
> > /usr/local/bin/clang60 has the same problem.  
> >
> > % /usr/local/bin/clang60 -o z -O2 a.c -lm && ./z
> >   Maximum ULP: 23.061242
> > # of ULP > 21: 39
> >
> > Adding volatile as in the above "fixes" the problem.
> >
> > AFAICT, this a i386/387 code generation problem.  Perhaps,
> > an alignment issue?
>
> Oh, I misread your earlier e-mail to say that clang60 worked.
>
> One issue I'm aware of is that clang does not have any support for the
> special arrangement FreeBSD/i386 uses where it uses different precision
> for registers vs in-memory for some of the floating point types (GCC has
> a special hack that is only used on FreeBSD for this but isn't used on
> any other OS's).  I wonder if that could be a factor?  Volatile probably
> forces a round trip between memory which might explain why this is the
> case.
>
> I wonder what your test program does on i386 Linux with GCC?

I don't have an i386 Linux environment.  I tried comparing the
assembly generated with and without volatile, but that proved
difficult because register numbers change between the two listings,
so almost all lines mismatch.

If I move ranged(), rangef(), dp_csinh(), and ulpfd() into b.c
so a.c only contains main(), add appropriate prototypes to a.c,
and comment out the printf() statements, I still see the problem.
Judging from the diff, there is a difference in the spills and
loads in 2 places.

% diff -uw without_volatile with_volatile
--- without_volatile 2019-03-13 10:51:33.244226000 -0700
+++ with_volatile 2019-03-13 10:51:54.088095000 -0700
@@ -35,11 +35,13 @@
  movl %esi, 68(%esp)          # 4-byte Spill
  calll rangef
  fadds .LCPI0_0
- fstpl 24(%esp)                # 8-byte Folded Spill
+ fstps 28(%esp)
  calll rangef
  fadds .LCPI0_1
- fstl 100(%esp)               # 8-byte Folded Spill
- fldl 24(%esp)                # 8-byte Folded Reload
+ fstps 24(%esp)
+ flds 28(%esp)
+ flds 24(%esp)
+ fxch %st(1)
  fstps 48(%esp)
  fstps 52(%esp)
  movl 48(%esp), %eax
@@ -49,13 +51,13 @@
  calll csinhf
  movl %eax, %esi
  movl %edx, %edi
+ flds 28(%esp)
+ flds 24(%esp)
  leal 72(%esp), %eax
  movl %eax, 20(%esp)
  leal 80(%esp), %eax
  movl %eax, 16(%esp)
- fldl 100(%esp)               # 8-byte Folded Reload
  fstpl 8(%esp)
- fldl 24(%esp)                # 8-byte Folded Reload
  fstpl (%esp)
  calll dp_csinh
  movl %esi, 40(%esp)
@@ -75,7 +77,7 @@
  fnstsw %ax
                                         # kill: def $ah killed $ah killed $ax
  sahf
- fstl 24(%esp)                # 8-byte Folded Spill
+ fstl 100(%esp)               # 8-byte Folded Spill
  ja .LBB0_3
 # %bb.2:                                # %for.body
                                         #   in Loop: Header=BB0_1 Depth=1
@@ -114,7 +116,7 @@
                                         #   in Loop: Header=BB0_1 Depth=1
  fstp %st(2)
  fldl 92(%esp)                # 8-byte Folded Reload
- fldl 24(%esp)                # 8-byte Folded Reload
+ fldl 100(%esp)               # 8-byte Folded Reload
  fucomp %st(1)
  fnstsw %ax
                                         # kill: def $ah killed $ah killed $ax

Adding ieeefp.h to a.c and fpsetprec(FP_PE) in main()
produces a massive diff, but still wrong results if
volatile is not used.

Clang appears to be broken for FP on i386/387.

--
Steve

Re: Optimization bug with floating-point?

Steve Kargl
In reply to this post by Conrad Meyer-2
On Wed, Mar 13, 2019 at 10:40:28AM -0700, Conrad Meyer wrote:

> Hi John,
>
> On Wed, Mar 13, 2019 at 10:17 AM John Baldwin <[hidden email]> wrote:
> > One issue I'm aware of is that clang does not have any support for the
> > special arrangement FreeBSD/i386 uses where it uses different precision
> > for registers vs in-memory for some of the floating point types (GCC has
> > a special hack that is only used on FreeBSD for this but isn't used on
> > any other OS's).  I wonder if that could be a factor?  Volatile probably
> > forces a round trip between memory which might explain why this is the
> > case.
> >
> > I wonder what your test program does on i386 Linux with GCC?
>
> $ uname -sr
> Linux 4.20.4
> $ gcc --version
> gcc (GCC) 8.2.1 20181215 (Red Hat 8.2.1-6)
> ...
> $ rpm -qf /usr/lib/libm-2.27.so
> glibc-2.27-37.fc28.i686
>
> $ gcc -m32 -fno-builtin -o z kargl.c -lm && ./z
> Max ULP: 1.959975
> Count: 0
> $ gcc -O -m32 -fno-builtin -o z kargl.c -lm && ./z
> Max ULP: 1.959975
> Count: 0
> $ gcc -O1 -m32 -fno-builtin -o z kargl.c -lm && ./z
> Max ULP: 1.959975
> Count: 0
> $ gcc -O2 -m32 -fno-builtin -o z kargl.c -lm && ./z
> Max ULP: nan
> Count: 0
> $ gcc -O3 -m32 -fno-builtin -o z kargl.c -lm && ./z
> Max ULP: nan
> Count: 0
>
> Uh.
>
> kargl.c: In function ‘main’:
> kargl.c:80:10: warning: ‘u’ may be used uninitialized in this function
> [-Wmaybe-uninitialized]
>        if (ur > u) u = ur;
>           ^

Whoops.  There are a number of variations on a theme named a.c.
Initializing u to 0 doesn't change the outcome with clang on
FreeBSD.

--
Steve

Re: Optimization bug with floating-point?

Steve Kargl
In reply to this post by John Baldwin
On Wed, Mar 13, 2019 at 10:16:12AM -0700, John Baldwin wrote:

> On 3/13/19 9:40 AM, Steve Kargl wrote:
> > On Wed, Mar 13, 2019 at 09:32:57AM -0700, John Baldwin wrote:
> >> On 3/13/19 8:16 AM, Steve Kargl wrote:
> >>> On Tue, Mar 12, 2019 at 07:45:41PM -0700, Steve Kargl wrote:
> >>>>
> >>>> gcc8 --version
> >>>> gcc8 (FreeBSD Ports Collection) 8.3.0
> >>>>
> >>>> gcc8 -fno-builtin -o z a.c -lm && ./z
> >>>> gcc8 -O -fno-builtin -o z a.c -lm && ./z
> >>>> gcc8 -O2 -fno-builtin -o z a.c -lm && ./z
> >>>> gcc8 -O3 -fno-builtin -o z a.c -lm && ./z
> >>>>
> >>>> Max ULP: 2.297073
> >>>> Count: 0           (# of ULP that exceed 21)
> >>>>
> >>>
> >>> clang agrees with gcc8 if one changes ...
> >>>
> >>>> int
> >>>> main(void)
> >>>> {
> >>>>    double re, im, u, ur, ui;
> >>>>    float complex f;
> >>>>    float x, y;
> >>>
> >>> this line to "volatile float x, y".
> >>
> >> So it seems to be a regression in clang 7 vs clang 6?
> >>
> >
> > /usr/local/bin/clang60 has the same problem.  
> >
> > % /usr/local/bin/clang60 -o z -O2 a.c -lm && ./z
> >   Maximum ULP: 23.061242
> > # of ULP > 21: 39
> >
> > Adding volatile as in the above "fixes" the problem.
> >
> > AFAICT, this a i386/387 code generation problem.  Perhaps,
> > an alignment issue?
>
> Oh, I misread your earlier e-mail to say that clang60 worked.
>
> One issue I'm aware of is that clang does not have any support for the
> special arrangement FreeBSD/i386 uses where it uses different precision
> for registers vs in-memory for some of the floating point types (GCC has
> a special hack that is only used on FreeBSD for this but isn't used on
> any other OS's).  I wonder if that could be a factor?  Volatile probably
> forces a round trip between memory which might explain why this is the
> case.
>

I went looking for this special hack.  In gcc/gccx/config/i386,
one finds

/* FreeBSD sets the rounding precision of the FPU to 53 bits.  Let the
   compiler get the contents of <float.h> and std::numeric_limits correct.  */
#undef TARGET_96_ROUND_53_LONG_DOUBLE
#define TARGET_96_ROUND_53_LONG_DOUBLE (!TARGET_64BIT)

So, taking this as a hunch, I added ieeefp.h to my test program
and called 'fpsetprec(FP_PD)' as the first executable statement.
This then results in

% cc -fno-builtin -m32 -O2 -o z b.o a.c -lm && ./z
Max u: 2.297073
Count: 0

So, is there a way to correctly build clang for i386/387
to automatically set the precision correctly?
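
For reference, the workaround described above can be wrapped so that the same source still builds on other platforms. This is only a sketch: fpsetprec() and FP_PD come from FreeBSD's <ieeefp.h> as used in this thread, while the wrapper name and the preprocessor guard are assumptions made for the example.

```c
#if defined(__FreeBSD__) && defined(__i386__)
#include <ieeefp.h>
#endif

/*
 * Hypothetical wrapper: request 53-bit x87 precision on FreeBSD/i386.
 * Returns 1 if fpsetprec() was called, 0 if this build is a no-op.
 */
int
set_x87_double_precision(void)
{
#if defined(__FreeBSD__) && defined(__i386__)
   (void)fpsetprec(FP_PD);
   return (1);
#else
   return (0);
#endif
}
```

Calling set_x87_double_precision() as the first executable statement of main() mirrors the experiment above.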

--
Steve

Re: Optimization bug with floating-point?

Steve Kargl
On Wed, Mar 13, 2019 at 11:30:07PM -0700, Steve Kargl wrote:

>
> Spent a couple hours wandering in contrib/llvm.  Have no idea
> how to fix clang to actually work on i386/387.  Any ideas
> would be welcomed.
>
> AFAICT, all libm float routines need to be modified to conditional
> include ieeefp.h and call fpsetprec(FP_PD).  This will work around
> issues is FP and libm.  FreeBSD needs to issue an erratum about
> the numerical issues with clang.
>

Probably beating a dead horse, but I'll continue as someone
might actually be able to help me fix clang.

clang has the ability to determine the default precision that
the FPU on i386 is using.

#include <err.h>
#include <ieeefp.h>
#include <stdio.h>
#include <stdlib.h>

int
main(void)
{

   fp_prec_t p;

   p = fpgetprec();

   switch(p) {
   case FP_PS:
      printf("24 bit (single-precision)\n");
      break;
   case FP_PRS:
      printf("reserved\n");
      break;
   case FP_PD:
      printf("53 bit (double-precision)\n");
      break;
   case FP_PE:
      printf("64 bit (extended-precision)\n");
      break;
   default:
      errx(1,"unable to determine precision");
   }

   return 0;
}

%  cc -o z -O2 d.c && ./z
53 bit (double-precision)

It is likely that one (or more) files in contrib/llvm/Target/X86
need to be fixed.  Unfortunately, there are 116 files, which are written
in languages I do not know.

Any pointers on which file(s) to poke?

--
Steve

Re: Optimization bug with floating-point?

Konstantin Belousov
In reply to this post by Steve Kargl
On Fri, Mar 15, 2019 at 05:50:37AM +1100, Peter Jeremy wrote:
> On 2019-Mar-13 23:30:07 -0700, Steve Kargl <[hidden email]> wrote:
> >AFAICT, all libm float routines need to be modified to conditional
> >include ieeefp.h and call fpsetprec(FP_PD).  This will work around
> >issues is FP and libm.  FreeBSD needs to issue an erratum about
> >the numerical issues with clang.
>
> I vaguely recall looking into the x87 initialisation a long time ago
> and STR that the startup code (either crtX or in the kernel) does
> a fninit() to set the precision.  I don't recall exactly where.
At boot, a clean initial FPU state is stored in fpu_initialstate.
Then on first FPU access from userspace  (first for the given process
context), this saved state is copied into hardware registers.  The
quirk is that for i386 binaries on amd64, we adjust fpu control word
to what is expected by i386 binaries.

>
> IMO, calling fpsetprec() in every libm float function is overkill. It
> should be enough to fpsetprec() before main() and add a note in the
> man pages that libm is built to use the default FPU configuration and
> changing the configuration (precision or rounding) may result in larger
> errors.
Changing default precision in crt1 would break the ABI.
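
For readers unfamiliar with the control word mentioned above: on x87 the precision-control field occupies bits 8 and 9. A small decoder follows, as a sketch. The function name is made up, and of the two sample control words in the note after the code, 0x037F is the value FNINIT loads, while 0x127F is believed to be FreeBSD's i386 default and should be checked against the headers of the system at hand.

```c
/*
 * Hypothetical helper: significand width selected by the x87
 * precision-control field (bits 8-9 of the control word).
 * Returns 0 for the reserved encoding.
 */
int
x87_precision_bits(unsigned int cw)
{
   switch ((cw >> 8) & 0x3) {
   case 0:
      return (24);   /* single precision */
   case 2:
      return (53);   /* double precision */
   case 3:
      return (64);   /* extended precision */
   default:
      return (0);    /* reserved */
   }
}
```

x87_precision_bits(0x037F) gives 64 and x87_precision_bits(0x127F) gives 53, which matches the "53 bit (double-precision)" result reported by fpgetprec() earlier in the thread.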

Re: Optimization bug with floating-point?

John Baldwin
On 3/14/19 12:20 PM, Konstantin Belousov wrote:

> On Fri, Mar 15, 2019 at 05:50:37AM +1100, Peter Jeremy wrote:
>> On 2019-Mar-13 23:30:07 -0700, Steve Kargl <[hidden email]> wrote:
>>> AFAICT, all libm float routines need to be modified to conditional
>>> include ieeefp.h and call fpsetprec(FP_PD).  This will work around
>>> issues is FP and libm.  FreeBSD needs to issue an erratum about
>>> the numerical issues with clang.
>>
>> I vaguely recall looking into the x87 initialisation a long time ago
>> and STR that the startup code (either crtX or in the kernel) does
>> a fninit() to set the precision.  I don't recall exactly where.
> At boot, a clean initial FPU state is stored in fpu_initialstate.
> Then on first FPU access from userspace  (first for the given process
> context), this saved state is copied into hardware registers.  The
> quirk is that for i386 binaries on amd64, we adjust fpu control word
> to what is expected by i386 binaries.
>
>>
>> IMO, calling fpsetprec() in every libm float function is overkill. It
>> should be enough to fpsetprec() before main() and add a note in the
>> man pages that libm is built to use the default FPU configuration and
>> changing the configuration (precision or rounding) may result in larger
>> errors.
> Changing default precision in crt1 would break the ABI.

So what I don't understand then is what gcc is doing differently than clang
in this case.  I assume neither GCC _nor_ clang is adjusting the FPU in
compiler-generated code, and in fact, as Steve's earlier test shows, the
precision is set to PD by default when a clang-built binary is run.

--
John Baldwin

Re: Optimization bug with floating-point?

Steve Kargl
In reply to this post by Steve Kargl
On Fri, Mar 15, 2019 at 05:50:37AM +1100, Peter Jeremy wrote:

> On 2019-Mar-13 23:30:07 -0700, Steve Kargl <[hidden email]> wrote:
> >AFAICT, all libm float routines need to be modified to conditional
> >include ieeefp.h and call fpsetprec(FP_PD).  This will work around
> >issues is FP and libm.  FreeBSD needs to issue an erratum about
> >the numerical issues with clang.
>
> I vaguely recall looking into the x87 initialisation a long time ago
> and STR that the startup code (either crtX or in the kernel) does
> a fninit() to set the precision.  I don't recall exactly where.
>
> IMO, calling fpsetprec() in every libm float function is overkill. It
> should be enough to fpsetprec() before main() and add a note in the
> man pages that libm is built to use the default FPU configuration and
> changing the configuration (precision or rounding) may result in larger
> errors.

My understanding of the situation is that FreeBSD i386/387 sets
the FPU to 53-bit precision (whether at start up or first access
is immaterial).  This was done long ago to prevent issues with
different optimization levels leaving different intermediate
results in registers with extended precision.  You can observe
the problem with the toy program I posted and clang.  Compile it
with -O0 and -O2.  With the former you have a max ULP of 2.3 (the
desired result); with the latter you have a max ULP of 23.xxx.
I have observed a 6 billion ULP issue when running my testsuite.
As pointed out by John Baldwin, GCC is aware of the FPU setting.
The problem with clang is that it seems to unconditionally assume
the FPU is set to 64-bit precision.   It is unclear if clang is
generating the desired result for float routines in libm.  The
only way to guarantee the desired result is to use fpsetprec(FP_PD),
or fix clang to take into account the FPU environment.
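
For concreteness, here is a minimal sketch of the fpsetprec(FP_PD)
workaround described above.  It assumes FreeBSD's <ieeefp.h> on i386,
where fpsetprec() returns the previous setting.  The name mulf_pd() and
the HAVE_FPSETPREC macro are hypothetical and only for illustration;
this is not what libm does today.  On other platforms the guard
compiles the precision calls out.

```c
#if defined(__FreeBSD__) && defined(__i386__)
#include <ieeefp.h>
#define HAVE_FPSETPREC 1	/* hypothetical feature macro */
#else
#define HAVE_FPSETPREC 0
#endif

/*
 * Hypothetical float kernel: force the x87 to 53-bit precision for the
 * duration of the computation, then restore the caller's setting.
 */
float
mulf_pd(float a, float b)
{
#if HAVE_FPSETPREC
	fp_prec_t saved = fpsetprec(FP_PD);	/* returns old precision */
#endif
	/* The volatile store rounds the product to float exactly once. */
	volatile float r = a * b;
#if HAVE_FPSETPREC
	fpsetprec(saved);
#endif
	return (r);
}
```

Saving and restoring the precision on every call is the overhead that
Peter considers overkill.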

--
Steve

Re: Optimization bug with floating-point?

Konstantin Belousov
In reply to this post by John Baldwin
On Thu, Mar 14, 2019 at 12:59:14PM -0700, John Baldwin wrote:

> On 3/14/19 12:20 PM, Konstantin Belousov wrote:
> > On Fri, Mar 15, 2019 at 05:50:37AM +1100, Peter Jeremy wrote:
> >> On 2019-Mar-13 23:30:07 -0700, Steve Kargl <[hidden email]> wrote:
> >>> AFAICT, all libm float routines need to be modified to conditionally
> >>> include ieeefp.h and call fpsetprec(FP_PD).  This will work around
> >>> issues in FP and libm.  FreeBSD needs to issue an erratum about
> >>> the numerical issues with clang.
> >>
> >> I vaguely recall looking into the x87 initialisation a long time ago
> >> and STR that the startup code (either crtX or in the kernel) does
> >> a fninit() to set the precision.  I don't recall exactly where.
> > At boot, a clean initial FPU state is stored in fpu_initialstate.
> > Then on first FPU access from userspace  (first for the given process
> > context), this saved state is copied into hardware registers.  The
> > quirk is that for i386 binaries on amd64, we adjust fpu control word
> > to what is expected by i386 binaries.
> >
> >>
> >> IMO, calling fpsetprec() in every libm float function is overkill. It
> >> should be enough to fpsetprec() before main() and add a note in the
> >> man pages that libm is built to use the default FPU configuration and
> >> changing the configuration (precision or rounding) may result in larger
> >> errors.
> > Changing default precision in crt1 would break the ABI.
>
> So what I don't understand then is what GCC is doing differently from clang
> in this case.  I assume neither GCC _nor_ clang is adjusting the FPU in
> compiler-generated code, and in fact, as Steve's earlier tests show, the
> precision is set to PD by default when a clang-built binary is run.

Precision control only affects elementary floating-point instructions.
Could this be the cause?

SDM vol 1 8.1.5.2 Precision Control Field
The precision-control bits only affect the results of the following
floating-point instructions: FADD, FADDP, FIADD, FSUB, FSUBP, FISUB,
FSUBR, FSUBRP, FISUBR, FMUL, FMULP, FIMUL, FDIV, FDIVP, FIDIV, FDIVR,
FDIVRP, FIDIVR, and FSQRT.
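
As a rough illustration of the field in question: the precision-control
bits are bits 8-9 of the x87 control word, and a small decoder can be
sketched as below.  The function name x87_precision_bits() is
hypothetical.  The sample values are assumptions on my part: 0x037F is
the control word after FNINIT, and 0x127F is, as far as I recall, the
initial control word FreeBSD uses for i386 binaries.

```c
#include <stdint.h>

/* Bits 8-9 of the x87 control word hold the precision-control field. */
#define X87_PC_SHIFT	8
#define X87_PC_MASK	0x3u

/*
 * Return the significand width in bits selected by control word cw,
 * or 0 for the reserved encoding (PC = 01b).
 */
int
x87_precision_bits(uint16_t cw)
{
	switch ((cw >> X87_PC_SHIFT) & X87_PC_MASK) {
	case 0:
		return (24);	/* single precision */
	case 2:
		return (53);	/* double precision (FP_PD) */
	case 3:
		return (64);	/* double-extended precision */
	default:
		return (0);	/* reserved */
	}
}
```

With those assumed values, x87_precision_bits(0x037F) gives 64 and
x87_precision_bits(0x127F) gives 53.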

Re: Optimization bug with floating-point?

John Baldwin
In reply to this post by Steve Kargl
On 3/14/19 1:08 PM, Steve Kargl wrote:

> On Fri, Mar 15, 2019 at 05:50:37AM +1100, Peter Jeremy wrote:
>> On 2019-Mar-13 23:30:07 -0700, Steve Kargl <[hidden email]> wrote:
>>> AFAICT, all libm float routines need to be modified to conditionally
>>> include ieeefp.h and call fpsetprec(FP_PD).  This will work around
>>> issues in FP and libm.  FreeBSD needs to issue an erratum about
>>> the numerical issues with clang.
>>
>> I vaguely recall looking into the x87 initialisation a long time ago
>> and STR that the startup code (either crtX or in the kernel) does
>> a fninit() to set the precision.  I don't recall exactly where.
>>
>> IMO, calling fpsetprec() in every libm float function is overkill. It
>> should be enough to fpsetprec() before main() and add a note in the
>> man pages that libm is built to use the default FPU configuration and
>> changing the configuration (precision or rounding) may result in larger
>> errors.
>
> My understanding of the situation is that FreeBSD i386/387 sets
> the FPU to 53-bit precision (whether at start up or first access
> is immaterial).  This was done long ago to prevent issues with
> different optimization levels leaving different intermediate
> results in registers with extended precision.  You can observe
> the problem with the toy program I posted and clang.  Compile it
> with -O0 and -O2.  With the former you have a max ULP of 2.3 (the
> desired result); with the latter you have a max ULP of 23.xxx.
> I have observed a 6 billion ULP issue when running my testsuite.
> As pointed out by John Baldwin, GCC is aware of the FPU setting.
> The problem with clang is that it seems to unconditionally assume
> the FPU is set to 64-bit precision.   It is unclear if clang is
> generating the desired result for float routines in libm.  The
> only way to guarantee the desired result is to use fpsetprec(FP_PD),
> or fix clang to take into account the FPU environment.

OTOH, note that every other OS in 32-bit mode uses 64-bit precision,
and amd64 also uses 64-bit precision by default IIUC.  FreeBSD/i386
is definitely unique in this regard.  Linux doesn't do it, and none of
the other BSDs do it (only DragonFly does, because it inherited it
from FreeBSD).  None of Solaris, Windows, etc. do it either, if the
GCC sources are to be trusted as a reference.

That said, I think it must have to do with how clang and GCC each
handle saving values to memory, and whether or not they truncate
to 53 bits when a value is stored.  I was trying
to poke around in GCC's sources to figure out if it was doing anything
differently, but I couldn't find a difference in terms of function
pointers, etc.  The only difference is the constants used in a set
of structures.  I haven't tried to track down what those struct
member values control though.
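
To illustrate why the point at which a value is rounded on its way to
memory matters, here is a small portable sketch.  It does not diagnose
what clang or GCC actually emit.  The function names sum_stored() and
sum_wide() are hypothetical.  The first rounds every intermediate to
float through a volatile store; the second keeps the intermediates in
a wider type and rounds once at the end.

```c
/*
 * Add three floats, forcing each intermediate through a float store,
 * so every partial sum is rounded to 24 bits.
 */
float
sum_stored(float a, float b, float c)
{
	volatile float t = a + b;	/* rounded to float here */
	volatile float r = t + c;	/* and again here */

	return (r);
}

/*
 * Add three floats with the intermediates held in long double, which
 * is at least as wide as double; round to float only on return.
 */
float
sum_wide(float a, float b, float c)
{
	long double t = (long double)a + b + c;

	return ((float)t);
}
```

For a = 1 and b = c = 2^-24 (half an ULP of 1.0f), sum_stored() loses
both small terms and returns 1, while sum_wide() returns 1 + 2^-23.
The same expression can differ by a full ULP depending only on where
the rounding happens.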

--
John Baldwin

Re: Optimization bug with floating-point?

Steve Kargl
In reply to this post by Steve Kargl
On Tue, Mar 12, 2019 at 07:45:41PM -0700, Steve Kargl wrote:
> All,
>
> There seems to be an optimization bug with clang on
>
> % uname -a
> FreeBSD mobile 13.0-CURRENT FreeBSD 13.0-CURRENT r344653 MOBILE  i386
>
> IOW, if you do numerical work on i386, you may want to check your
> results.

This is now

https://bugs.llvm.org/show_bug.cgi?id=41224

--
Steve
_______________________________________________
[hidden email] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-toolchain
To unsubscribe, send any mail to "[hidden email]"
12