Re: Crash with old Windows on new CPU

From: Christian Ullrich <chris(at)chrullrich(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Crash with old Windows on new CPU
Date: 2016-02-13 22:09:28
Message-ID: AM2PR06MB0690415F667B2A8864CEE893D4AA0@AM2PR06MB0690.eurprd06.prod.outlook.com
Lists: pgsql-hackers

* From: Christian Ullrich

> On February 13, 2016 4:10:34 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> > Christian Ullrich <chris(at)chrullrich(dot)net> writes:

> > Lastly, I'd like to see some discussion of what side effects
> > "_set_FMA3_enable(0);" has ... I rather doubt that it's really
> > a magic-elixir-against-crashes-with-no-downsides.
>
> It tells the math library (in the CRT, no separate libm on Windows)
> not to use the AVX2-based implementations of log() and possibly
> other functions. AIUI, FMA means "fused multiply-add" and is
> apparently something that increases performance and accuracy in
> transcendental functions.
>
> I can check the CRT source later today and figure out exactly what
> it does.

OK, it turns out that the CRT source MS ships is not quite as complete as I thought it was (up until 2013, at least), so I had a look at the disassembly instead. When the library initializes, it checks whether the CPU supports the FMA instructions by looking at the FMA feature bit in the CPUID result. If that bit is set, it sets an internal flag to use the FMA instructions. Later, exp(), log(), pow() and the trigonometric functions each check whether that flag is set, and if so, use the AVX-based implementation; if not, they fall back to an SSE2-based one. So, yes, that function only and specifically disables the use of instructions that do not work in the problematic case.
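The dispatch described above can be modeled roughly as follows. This is a hypothetical sketch, not Microsoft's CRT code: the `sketch_*` functions, the `use_fma3` flag and the `log_impl` enum are invented for illustration, and `sketch_set_fma3_enable()` only stands in for what `_set_FMA3_enable()` does to the real flag. Bit 12 of ECX from CPUID leaf 1 is the documented FMA feature bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the CRT dispatch; NOT Microsoft's actual code. */

#define CPUID1_ECX_FMA (UINT32_C(1) << 12) /* CPUID leaf 1, ECX bit 12: FMA */

enum log_impl { LOG_IMPL_SSE2, LOG_IMPL_AVX_FMA3 };

/* Flag computed once when the library initializes. */
static bool use_fma3 = false;

/* Library init: consults only the FMA bit (the flawed check). */
void sketch_crt_init(uint32_t cpuid1_ecx)
{
    use_fma3 = (cpuid1_ecx & CPUID1_ECX_FMA) != 0;
}

/* Stand-in for _set_FMA3_enable(flag): overrides the flag afterwards. */
void sketch_set_fma3_enable(int flag)
{
    use_fma3 = (flag != 0);
}

/* exp(), log(), pow() and the trig functions test the flag on each call
 * and pick an implementation accordingly. */
enum log_impl sketch_log_impl(void)
{
    return use_fma3 ? LOG_IMPL_AVX_FMA3 : LOG_IMPL_SSE2;
}
```

With the FMA bit set at init, the model selects the AVX path until `sketch_set_fma3_enable(0)` is called; after that it selects the SSE2 path, which mirrors the workaround.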

The bug appears to be that it uses all manner of AVX and AVX2 instructions based only on the FMA support flag in CPUID, even though AVX2 has its own bit there.

To reiterate: The problem occurs because the library only asks the CPU whether it is *able* to execute the AVX instructions, not whether it is *willing* to do so. In this particular situation, the former applies but not the latter, because the CPU needs OS support (saving and restoring the XMM/YMM registers across context switches), and the OS has not declared its support for that.
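For comparison, a check that asks both questions might look like the sketch below. It is a minimal, hypothetical helper (the name `avx2_fma_paths_usable` is invented) that takes raw register values as parameters, so it makes no assumption about how they were obtained. The bit positions are the documented ones: CPUID leaf 1 ECX bits 12 (FMA), 27 (OSXSAVE) and 28 (AVX), CPUID leaf 7 EBX bit 5 (AVX2), and XCR0 bits 1 and 2 (XMM and YMM state enabled by the OS).

```c
#include <stdbool.h>
#include <stdint.h>

/* Documented feature bits; the helper name below is hypothetical. */
#define CPUID1_ECX_FMA     (UINT32_C(1) << 12) /* CPU implements FMA3          */
#define CPUID1_ECX_OSXSAVE (UINT32_C(1) << 27) /* OS has enabled XSAVE/XGETBV  */
#define CPUID1_ECX_AVX     (UINT32_C(1) << 28) /* CPU implements AVX           */
#define CPUID7_EBX_AVX2    (UINT32_C(1) << 5)  /* CPU implements AVX2 (leaf 7) */
#define XCR0_SSE_STATE     (UINT64_C(1) << 1)  /* OS saves/restores XMM state  */
#define XCR0_AVX_STATE     (UINT64_C(1) << 2)  /* OS saves/restores upper YMM  */

/*
 * Returns true only if the AVX2/FMA code paths are safe to use.
 * cpuid1_ecx: ECX from CPUID leaf 1
 * cpuid7_ebx: EBX from CPUID leaf 7, subleaf 0
 * xcr0:       value returned by XGETBV(0); pass 0 if OSXSAVE is clear,
 *             because XGETBV must not be executed in that case.
 */
bool avx2_fma_paths_usable(uint32_t cpuid1_ecx, uint32_t cpuid7_ebx,
                           uint64_t xcr0)
{
    /* "Able": the CPU itself implements FMA, AVX and AVX2. */
    const uint32_t cpu_bits = CPUID1_ECX_FMA | CPUID1_ECX_AVX;
    bool cpu_able = (cpuid1_ecx & cpu_bits) == cpu_bits &&
                    (cpuid7_ebx & CPUID7_EBX_AVX2) != 0;

    /* "Willing": the OS has turned on XSAVE and enabled XMM+YMM state,
     * i.e. it preserves those registers across context switches. */
    const uint64_t os_state = XCR0_SSE_STATE | XCR0_AVX_STATE;
    bool os_willing = (cpuid1_ecx & CPUID1_ECX_OSXSAVE) != 0 &&
                      (xcr0 & os_state) == os_state;

    return cpu_able && os_willing;
}
```

On a real system the inputs would come from CPUID and XGETBV, e.g. `__cpuid`/`__cpuidex` and `_xgetbv` with MSVC, or `__get_cpuid`/`__get_cpuid_count` from `<cpuid.h>` with GCC and Clang. XGETBV may only be executed when OSXSAVE is set, and leaf 7 should only be queried when the CPU reports a maximum leaf of at least 7.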

The downside to disabling the AVX implementations is a performance loss compared to using them. I ran a microbenchmark (avg(log(x)) over x from generate_series(1,1e8)), and with FMA enabled it was ~5.5% faster than without.

--
Christian
