Re: what to revert

From: Kevin Grittner <kgrittn(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: what to revert
Date: 2016-05-10 19:41:49
Message-ID: CACjxUsMMew5_VefF09=Nz2D+6iUYvo=uhesLhZ5L+3WRf8v7Rg@mail.gmail.com
Lists: pgsql-hackers

On Tue, May 10, 2016 at 11:13 AM, Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> On 05/10/2016 10:29 AM, Kevin Grittner wrote:
>> On Mon, May 9, 2016 at 9:01 PM, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:

>>> * It also seems to me that the feature greatly amplifies the
>>> variability of the results, somehow. It's not uncommon to see
>>> results like this:
>>>
>>> master-10-new-2 235516 331976 133316 155563 133396
>>>
>>> where after the first runs (already fairly variable) the
>>> performance tanks to ~50%. This happens particularly with higher
>>> client counts, otherwise the max-min is within ~5% of the max.
>>> There are a few cases where this happens without the feature
>>> (i.e. old master, reverted or disabled), but it's usually much
>>> smaller than with it enabled (immediate, 10 or 60). See the
>>> 'summary' sheet in the ODS spreadsheet.

Just to quantify that with standard deviations:

standard deviation - revert
scale \ clients       1      16      32      64     128
100                 386    1874    3661    8100   26587
3000                609    2236    4570    8974   41004
10000               257    4356    1350     891   12909

standard deviation - disabled
scale \ clients       1      16      32      64     128
100                 641    1924    2983   12575    9411
3000                206    2321    5477    2380   45779
10000              2236   10376   11439    9653   10436
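
In case you want to reproduce these, a minimal sketch of the
calculation in Python; whether to use the sample or the population
standard deviation is a judgment call (the picture is the same either
way), and the data below is the master-10-new-2 run you quoted above:

    # Minimal sketch: summarize run-to-run variability as the sample
    # standard deviation of the tps results across repeated runs.
    from statistics import mean, stdev

    runs = [235516, 331976, 133316, 155563, 133396]  # master-10-new-2
    print(round(mean(runs)), round(stdev(runs)))  # roughly 198000 and 86000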

>>> I don't know what the problem is here - at first I thought that
>>> maybe something else was running on the machine, or that
>>> anti-wraparound autovacuum kicked in, but that seems not to be
>>> the case. There's nothing like that in the postgres log (also
>>> included in the .tgz).
>>
>> I'm inclined to suspect NUMA effects. It would be interesting to
>> try with the NUMA patch and cpuset I submitted a while back or with
>> fixes in place for the Linux scheduler bugs which were reported
>> last month. Which kernel version was this?
>
> I can try that, sure. Can you point me to the last versions of the
> patches, possibly rebased to current master if needed?

The initial thread (for explanation and discussion context) for my
attempt to do something about some NUMA problems I ran into is at:

http://www.postgresql.org/message-id/flat/1402267501(dot)41111(dot)YahooMailNeo(at)web122304(dot)mail(dot)ne1(dot)yahoo(dot)com

Note that in my tests at the time, the cpuset configuration made a
bigger difference than the patch, and both together typically only
made about a 2% difference in the NUMA test environment I was
using. I would sometimes see a difference as big as 20%, but had
no idea how to repeat that.
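
If you want to play with cpusets before digging up the patch, here is
a minimal sketch of one way to confine the server to a single NUMA
node, written against the cgroup-v1 cpuset interface. The mount
point, core range, and data directory below are assumptions for your
machine, not the exact configuration from that thread:

    # Rough sketch: confine PostgreSQL to the cores and memory of NUMA
    # node 0 via a cgroup-v1 cpuset. Run as root; paths are assumptions.
    import os

    CPUSET = "/sys/fs/cgroup/cpuset/postgres"  # assumed cpuset mount point
    PGDATA = "/var/lib/postgresql/data"        # assumed data directory

    os.makedirs(CPUSET, exist_ok=True)

    def write(name, value):
        with open(os.path.join(CPUSET, name), "w") as f:
            f.write(value)

    write("cpuset.cpus", "0-7")  # cores of node 0 (check node0/cpulist)
    write("cpuset.mems", "0")    # allocate memory from node 0 only

    # The first line of postmaster.pid is the postmaster PID; backends
    # forked after the move inherit the cpuset.
    with open(os.path.join(PGDATA, "postmaster.pid")) as f:
        write("tasks", f.readline().strip())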

> The kernel is 3.19.0-031900-generic

So that kernel is recent enough to have acquired the worst of the
scheduling bugs, known to slow down one NASA high-concurrency
benchmark by 138x. To quote from the recent paper by Lozi et
al. [1]:

| The Missing Scheduling Domains bug causes all threads of the
| applications to run on a single node instead of eight. In some
| cases, the performance impact is greater than the 8x slowdown
| that one would expect, given that the threads are getting 8x less
| CPU time than they would without the bug (they run on one node
| instead of eight). lu, for example, runs 138x faster!
| Super-linear slowdowns occur in cases where threads
| frequently synchronize using locks or barriers: if threads spin
| on a lock held by a descheduled thread, they will waste even more
| CPU time, causing cascading effects on the entire application’s
| performance. Some applications do not scale ideally to 64 cores
| and are thus a bit less impacted by the bug. The minimum slowdown
| is 4x.

The bug is only encountered if cores are disabled and re-enabled,
though, and I have no idea whether that might have happened on your
machine. Since you're on a vulnerable kernel version, you might
want to be aware of the issue and take care not to trigger the
problem.
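
If you want to check the current state of the machine, here's a rough
sketch; the sched_domain hierarchy is a standard interface, but it is
only exposed on kernels built with SCHED_DEBUG:

    # Rough check: with CONFIG_SCHED_DEBUG, the per-cpu scheduling
    # domain hierarchy is visible under /proc/sys/kernel/sched_domain.
    # On a healthy multi-node box cpu0 should show a NUMA-level domain;
    # its absence after a core offline/online cycle points at this bug.
    from pathlib import Path

    base = Path("/proc/sys/kernel/sched_domain/cpu0")
    for dom in sorted(base.glob("domain*")):
        print(dom.name, (dom / "name").read_text().strip())

    # Cores currently offline (hotplug is what triggers the bug; this
    # only shows the present state, not whether a cycle happened earlier).
    offline = Path("/sys/devices/system/cpu/offline").read_text().strip()
    print("offline cores:", offline or "none")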

You are only vulnerable to the Group Imbalance bug if you use
autogroups. You are only vulnerable to the Scheduling Group
Construction bug if you have more than one hop from any core to any
memory segment (which seems quite unlikely with 4 sockets and 4
memory nodes).
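
Both of those preconditions can be checked from procfs/sysfs; a rough
sketch follows, where reading "more than one hop" off the node
distance table (10 local, ~20 one hop, higher values multi-hop) is my
interpretation rather than anything from the paper:

    # Check the two preconditions: autogroups enabled, and node
    # distances implying more than one hop between a core and a
    # memory node.
    from pathlib import Path

    ag = Path("/proc/sys/kernel/sched_autogroup_enabled").read_text().strip()
    print("autogroups enabled:", ag == "1")

    for node in sorted(Path("/sys/devices/system/node").glob("node[0-9]*")):
        print(node.name, "distances:", (node / "distance").read_text().split())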

If you are vulnerable to any of the above, it might explain some of
the odd variations. Let me know and I'll see if I can find more on
workarounds or OS patches.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] Jean-Pierre Lozi, Baptiste Lepers, Justin Funston, Fabien Gaud,
Vivien Quéma, Alexandra Fedorova. The Linux Scheduler: a Decade
of Wasted Cores. In Proceedings of the 11th European
Conference on Computer Systems, EuroSys ’16, April 2016,
London, UK.
http://www.ece.ubc.ca/~sasha/papers/eurosys16-final29.pdf
