Re: Re: Misaligned BufferDescriptors causing major performance problems on AMD

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Misaligned BufferDescriptors causing major performance problems on AMD
Date: 2014-02-05 08:10:30
Message-ID: 20140205081030.GA3733@alap3.anarazel.de
Lists: pgsql-hackers

On 2014-02-04 16:24:02 -0800, Peter Geoghegan wrote:
> On Mon, Feb 3, 2014 at 3:38 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> >> > A quick hack (attached) making BufferDescriptor 64byte aligned indeed
> >> > restored performance across all max_connections settings. It's not
> >> > surprising that a misaligned buffer descriptor causes problems -
> >> > there'll be plenty of false sharing of the spinlocks otherwise. Curious
> >> > that the Intel machine isn't hurt much by this.
>
> >> What fiddling are you thinking of?
> >
> > Basically always doing a TYPEALIGN(CACHELINE_SIZE, addr) before
> > returning from ShmemAlloc() (and thereby ShmemInitStruct).
>
> There is something you have not drawn explicit attention to that is
> very interesting. If we take REL9_3_STABLE tip to be representative
> (built with full -O2 optimization, no assertions just debugging
> symbols), setting max_connections to 91 from 90 does not have the
> effect of making the BufferDescriptors array aligned; it has the
> effect of making it *misaligned*. You reported that 91 was much better
> than 90. I think that the problem actually occurs when the array *is*
> aligned!

I don't think you can learn much from comparing the alignment in 9.3
with HEAD. A lot has changed since, most prominently and recently
Robert's LWLock work. That certainly has changed allocation patterns.
The offset also depends on other parameters, e.g. changing
max_wal_senders or max_background_workers will also change it.
It's not that 91 is intrinsically better, it just happened to
give an aligned BufferDescriptors array when the other parameters weren't
changed at the same time.
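
To illustrate why the alignment depends on whatever was allocated earlier,
here is a minimal sketch, not the actual ShmemAlloc() code. TYPEALIGN is
written the way PostgreSQL's c.h defines it; ToySegment, toy_alloc(),
toy_alloc_cacheline(), descriptor_array_offset() and the 64 byte
CACHELINE_SIZE are assumptions made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size; 64 bytes is typical for current x86 CPUs. */
#define CACHELINE_SIZE 64

/* Round LEN up to the next multiple of ALIGNVAL (must be a power of two). */
#define TYPEALIGN(ALIGNVAL, LEN) \
    (((uintptr_t) (LEN) + ((ALIGNVAL) - 1)) & ~((uintptr_t) ((ALIGNVAL) - 1)))

/* Hypothetical bump allocator standing in for the shared memory segment. */
typedef struct ToySegment
{
    size_t      free_offset;    /* next unused byte, relative to the base */
} ToySegment;

/* Allocate with only 8 byte alignment; returns the offset of the chunk. */
static size_t
toy_alloc(ToySegment *seg, size_t size)
{
    size_t      start = TYPEALIGN(8, seg->free_offset);

    seg->free_offset = start + size;
    return start;
}

/* Same, but round the start of the chunk up to a cache line boundary. */
static size_t
toy_alloc_cacheline(ToySegment *seg, size_t size)
{
    size_t      start = TYPEALIGN(CACHELINE_SIZE, seg->free_offset);

    assert(start % CACHELINE_SIZE == 0);
    seg->free_offset = start + size;
    return start;
}

/*
 * Offset at which a descriptor array would land after some earlier
 * allocation whose size scales with a setting such as max_connections.
 * per_connection_bytes is a made-up number, not a real PostgreSQL size.
 */
static size_t
descriptor_array_offset(int nconnections, size_t per_connection_bytes,
                        int cacheline_align)
{
    ToySegment  seg = {0};
    size_t      array_size = 64 * 1024;

    (void) toy_alloc(&seg, (size_t) nconnections * per_connection_bytes);

    if (cacheline_align)
        return toy_alloc_cacheline(&seg, array_size);
    return toy_alloc(&seg, array_size);
}
```

With 8 bytes per connection, 8 connections put the array at offset 64
(aligned by luck), 9 connections put it at offset 72 (misaligned), and the
cacheline-aligning variant moves it to 128 no matter what came before.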

> I suspect that the scenario described in this article accounts for the
> quite noticeable effect reported: http://danluu.com/3c-conflict

I don't think that's applicable here. What's described there is relevant
for access patterns whose stride is a larger multiple of the cacheline
size - but ours is exactly cacheline sized. What can happen in such
scenarios is that all your accesses map to the same cache set, so instead
of using most of the cache, you end up using only 8 or so cachelines
(8-way is a common associativity for set associative caches these days).
Theoretically we could see something like that for shared_buffers
itself, but I *think* our accesses are too far spread around in them for
that to be a significant issue.
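
A rough sketch of the set mapping argument, under assumed cache geometry:
32kB, 8-way set associative, 64 byte lines, i.e. 64 sets, indexed directly
by address. Real hardware indexes by physical address and the geometry
differs per cache level, so treat the constants and the helper names
(cache_set_index(), distinct_sets_touched()) as hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed geometry of a toy L1 data cache. */
#define LINE_SIZE   64
#define NUM_WAYS    8
#define CACHE_SIZE  (32 * 1024)
#define NUM_SETS    (CACHE_SIZE / (LINE_SIZE * NUM_WAYS))   /* 64 sets */

/* Which set an address falls into: line number modulo the number of sets. */
static unsigned
cache_set_index(uintptr_t addr)
{
    return (unsigned) ((addr / LINE_SIZE) % NUM_SETS);
}

/*
 * Walk naccesses addresses that are 'stride' bytes apart, starting at 0,
 * and count how many distinct cache sets they touch.
 */
static unsigned
distinct_sets_touched(size_t stride, unsigned naccesses)
{
    unsigned char seen[NUM_SETS];
    unsigned    count = 0;

    assert(stride > 0);
    memset(seen, 0, sizeof(seen));

    for (unsigned i = 0; i < naccesses; i++)
    {
        unsigned    set = cache_set_index((uintptr_t) i * stride);

        if (!seen[set])
        {
            seen[set] = 1;
            count++;
        }
    }
    return count;
}
```

In this model a cacheline sized stride (64 bytes) spreads 64 accesses over
all 64 sets, while a 4096 byte stride puts every access into one set, so
only NUM_WAYS lines of the whole cache are usable for that pattern.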

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
