Re: Re: Misaligned BufferDescriptors causing major performance problems on AMD

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Misaligned BufferDescriptors causing major performance problems on AMD
Date: 2014-02-03 23:38:19
Message-ID: 20140203233819.GA12016@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2014-02-03 15:17:13 -0800, Peter Geoghegan wrote:
> On Sun, Feb 2, 2014 at 7:13 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > Just as reference, we're talking about a performance degradation from
> > 475963.613865 tps to 197744.913556 in a pgbench -S -cj64 just by setting
> > max_connections to 90, from 91...
>
> That's pretty terrible.

Yea, I was scared myself.

> > A quick hack (attached) making BufferDescriptor 64byte aligned indeed
> > restored performance across all max_connections settings. It's not
> > surprising that a misaligned buffer descriptor causes problems -
> > there'll be plenty of false sharing of the spinlocks otherwise. Curious
> > that the the intel machine isn't hurt much by this.
>
> I think that is explained here:
>
> http://www.agner.org/optimize/blog/read.php?i=142&v=t
>
> With Sandy Bridge, "Misaligned memory operands [are] handled efficiently".

No, I don't think so. Those improvements afair refer to unaligned
accesses as in accessing a 4 byte variable at address % 4 != 0.

> > Now all this hinges on the fact that by a mere accident
> > BufferDescriptors are 64byte in size:
>
> Are they 64 bytes in size on REL9_*_STABLE? How about on win64? I
> think we're reasonably disciplined here already, but long is 32-bits
> in length even on win64. Looks like it would probably be okay, but as
> you say, it doesn't seem like something to leave to chance.

I haven't checked any branches yet, but I don't remember any recent
changes to sbufdesc. And I think we're lucky enough that all the used
types are the same width across common 64bit OSs.
But I absolutely agree we shouldn't leave this to chance.

> > We could polish up the attached patch and apply it to all the branches,
> > the costs of memory are minimal. But I wonder if we shouldn't instead
> > make ShmemInitStruct() always return cacheline aligned addresses. That
> > will require some fiddling, but it might be a good idea nonetheless?

> What fiddling are you thinking of?

Basically always doing a TYPEALIGN(CACHELINE_SIZE, addr) before
returning from ShmemAlloc() (and thereby ShmemInitStruct). After I'd
written the email I came across an interesting bit of code:

void *
ShmemAlloc(Size size)
{
...
/* extra alignment for large requests, since they are probably buffers */
if (size >= BLCKSZ)
newStart = BUFFERALIGN(newStart);

/*
* Preferred alignment for disk I/O buffers. On some CPUs, copies between
* user space and kernel space are significantly faster if the user buffer
* is aligned on a larger-than-MAXALIGN boundary. Ideally this should be
* a platform-dependent value, but for now we just hard-wire it.
*/
#define ALIGNOF_BUFFER 32

#define BUFFERALIGN(LEN) TYPEALIGN(ALIGNOF_BUFFER, (LEN))

So, we're already trying to make large allocations (which we'll be using
here) try to fit to a 32 byte alignment. Which unfortunately isn't
enough here, but it explains why max_connections has to be changed by
exactly 1 to achieve adequate performance...
So, the absolutely easiest thing would be to just change ALIGNOF_BUFFER
to 64. The current value was accurate in 2003 (common cacheline size
back then), but it really isn't anymore today. Anything relevant is
64byte.

But I really think we should also remove the BLCKSZ limit. There's
relatively few granular/dynamic usages of ShmemAlloc(), basically just
shared memory hashes. And I'd be rather unsurprised if aligning all
allocations for hashes resulted in a noticeable speedup. We have
frightening bottlenecks in those today.

> > I think we should also consider some more reliable measures to have
> > BufferDescriptors cacheline sized, rather than relying on the happy
> > accident. Debugging alignment issues isn't fun, too much of a guessing
> > game...
>
> +1. Maybe make code that isn't appropriately aligned fail to compile?

I was wondering about using some union trickery to make sure the array
only consists out of properly aligned types. That'd require some changes
but luckily code directly accessing the BufferDescriptors array isn't
too far spread.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2014-02-03 23:55:28 Re: [COMMITTERS] pgsql: Include planning time in EXPLAIN ANALYZE output.
Previous Message Peter Geoghegan 2014-02-03 23:30:55 Re: [COMMITTERS] pgsql: Include planning time in EXPLAIN ANALYZE output.