Misaligned BufferDescriptors causing major performance problems on AMD

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Misaligned BufferDescriptors causing major performance problems on AMD
Date: 2014-02-02 15:13:19
Message-ID: 20140202151319.GD32123@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

In the nearby thread at
http://archives.postgresql.org/message-id/20140202140014.GM5930%40awork2.anarazel.de
Peter and I discovered that there is a large performance difference
between different max_connections on a larger machine (4x Opteron 6272,
64 cores together) in a readonly pgbench tests...

Just as reference, we're talking about a performance degradation from
475963.613865 tps to 197744.913556 in a pgbench -S -cj64 just by setting
max_connections to 90, from 91...

On 2014-02-02 15:00:14 +0100, Andres Freund wrote:
> On 2014-02-01 19:47:29 -0800, Peter Geoghegan wrote:
> > Here are the results of a benchmark on Nathan Boley's 64-core, 4
> > socket server: http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/amd-4-socket-rwlocks/
>
> That's interesting. The maximum number of what you see here (~293125)
> is markedly lower than what I can get.
>
> ... poke around ...
>
> Hm, that's partially because you're using pgbench without -M prepared if
> I see that correctly. The bottleneck in that case is primarily memory
> allocation. But even after that I am getting higher
> numbers: ~342497.
>
> Trying to nail down the differnce it oddly seems to be your
> max_connections=80 vs my 100. The profile in both cases is markedly
> different, way much more spinlock contention with 80. All in
> Pin/UnpinBuffer().
>
> I think =80 has to lead to some data being badly aligned. I can
> reproduce that =91 has *much* better performance than =90. 170841.844938
> vs 368490.268577 in a 10s test. Reproducable both with an without the test.
> That's certainly worth some investigation.
> This is *not* reproducable on the intel machine, so it might the
> associativity of the L1/L2 cache on the AMD.

So, I looked into this, and I am fairly certain it's because of the
(mis-)alignment of the buffer descriptors. With certain max_connections
settings InitBufferPool() happens to get 64byte aligned addresses, with
others not. I checked the alignment with gdb to confirm that.

A quick hack (attached) making BufferDescriptor 64byte aligned indeed
restored performance across all max_connections settings. It's not
surprising that a misaligned buffer descriptor causes problems -
there'll be plenty of false sharing of the spinlocks otherwise. Curious
that the the intel machine isn't hurt much by this.

Now all this hinges on the fact that by a mere accident
BufferDescriptors are 64byte in size:
struct sbufdesc {
BufferTag tag; /* 0 20 */
BufFlags flags; /* 20 2 */
uint16 usage_count; /* 22 2 */
unsigned int refcount; /* 24 4 */
int wait_backend_pid; /* 28 4 */
slock_t buf_hdr_lock; /* 32 1 */

/* XXX 3 bytes hole, try to pack */

int buf_id; /* 36 4 */
int freeNext; /* 40 4 */

/* XXX 4 bytes hole, try to pack */

LWLock * io_in_progress_lock; /* 48 8 */
LWLock * content_lock; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */

/* size: 64, cachelines: 1, members: 10 */
/* sum members: 57, holes: 2, sum holes: 7 */
};

We could polish up the attached patch and apply it to all the branches,
the costs of memory are minimal. But I wonder if we shouldn't instead
make ShmemInitStruct() always return cacheline aligned addresses. That
will require some fiddling, but it might be a good idea nonetheless?

I think we should also consider some more reliable measures to have
BufferDescriptors cacheline sized, rather than relying on the happy
accident. Debugging alignment issues isn't fun, too much of a guessing
game...

Thoughts?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
keep-BufferDescriptors-cacheline-aligned.patch text/x-patch 1.0 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2014-02-02 15:18:07 Re: [9.3 bug] disk space in pg_xlog increases during archive recovery
Previous Message Fujii Masao 2014-02-02 14:50:40 Re: [9.3 bug] disk space in pg_xlog increases during archive recovery