Re: WIP: dynahash replacement for buffer table

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: dynahash replacement for buffer table
Date: 2014-10-17 00:22:24
Message-ID: CA+Tgmoa80iLreNDPhFVu856dcivsWF9x2sUcgNQ6Uy=PS56rWQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Oct 16, 2014 at 6:53 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> When using shared_buffers = 96GB there's a performance benefit, but not
> huge:
> master (f630b0dd5ea2de52972d456f5978a012436115e): 153621.8
> master + LW_SHARED + lockless StrategyGetBuffer(): 477118.4
> master + LW_SHARED + lockless StrategyGetBuffer() + chash: 496788.6
> master + LW_SHARED + lockless StrategyGetBuffer() + chash-nomb: 499562.7
>
> But with shared_buffers = 16GB:
> master (f630b0dd5ea2de52972d456f5978a012436115e): 177302.9
> master + LW_SHARED + lockless StrategyGetBuffer(): 206172.4
> master + LW_SHARED + lockless StrategyGetBuffer() + chash: 413344.1
> master + LW_SHARED + lockless StrategyGetBuffer() + chash-nomb: 426405.8

Very interesting. This doesn't show that chash is the right solution,
but it definitely shows that doing nothing is the wrong solution. It
shows that, even with the recent bump to 128 buffer mapping partitions,
and LW_SHARED on top of that, workloads that actually update the
buffer mapping table still produce a lot of contention there. This
hasn't been obvious to me from profiling, but the numbers above make
it pretty clear.

It also seems to suggest that trying to get rid of the memory barriers
isn't a very useful optimization project. We might get a couple of
percent out of it, but it's pretty small potatoes, so unless it can be
done more easily than I suspect, it's probably not worth bothering
with. An approach I think might have more promise is to have bufmgr.c
call the CHash stuff directly instead of going through buf_table.c.
Right now, for example, BufferAlloc() creates and initializes a
BufferTag and passes a pointer to that buffer tag to BufTableLookup,
which copies it into a BufferLookupEnt. But it would be just as easy
for BufferAlloc() to put the BufferLookupEnt on its own stack, and
then you wouldn't need to copy the data an extra time. Now a 20-byte
copy isn't a lot, but it's completely unnecessary and looks easy to
get rid of.
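
To make that concrete, the before-and-after would look roughly like
this; the CHashSearch and SharedBufCHash names are only approximate
stand-ins to show the shape of it, not the exact interface:

    /* Today, via buf_table.c: BufferAlloc fills in a BufferTag, and
     * BufTableLookup copies that tag a second time into a
     * BufferLookupEnt before searching. */
    INIT_BUFFERTAG(newTag, smgr->smgr_rnode.node, forkNum, blockNum);
    newHash = BufTableHashCode(&newTag);
    buf_id = BufTableLookup(&newTag, newHash);

    /* Sketch of calling the CHash layer directly instead: build the
     * lookup entry on BufferAlloc's own stack so the tag is only
     * written once, and skip the extra copy. */
    BufferLookupEnt ent;

    INIT_BUFFERTAG(ent.key, smgr->smgr_rnode.node, forkNum, blockNum);
    if (CHashSearch(SharedBufCHash, &ent))
        buf_id = ent.id;
    else
        buf_id = -1;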

> I had to play with setting max_connections+1 sometimes to get halfway
> comparable results for master - unaligned data otherwise causes weird
> results. Without doing that the performance gap between master
> 96/16G was even bigger. We really need to fix that...
>
> This is pretty awesome.

Thanks. I wasn't quite sure how to test this or where to find workloads
that it would benefit, so I appreciate you putting time into it. And
I'm really glad to hear that it delivers good results.

I think it would be useful to plumb the chash statistics into the
stats collector or at least a debugging dump of some kind for testing.
They include a number of useful contention measures, and I'd be
interested to see what those look like on this workload. (If we're
really desperate for every last ounce of performance, we could also
disable those statistics in production builds. That's probably worth
testing at least once to see if it matters much, but I kind of hope it
doesn't.)
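
As a strawman for that last point, the counters could sit behind a
compile-time switch so that a stats-free build is easy to benchmark;
the macro and counter names below are invented for illustration only:

    /* Hypothetical sketch, not the patch's actual macros: compile the
     * statistics out unless CHASH_STATISTICS is defined. */
    #ifdef CHASH_STATISTICS
    #define CHashBumpStatistic(table, stat)  (++(table)->stats[(stat)])
    #else
    #define CHashBumpStatistic(table, stat)  ((void) 0)
    #endif

    /* Call sites stay unconditional, e.g.: */
    CHashBumpStatistic(table, CHS_Search_Failed);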

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
