Re: WIP: dynahash replacement for buffer table

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: dynahash replacement for buffer table
Date: 2014-10-14 15:19:16
Message-ID: CA+TgmoYfe+SK6h=d_EWAzxznBjh=b_SO8ogkY9Qd9S=LSsXC_g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Oct 14, 2014 at 10:25 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> The key idea here is that lookups are done without any locks, only
>> memory barriers; and inserts and deletes are done using atomic ops.
>
> Hm. I quickly looked and I see that you use two full barriers for every
> lookup. That's pretty expensive. I think this should be doable using
> only read/write barriers on the lookup side? Even on architectures where
> read/write memory barriers have to be something but a compiler barrier,
> they're still usually a magnitude or so cheaper than full barriers.

The code in CHashSearch shows the problem there: you need to STORE the
hazard pointer before you begin to do the LOAD operations to scan the
bucket, and you must finish all of those LOADs before you STORE the
NULL hazard pointer. A read or write barrier won't provide store/load
or load/store ordering.

> I don't see much reason to strive for full lock-free ness. If the locks
> aren't noticeable in real world scenarios - who cares?

That's my feeling too. ISTR that when I stress-tested the hash table
back in 2012 when I started this code, the concurrency was far better
than dynahash with 16 lwlock partitions. I don't remember off hand
exactly how I tested that, but it was with the code in hashtest.c. I
also seem to recall that it was possible to make the freelists get
VERY hot, but of course that was a workload where hash table updates
were the only thing happening, so I assume that on a workload where
we're actually doing work (like, copying the data in and out of kernel
buffers) that's not going to be a real problem unless you have
thousands of cores. But we'll see.

> With regard for using a hash table for the buffer mapping lock I'm
> doubtful that any form of separate chaining is the right one. We
> currently have a quite noticeable problem with the number of cache
> misses in the buffer mapping hash (and also the heavyweight lock mgr) -
> if we stay with hashes that's only going to be fundamentally lower than
> dynahash if we change the type of hashing. I've had good, *preliminary*,
> results using an open addressing + linear probing approach.

I'm very skeptical of open addressing + linear probing. Such hash
tables tend to degrade over time, because you have to leave tombstones
behind to ensure that future scans advance far enough. There's really
no way to recover without rebuilding the whole hash table, and
eventually it degrades to linear search. If we're spending too much
time walking hash chains, I think the solution is to increase the
number of buckets so that the chains get shorter.

> My guess is that the additional indirection via the arena explains the
> difference in cache misses? But I might be completely off here.

The arena makes the calculation of the pointer address less
predictable, which might make things more difficult for the CPU
pipeline. But aside from that, I think it's the same basic idea: you
bounce from some sort of bucket header to the first element of the
chain and then, hopefully, you're done. Am I missing something?

> It'd be quite interesting to see a perf stat -ddd of dynahash.c/chash
> employing builds for some comparable load.

I'm hoping you can test this out on x86. All I have to work with are
the POWER machines, and the characteristics of those are somewhat
different. I can test it there, though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2014-10-14 15:31:20 Re: WIP: dynahash replacement for buffer table
Previous Message Sawada Masahiko 2014-10-14 15:16:28 Re: Drop any statistics of table after it's truncated