Re: WIP: dynahash replacement for buffer table

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: dynahash replacement for buffer table
Date: 2014-10-16 07:11:34
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2014-10-14 17:53:10 -0400, Robert Haas wrote:
> On Tue, Oct 14, 2014 at 4:42 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> >> The code in CHashSearch shows the problem there: you need to STORE the
> >> hazard pointer before you begin to do the LOAD operations to scan the
> >> bucket, and you must finish all of those LOADs before you STORE the
> >> NULL hazard pointer. A read or write barrier won't provide store/load
> >> or load/store ordering.
> >
> > I'm not sure I understand the problem here - but a read + write barrier
> > should suffice? To avoid falling back to two full barriers, we'd need a
> > separate define pg_read_write_barrier(), but that's not a problem. IIUC
> > that should allow us to avoid emitting any actual code on x86.
> Well, thinking about x86 specifically, it definitely needs at least
> one mfence, after setting the hazard pointer and before beginning to
> read the bucket chain. It probably doesn't need the second mfence,
> before clearing the the hazard pointer, because x86 allows loads to be
> reordered before stores but not stores before loads. We could
> certainly try removing the second fence on x86 for benchmarking
> purposes, and we could also check whether mfence outperforms lock;
> addl in that scenario.

Hm. So I thought about this for a while this morning after I wasn't able
to unprogram my hotel room's alarm clock that a previous occupant set
way to early. Given that premise, take the following with a grain of

The reason for neading an mfence is that a read/write barrier doesn't
guarantee that the store is visible to other processes - just in which
order they become visible. Right? And that's essentially because the
write might sit in the process's store buffer.

So, how about *not* using a full barrier on the reader side (i.e. the
side that does the hazard pointer writes). But instead do something like
a atomic_fetch_add(hazard_ptr, 0) (i.e. lock; xaddl) on the side that
needs to scan the hazard pointers. That, combined with the write memory
barrier, should guarantee that the store buffers are emptied. Which is
pretty nice, because it moves the overhead to the rather infrequent
scanning of the hazard pointers - which needs to do lots of other atomic
ops anyway.

Sounds sane?

That's something that should best be simulated in an exhaustive x86
simulator, but it sounds sane - and it'd be quite awesome imo :)


Andres Freund

Andres Freund
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2014-10-16 07:39:37 Re: inherit support for foreign tables
Previous Message Michael Paquier 2014-10-16 07:01:12 Re: Add CREATE support to event triggers