Re: Reducing the size of BufferTag & remodeling forks

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reducing the size of BufferTag & remodeling forks
Date: 2015-09-13 13:16:32
Message-ID: CANP8+jLT+zQdj=6T2fsBNFj_wvqDdPvJ7fkTg++_RcOx5QoCEw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 12 September 2015 at 21:19, Andres Freund <andres(at)anarazel(dot)de> wrote:

> On 2015-09-12 13:12:26 +0100, Simon Riggs wrote:
> > Why do we have to do buffer lookups using the full buffer tag?
>
> We don't necessarily.
>
> > Why not just use (relNode, blockNum) and resolve hash collisions, if any?
>
> I tried that and unfortunately it came out as a negative - the number of
> collision gets higher and collisions in a hashtable will often imply a
> cache miss. You can even see the comparison function rising in the
> profile.
>
> I've actually changed course a bit and I'm trying something different: A
> two level structure. One hashtable that maps (RelFileNode, ForkNumber)
> to a 'open relation' data structure, and from there a radix tree over
> just the block number. To avoid having to look up in the hashtable
> frequently there's a pointer in RelationData to a fork's radix tree.
>
> This seems to have several advantages:
> * It doesn't require changes to the physical representation
>

Seems necessary

> * A radix tree allows ordered traversal, allowing for efficient
> prefetching et al.
>

Could be very important... some benchmarks of whether that was worthwhile
would really help

> * The ability to efficiently get all pages that belong to a relation
> makes dropping/truncating tables radically more efficient than now.
>

Not very important, since there are other approaches

> * A single lookup in the radix tree is on average significantly faster
> than in the hash table. I've measured ~350 vs 60 cycles in a
> nonconcurrent workload and ~820 vs 72~ cycles in a concurrent
> workload, using a recorded buffer access traces.
>

Good

> * Having a 'open relation' representation in shared memory also makes it
> much easier to implement more efficient relation extension and
> truncation - we can have pointers in there that signal how far the
> relation has been extended and how far it has been initialized.
>

Probably important, for extension

> The biggest disadvantage is that it's a good bit more complex than what
> we have right now. That we need dynamically many radix trees makes
> memory management nontrivial.
>

Which might mean we lose benefit by requiring many additional memory
accesses.

> Sounds halfway sane?
>

I think it is a good research direction, as long as we get the focus right.

Another similar approach is direct allocation of chunks of memory for
in-memory access. That would be much less flexible, but access would be in
<5 cycles. I mention this for comparison only.

So if the focus is on more efficient and reasonably flexible access to data
in memory, then it sounds good.

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2015-09-13 13:17:28 Re: New gist vacuum.
Previous Message Stephen Frost 2015-09-13 12:49:34 Re: [DOCS] Missing COMMENT ON POLICY