Re: A design for amcheck heapam verification

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A design for amcheck heapam verification
Date: 2017-08-30 03:56:05
Message-ID: CAH2-Wzmsi_cPaF65AgD14z4KphTned+njSgptkWqoYiDL=qYxA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Aug 29, 2017 at 7:22 PM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> Indeed. Thank you for working on this! To summarise a couple of
> ideas that Peter and I discussed off-list a while back: (1) While
> building the hash table for a hash join we could build a Bloom filter
> per future batch and keep it in memory, and then while reading from
> the outer plan we could skip writing tuples out to future batches if
> there is no chance they'll find a match when read back in later (works
> only for inner joins and only pays off in inverse proportion to the
> join's selectivity);

Right. Hash joins do tend to be very selective, though, so I think
that this could help rather a lot. With just 8 or 10 bits per element,
you can eliminate almost all the batch write-outs on the outer side.
No per-worker synchronization for BufFiles is needed when this
happens, either. It seems like that could be very important.

> To use this for anything involving parallelism where a Bloom filter
> must be shared we'd probably finish up having to create a shared
> version of bloom_init() that either uses caller-provided memory and
> avoids the internal pointer, or allocates DSA memory. I suppose you
> could consider splitting your bloom_init() function up into
> bloom_estimate() and bloom_init(user_supplied_space, ...) now, and
> getting rid of the separate pointer to bitset (ie stick char
> bitset[FLEXIBLE_ARRAY_MEMBER] at the end of the struct)?

Makes sense. Not hard to add that.

> I was just observing that there is an opportunity for code reuse here.
> This function's definition would ideally be something like:
>
> double
> bloom_prop_bits_set(bloom_filter *filter)
> {
> return popcount(filter->bitset, NBYTES(filter)) / (double) NBITS(filter);
> }
>
> This isn't an objection to the way you have it, since we don't have a
> popcount function yet! We have several routines in the tree for
> counting bits, though not yet the complete set from Hacker's Delight.

Right. I'm also reminded of the lookup tables for the visibility/freeze map.

> Your patch brings us one step closer to that goal. (The book says
> that this approach is good far sparse bitsets, but your comment says
> that we expect something near 50%. That's irrelevant anyway since a
> future centralised popcount() implementation would do this in
> word-sized chunks with a hardware instruction or branch-free-per-word
> lookups in a table and not care at all about sparseness.)

I own a copy of Hacker's Delight (well, uh, Daniel Farina lent me his
copy about 2 years ago!). pop()/popcount() does seem like a clever
algorithm, that we should probably think about adopting in some cases,
but I should point at that the current caller to my
bloom_prop_bits_set() function is an elog() DEBUG1 call. This is not
at all performance critical.

> + * Test if bloom filter definitely lacks element.
>
> I think where "Bloom filter" appears in prose it should have a capital
> letter (person's name).

Agreed.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2017-08-30 04:45:55 Re: segment size depending *_wal_size defaults (was increasing the default WAL segment size)
Previous Message Michael Paquier 2017-08-30 03:52:26 Re: segment size depending *_wal_size defaults (was increasing the default WAL segment size)