Re: WIP: BRIN multi-range indexes

From: John Naylor <john(dot)naylor(at)2ndquadrant(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: BRIN multi-range indexes
Date: 2020-09-17 21:42:59
Message-ID: CACPNZCuqpkCGt8=cywAk1kPu0OoV_TjPXeV-J639ABQWyViyug@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 17, 2020 at 12:34 PM Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
> On Thu, Sep 17, 2020 at 10:33:06AM -0400, John Naylor wrote:
> >On Sun, Sep 13, 2020 at 12:40 PM Tomas Vondra
> ><tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> >> <20200913 patch set>
> But those are opclass parameters, so the parameters are not specified in
> WITH clause, but right after the opclass name:
>
> CREATE INDEX idx ON table USING brin (
> bigint_col int8_minmax_multi_ops(values_per_range = 15)
> );

D'oh!

> >+ * The index only supports equality operator, similarly to hash indexes.
> >
> >s/operator/operators/
> >
>
> Hmmm, are there really multiple equality operators?

Ah, I see what you meant -- then "_the_ equality operator" is what we want.

> The number of items the comment refers to is this:
>
> /* how many uint32 hashes can we fit into the bitmap */
> int maxvalues = filter->nbits / (8 * sizeof(uint32));
>
> where nbits is the size of the bloom filter. So I think the "same" is
> quite right here.

Ok, I get it now.

> >+ /*
> >+ * The 1% value is mostly arbitrary, it just looks nice.
> >+ */
> >+#define BLOOM_DEFAULT_FALSE_POSITIVE_RATE 0.01 /* 1% fp rate */
> >
> >I think we'd want better stated justification for this default, even
> >if just precedence in other implementations. Maybe I can test some
> >other values here?
> >
>
> Well, I don't know how to pick a better default :-( Ultimately it's a
> tarde-off between larger indexes and scanning larger fraction of a table
> due to lower false positive. Then there's the restriction that the whole
> index tuple needs to fit into a single 8kB page.

Well, it might be a perfectly good default, and it seems common in
articles on the topic, but the comment is referring to aesthetics. :-)
I still intend to test some cases.

> >Also, is there a reference for the algorithm for hash values that
> >follows? I didn't see anything like it in my cursory reading of the
> >topic. Might be good to include it in the comments.
> >
>
> This was suggested by Yura Sokolov [1] in 2017. The post refers to a
> paper [2] but I don't recall which part describes "our" algorithm.
>
> [1] https://www.postgresql.org/message-id/94c173b54a0aef6ae9b18157ef52f03e@postgrespro.ru
> [2] https://www.eecs.harvard.edu/~michaelm/postscripts/rsa2008.pdf

Hmm, I came across that paper while doing background reading. Okay,
now I get that "% (filter->nbits - 1)" is the second hash function in
that scheme. But now I wonder if that second function should actually
act on the passed "value" (the original hash), so that they are
actually independent, as required. In the language of that paper, the
patch seems to have

g(x) = h1(x) + i*h2(h1(x)) + f(i)

instead of

g(x) = h1(x) + i*h2(x) + f(i)

Concretely, I'm wondering if it should be:

big_h = DatumGetUint32(hash_uint32(value));
h = big_h % filter->nbits;
-d = big_h % (filter->nbits - 1);
+d = value % (filter->nbits - 1);

But I could be wrong.

Also, I take it that f(i) = 1 in the patch, hence the increment here:

+ h += d++;

But it's a little hidden. That might be worth commenting, if I haven't
completely missed something.

> >+ * Tweak the ndistinct value based on the pagesPerRange value. First,
> >
> >Nitpick: "Tweak" to my mind means to adjust an existing value. The
> >above is only true if ndistinct is negative, but we're really not
> >tweaking, but using it as a scale factor. Otherwise it's not adjusted,
> >only clamped.
> >
>
> OK. Perhaps 'adjust' would be a better term?

I felt like rewriting the whole thing, but your original gets the
point across ok, really.

"If the passed ndistinct value is positive, we can just use that, but
we also clamp the value to prevent over-sizing the bloom filter
unnecessarily. If it's negative, it represents a multiplier to apply
to the maximum number of tuples in the range (assuming each page gets
MaxHeapTuplesPerPage tuples, which is likely a significant
over-estimate), similar to the corresponding value in table
statistics."

> >+ /* TODO include the sorted/unsorted values */
> >
>
> This was simplemented as part of the discussion about pageinspect, and
> I wanted some confirmation if the approach is acceptable or not before
> spending more time on it. Also, the values are really just hashes of the
> column values, so I'm not quite sure it makes sense to include that.
> What would you suggest?

My gut feeling is the hashes are not useful for this purpose, but I
don't feel strongly either way.

--
John Naylor https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2020-09-17 21:54:32 Re: Global snapshots
Previous Message Ranier Vilela 2020-09-17 21:31:12 Re: Planner, check if can use consider HASH for groupings (src/backend/optimizer/plan/planner.c)