Re: WIP: BRIN multi-range indexes

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: WIP: BRIN multi-range indexes
Date: 2021-03-09 22:13:19
Message-ID: 35e866e9-d820-07de-b2be-69d5ac479179@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 3/9/21 9:51 PM, John Naylor wrote:
>
> On Sun, Mar 7, 2021 at 8:53 PM Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com <mailto:tomas(dot)vondra(at)enterprisedb(dot)com>>
> wrote:
> [v20210308b]
>
> I managed to trap an assertion that somehow doesn't happen during the
> regression tests. The callers of fill_expanded_ranges() do math like this:
>
> /* both ranges and points are expanded into a separate element */
> neranges = ranges->nranges + ranges->nvalues;
>
> but inside fill_expanded_ranges() we have this assertion:
>
> /* Check that the output array has the right size. */
> Assert(neranges == (2 * ranges->nranges + ranges->nvalues));
>

Yeah, that assert is bogus. It's calculating the number of boundary
values (and ranges have two), but in ExpandedRanges each we still
represent that as a single element. So the Assert should be just

Assert(neranges == (ranges->nranges + ranges->nvalues));

But maybe it's just an overkill and we don't really need it.

> In the regression test data, a debugging elog() shows that nranges is
> most often zero, so in that case, the math happens to be right either
> way. I can reliably get nranges above zero by running
>
> update brintest_multi set int8col = int8col - 1;
>
> a few times, at which point I get the crash.
>

Hmm, so maybe we should do something like this in regression tests too?
It's not good that we don't trigger the "nranges > 0" case at all.

>
> Aside from that, the new changes look good. Just a couple small things:
>
> +    allowed parameters.  Only the <literal>bloom</literal> operator class
> +    allows specifying parameters:
>
> minmax-multi aren't mentioned here, but are mentioned further down.
>

I forgot to add this bit. Will fix.

> + * Addresses from different families are consider to be in maximum
>
> (comment above brin_minmax_multi_distance_inet)
> s/consider/considered/
>

Will fix.

>> > 2) moving minmax/inclusion changes from 0002 to a separate patch 0003
>> >
>> > I think we should either ditch the 0003 (i.e. keep the existing
>> > opclasses unchanged) or commit 0003 (in which case I'd vote to just stop
>> > supporting the old signature of the consistent function).
>> >
>>
>> Still not sure what do to about this. I'm leaning towards keeping 0003
>> and just removing the "old" signature entirely, to keep the API cleaner.
>> It might cause some breakage in out-of-core BRIN opclasses, but that
>> seems like a reasonable price. Moreover, the opclasses may need some
>> updating anyway, because of the changes in handling NULL scan keys (0004
>> moves that from the opclass to the bringetbitmap function).
>
> Keeping 0003 seems reasonable, given the above.
>

And do you agree with removing the old signature entirely? That might
break some out-of-core opclasses, but we're not aware of any, and
they'll be broken anyway. Seems fine to me.

>> > The remaining part that didn't get much review is the very last patch,
>> > adding an option to ignore correlation for some BRIN opclases. This is
>> > needed as the regular BRIN costing is quite sensitive to correlation,
>> > and the cost gets way too high for poorly correlated data, making it
>> > unlikely the index will be used. But handling such data sets efficiently
>> > is the main point of those new opclasses. Any opinions on this?
>> >
>>
>> Not sure about this.
>
> I hadn't given it much thought (nor tested), but I just took a look at
> brincostestimate(). If the table is badly correlated, I'm thinking the
> number of ranges that need to be scanned will increase regardless. Maybe
> rather than ignoring correlation, we could clamp it or otherwise tweak
> it. Not sure about the details, though, that would require some testing.
>

Well, maybe. In any case we need to do something about this, otherwise
the new opclasses won't be used even in cases where it's perfectly OK.
And it needs to be opclass-dependent, in some way.

I'm pretty sure even the simple examples you've used to test
minmax-multi (with updating a fraction of tuples to low/high value)
would be affected by this.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2021-03-09 22:18:10 Re: Optimising latch signals
Previous Message Tom Lane 2021-03-09 22:11:33 Re: Lowering the ever-growing heap->pd_lower