Re: WIP: BRIN multi-range indexes

From: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: WIP: BRIN multi-range indexes
Date: 2021-03-09 20:51:45
Message-ID: CAFBsxsHrqP+OMp5cV6yiEzkwobtAp4rnxOGsfm_s_vX8qSNQeQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Mar 7, 2021 at 8:53 PM Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
wrote:
[v20210308b]

I managed to trap an assertion that somehow doesn't happen during the
regression tests. The callers of fill_expanded_ranges() do math like this:

/* both ranges and points are expanded into a separate element */
neranges = ranges->nranges + ranges->nvalues;

but inside fill_expanded_ranges() we have this assertion:

/* Check that the output array has the right size. */
Assert(neranges == (2 * ranges->nranges + ranges->nvalues));

In the regression test data, a debugging elog() shows that nranges is most
often zero, so in that case, the math happens to be right either way. I can
reliably get nranges above zero by running

update brintest_multi set int8col = int8col - 1;

a few times, at which point I get the crash.

Aside from that, the new changes look good. Just a couple small things:

+ allowed parameters. Only the <literal>bloom</literal> operator class
+ allows specifying parameters:

minmax-multi aren't mentioned here, but are mentioned further down.

+ * Addresses from different families are consider to be in maximum

(comment above brin_minmax_multi_distance_inet)
s/consider/considered/

> > 2) moving minmax/inclusion changes from 0002 to a separate patch 0003
> >
> > I think we should either ditch the 0003 (i.e. keep the existing
> > opclasses unchanged) or commit 0003 (in which case I'd vote to just stop
> > supporting the old signature of the consistent function).
> >
>
> Still not sure what do to about this. I'm leaning towards keeping 0003
> and just removing the "old" signature entirely, to keep the API cleaner.
> It might cause some breakage in out-of-core BRIN opclasses, but that
> seems like a reasonable price. Moreover, the opclasses may need some
> updating anyway, because of the changes in handling NULL scan keys (0004
> moves that from the opclass to the bringetbitmap function).

Keeping 0003 seems reasonable, given the above.

> > The remaining part that didn't get much review is the very last patch,
> > adding an option to ignore correlation for some BRIN opclases. This is
> > needed as the regular BRIN costing is quite sensitive to correlation,
> > and the cost gets way too high for poorly correlated data, making it
> > unlikely the index will be used. But handling such data sets efficiently
> > is the main point of those new opclasses. Any opinions on this?
> >
>
> Not sure about this.

I hadn't given it much thought (nor tested), but I just took a look at
brincostestimate(). If the table is badly correlated, I'm thinking the
number of ranges that need to be scanned will increase regardless. Maybe
rather than ignoring correlation, we could clamp it or otherwise tweak it.
Not sure about the details, though, that would require some testing.

--
John Naylor
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bossart, Nathan 2021-03-09 21:33:31 Re: partial heap only tuples
Previous Message Peter Geoghegan 2021-03-09 20:35:01 Re: New IndexAM API controlling index vacuum strategies