Re: WIP: BRIN multi-range indexes

From: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: WIP: BRIN multi-range indexes
Date: 2021-02-22 20:16:08
Message-ID: CAFBsxsHFL1_LRAP1SH4Y6Ms3-ofzn9Y3E+wAGT2kLZgUpa3wWg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Feb 15, 2021 at 10:37 AM Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
wrote:
> [v20210215]

Hi Tomas,

This time I only looked at the cumulative changes in the multiminmax
opclass, since I'm pretty sure all the bloom issues have been addressed.

* XXX CombineRange name seems a bit weird. Consider renaming, perhaps to
* something ExpandedRange or so.

The time to do it is now, if you like, or remove the XXX. I agree with the
comment, FWIW.

In has_matching_range():

* So we know it's in the general min/max, the question is whether it
* falls in one of the ranges or gaps. We'll use a binary search on
* the ranges.
*
* it's in the general range, but is it actually covered by any
* of the ranges? Repeat the check for each range.
*
* XXX We simply walk the ranges sequentially, but maybe we could
* further leverage the ordering and non-overlap and use bsearch to
* speed this up a bit.

It looks to me like you already implemented binary search and the last part
is out of date, or am I missing something?

Same in range_contains_value():

* XXX This might benefit from the fact that both the intervals and exact
* values are sorted - we might do bsearch or something. Currently that
* does not make much difference (there are only ~32 intervals), but if
* this gets increased and/or the comparator function is more expensive,
* it might be a huge win.

Below that it does binary search if the number of elements > 16.

In merge_combine_ranges():

There are a couple assert-related TODOs.

In brin_minmax_multi_distance_timetz():

* XXX Does this need to consider the time zones?

I wouldn't think so, because the stored values are in UTC -- the time zone
calculation only happens during storage and retrieval, and they've already
been stored, IIUC.

Also, I think you need to copy this part from
brin_minmax_multi_distance_timestamp() here as well:

if (TIMESTAMP_NOT_FINITE(dt1) || TIMESTAMP_NOT_FINITE(dt2))
PG_RETURN_FLOAT8(0);

At this point, I think it's pretty close to commit-ready. I thought maybe I
would create a small index with every type, and make sure it looks sane in
page_inspect, but that's about it.

--
John Naylor
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2021-02-22 20:40:40 Re: libpq compression
Previous Message Álvaro Herrera 2021-02-22 20:15:57 Re: some pointless HeapTupleHeaderIndicatesMovedPartitions calls