Re: WIP: BRIN multi-range indexes

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: WIP: BRIN multi-range indexes
Date: 2021-03-02 23:32:04
Message-ID: b50e5c5d-4372-8aee-308c-09845c2c5f86@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

attached is a cleaned up version of the patch series, addressing the
issues from the last review. I've also reviewed the comments, merged the
patches to keep only the "hybrid" approach of building minmax-multi, and
a couple minor fixes. I've also passed it through pgindent.

There are a couple minor FIXMEs, I'll deal with those before commit.

I have a couple questions, though:

1) The 0001 patch allows passing of all scan keys to BRIN opclasses,
which is needed for the minmax-multi to work. But it also modifies the
existing opclasses (minmax and inclusion) to do this - but for those
opclasses it does not make much difference, I think. Initially this was
done because the patch did this for all opclasses, but then we added the
detection based on number of parameters. So I wonder if we should just
remove this, to make the patch a bit smaller. It'll also test the other
code path (for opclasses without the last parameter).

2) This needs bsearch_arg, but we only have that defined/used in
extended_statistics_internal.h - it seems silly to include that here, as
this has nothing to do with extended stats, so I simply added another
prototype (which gets linked correctly). But, I suppose a better way
would be to define our "portable" variant pg_bsearch_arg, next to
pg_qsort etc.

On 2/22/21 9:16 PM, John Naylor wrote:
>
> On Mon, Feb 15, 2021 at 10:37 AM Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com <mailto:tomas(dot)vondra(at)enterprisedb(dot)com>>
> wrote:
>> [v20210215]
>
> Hi Tomas,
>
> This time I only looked at the cumulative changes in the multiminmax
> opclass, since I'm pretty sure all the bloom issues have been addressed.
>
>  * XXX CombineRange name seems a bit weird. Consider renaming, perhaps to
>  * something ExpandedRange or so.
>
> The time to do it is now, if you like, or remove the XXX. I agree with
> the comment, FWIW.
>

OK. I've renamed the struct to ExpandedRanges, I think this is better.
I've also updated names of the various function names containing
_combine_ (usually to _expanded_).

> In has_matching_range():
>
>  * So we know it's in the general min/max, the question is whether it
>  * falls in one of the ranges or gaps. We'll use a binary search on
>  * the ranges.
>  *
>  * it's in the general range, but is it actually covered by any
>  * of the ranges? Repeat the check for each range.
>  *
>  * XXX We simply walk the ranges sequentially, but maybe we could
>  * further leverage the ordering and non-overlap and use bsearch to
>  * speed this up a bit.
>
> It looks to me like you already implemented binary search and the last
> part is out of date, or am I missing something?
>
> Same in range_contains_value():
>
>  * XXX This might benefit from the fact that both the intervals and exact
>  * values are sorted - we might do bsearch or something. Currently that
>  * does not make much difference (there are only ~32 intervals), but if
>  * this gets increased and/or the comparator function is more expensive,
>  * it might be a huge win.
>
> Below that it does binary search if the number of elements > 16.
>

Good point. I've implemented the binary search a while ago, but I forgot
to update the comments. Fixed.

I'm not sure whether to keep the threshold with 16 elements, or just do
binary search in all cases. A simple linear search tends to be faster
for small arrays, but it depends on how expensive the comparison
function is for the particular data type. Maybe we should keep this
simple for now.

> In merge_combine_ranges():
>
> There are a couple assert-related TODOs.
>

Removed. I think the Assert after the call is sufficient.

> In brin_minmax_multi_distance_timetz():
>
>  * XXX Does this need to consider the time zones?
>
> I wouldn't think so, because the stored values are in UTC -- the time
> zone calculation only happens during storage and retrieval, and they've
> already been stored, IIUC.
>
> Also, I think you need to copy this part from
> brin_minmax_multi_distance_timestamp() here as well:
>
> if (TIMESTAMP_NOT_FINITE(dt1) || TIMESTAMP_NOT_FINITE(dt2))
> PG_RETURN_FLOAT8(0);
>

Ummm, in brin_minmax_multi_distance_timetz()? I don't think timetz can
be infinite, no? I think brin_minmax_multi_distance_date you meant?

>
> At this point, I think it's pretty close to commit-ready. I thought
> maybe I would create a small index with every type, and make sure it
> looks sane in page_inspect, but that's about it. 
>

OK, thanks for the reviews!

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
0001-Pass-all-scan-keys-to-BRIN-consistent-funct-20210303.patch text/x-patch 23.7 KB
0002-Move-IS-NOT-NULL-handling-from-BRIN-support-20210303.patch text/x-patch 23.1 KB
0003-Optimize-allocations-in-bringetbitmap-20210303.patch text/x-patch 4.5 KB
0004-BRIN-bloom-indexes-20210303.patch text/x-patch 127.5 KB
0005-BRIN-minmax-multi-indexes-20210303.patch text/x-patch 239.2 KB
0006-Ignore-correlation-for-new-BRIN-opclasses-20210303.patch text/x-patch 4.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message rbrooks 2021-03-02 23:41:27 Re: enable_incremental_sort changes query behavior
Previous Message Andrew Dunstan 2021-03-02 22:04:16 Re: buildfarm windows checks / tap tests on windows