Re: PATCH: Using BRIN indexes for sorted output

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: PATCH: Using BRIN indexes for sorted output
Date: 2023-07-10 15:09:44
Message-ID: 1992ff4f-bf0e-3ad0-a859-cf054f6abc74@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 7/10/23 14:38, Matthias van de Meent wrote:
> On Mon, 10 Jul 2023 at 13:43, Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>> On 7/10/23 12:22, Matthias van de Meent wrote:
>>> On Fri, 7 Jul 2023 at 20:26, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>>>> However, it's not quite clear to me is what you mean by the weight- and
>>>> compare-operators? That is, what are
>>>>
>>>> - brin_minmax_minorder(PG_FUNCTION_ARGS=brintuple) -> range.min
>>>> - brin_minmax_maxorder(PG_FUNCTION_ARGS=brintuple) -> range.max
>>>> - brin_minmax_compare(order, order) -> int
>>>>
>>>> supposed to do? Or what does "PG_FUNCTION_ARGS=brintuple" mean?
>>>
>>> _minorder/_maxorder is for extracting the minimum/maximum relative
>>> order of each range, used for ASC/DESC sorting of operator results
>>> (e.g. to support ORDER BY <->(box_column, '(1,2)'::point) DESC).
>>> PG_FUNCTION_ARGS is mentioned because of PG calling conventions;
>>> though I did forget to describe the second operator argument for the
>>> distance function. We might also want to use only one such "order
>>> extraction function" with DESC/ASC indicated by an argument.
>>>
>>
>> I'm still not sure I understand what "minimum/maximum relative order"
>> is. Isn't it the same as returning min/max value that can appear in the
>> range? Although, that wouldn't work for points/boxes.
>
> Kind of. For single-dimensional opclasses (minmax, minmax_multi) we
> only need to extract the normal min/max values for ASC/DESC sorts,
> which are readily available in the summary. But for multi-dimensional
> and distance searches (nearest neighbour) we need to calculate the
> distance between the indexed value(s) and the origin value to compare
> the summary against, and the order would thus be asc/desc on distance
> - a distance which may not be precisely represented by float types -
> thus 'relative order' with its own order operation.
>

Can you give some examples of such data / queries, and how would it
leverage the BRIN sort stuff?

For distance searches, I imagine this as data indexed by BRIN inclusion
opclass, which creates a bounding box. We could return closest/furthest
point on the bounding box (from the point used in the query). Which
seems a bit like a R-tree ...

But I have no idea what would this do for multi-dimensional searches, or
what would those searches do? How would you sort such data other than
lexicographically? Which I think is covered by the current BRIN Sort,
because the data is either stored as multiple columns, in which case we
use the BRIN on the first column. Or it's indexed using BRIN minmax as a
tuple of values, but then it's sorted lexicographically.

>>>> In principle we just need a procedure that tells us min/max for a given
>>>> page range - I guess that's what the minorder/maxorder functions do? But
>>>> why would we need the compare one? We're comparing by the known data
>>>> type, so we can just delegate the comparison to that, no?
>>>
>>> Is there a comparison function for any custom orderable type that we
>>> can just use? GIST distance ordering uses floats, and I don't quite
>>> like that from a user perspective, as it makes ordering operations
>>> imprecise. I'd rather allow (but discourage) any type with its own
>>> compare function.
>>>
>>
>> I haven't really thought about geometric types, just about minmax and
>> minmax-multi. It's not clear to me what the benefit for these types be.
>> I mean, we can probably sort points lexicographically, but is anyone
>> doing that in queries? It seems useless for order by distance.
>
> Yes, that's why you would sort them by distance, where the distance is
> generated by the opclass as min/max distance between the summary and
> the distance's origin, and then inserted into the tuplesort.
>

OK, so the query says "order by distance from point X" and we calculate
the min/max distance of values in a given page range.

> (previously)
>>>> I finally had time to look at this patch again. There's a bit of bitrot,
>>>> so here's a rebased version (no other changes).
>
> It seems like you forgot to attach the rebased patch, so unless you're
> actively working on updating the patchset right now, could you send
> the rebase to make CFBot happy?
>

Yeah, I forgot about the attachment. So here it is.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
0001-Allow-index-AMs-to-build-and-use-custom-sta-20230710.patch text/x-patch 63.4 KB
0002-wip-introduce-debug_brin_stats-20230710.patch text/x-patch 5.7 KB
0003-wip-introduce-debug_brin_cross_check-20230710.patch text/x-patch 19.7 KB
0004-Allow-BRIN-indexes-to-produce-sorted-output-20230710.patch text/x-patch 132.2 KB
0005-wip-brinsort-explain-stats-20230710.patch text/x-patch 12.8 KB
0006-wip-multiple-watermark-steps-20230710.patch text/x-patch 6.3 KB
0007-wip-adjust-watermark-step-20230710.patch text/x-patch 9.1 KB
0008-wip-adaptive-watermark-step-20230710.patch text/x-patch 13.1 KB
0009-wip-add-brinsort-regression-tests-20230710.patch text/x-patch 179.7 KB
0010-wip-add-brinsort-amstats-regression-tests-20230710.patch text/x-patch 179.7 KB
0011-wip-test-generator-script-20230710.patch text/x-patch 11.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2023-07-10 15:28:25 Re: Performance degradation on concurrent COPY into a single relation in PG16.
Previous Message Paul A Jungwirth 2023-07-10 15:06:31 Re: Exclusion constraints on partitioned tables