Re: PATCH: Using BRIN indexes for sorted output

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: PATCH: Using BRIN indexes for sorted output
Date: 2023-07-10 16:18:12
Message-ID: CAEze2WjOrBXbMEM-xq8OOPJquNdrTn-t6qX1G7FzBfn9nY_a6w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 10 Jul 2023 at 17:09, Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> On 7/10/23 14:38, Matthias van de Meent wrote:
>> Kind of. For single-dimensional opclasses (minmax, minmax_multi) we
>> only need to extract the normal min/max values for ASC/DESC sorts,
>> which are readily available in the summary. But for multi-dimensional
>> and distance searches (nearest neighbour) we need to calculate the
>> distance between the indexed value(s) and the origin value to compare
>> the summary against, and the order would thus be asc/desc on distance
>> - a distance which may not be precisely represented by float types -
>> thus 'relative order' with its own order operation.
>>
>
> Can you give some examples of such data / queries, and how would it
> leverage the BRIN sort stuff?

Order by distance would be `ORDER BY box <-> '(1, 2)'::point ASC`, and
the opclass would then decide that `<->(box, point) ASC` means it has
to return the closest distance from the point to the summary, for some
measure of 'distance' (this case L2, <#> other types, etc.). For DESC,
that would return the distance from `'(1,2)'::point` to the furthest
edge of the summary away from that point. Etc.

> For distance searches, I imagine this as data indexed by BRIN inclusion
> opclass, which creates a bounding box. We could return closest/furthest
> point on the bounding box (from the point used in the query). Which
> seems a bit like a R-tree ...

Kind of; it would allow us to utilize such orderings without the
expensive 1 tuple = 1 index entry and without scanning the full table
before getting results. No tree involved, just a sequential scan on
the index to allow some sketch-based pre-sort on the data. Again, this
would work similar to how GiST's internal pages work: each downlink in
GiST contains a summary of the entries on the downlinked page, and
distance searches use a priority queue where the priority is the
distance of the opclass-provided distance operator - lower distance
means higher priority. For BRIN, we'd have to build a priority queue
for the whole table at once, but presorting table sections is part of
the design of BRIN sort, right?

> But I have no idea what would this do for multi-dimensional searches, or
> what would those searches do? How would you sort such data other than
> lexicographically? Which I think is covered by the current BRIN Sort,
> because the data is either stored as multiple columns, in which case we
> use the BRIN on the first column. Or it's indexed using BRIN minmax as a
> tuple of values, but then it's sorted lexicographically.

Yes, just any BRIN summary that allows distance operators and the like
should be enough MINMAX is easy to understand, and box inclusion are
IMO also fairly easy to understand.

>>> I haven't really thought about geometric types, just about minmax and
>>> minmax-multi. It's not clear to me what the benefit for these types be.
>>> I mean, we can probably sort points lexicographically, but is anyone
>>> doing that in queries? It seems useless for order by distance.
>>
>> Yes, that's why you would sort them by distance, where the distance is
>> generated by the opclass as min/max distance between the summary and
>> the distance's origin, and then inserted into the tuplesort.
>>
>
> OK, so the query says "order by distance from point X" and we calculate
> the min/max distance of values in a given page range.

Yes, and because it's BRIN that's an approximation, which should
generally be fine.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech/)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2023-07-10 16:24:15 Re: Performance degradation on concurrent COPY into a single relation in PG16.
Previous Message Tom Lane 2023-07-10 16:17:19 Re: gcc -Wclobbered in PostgresMain