Re: DRAFT GIST support for ORDER BY

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Michał Kłeczek <michal(at)kleczek(dot)org>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: DRAFT GIST support for ORDER BY
Date: 2023-10-30 12:31:18
Message-ID: CAEze2WjYMX-MGgGien2Q3be3qHZVLHCj1WwsnumLiyEWZxpghQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 30 Oct 2023 at 09:04, Michał Kłeczek <michal(at)kleczek(dot)org> wrote:
>
> Hi All,
>
> Attached is a first attempt to implement GIST index (only) scans for ORDER BY column clauses.

Cool!

> The solution is not ideal as it requires registering “<“ and “>” operators as ordering operators in opfamily
> (which in turn makes it possible to issue somewhat meaningless “ORDER BY a < ‘constant’)

I don't quite understand why we need to register new "<" and ">"
operators. Can't we update the current ones?

> The problem is though that right now handling of ORDER BY column clauses is tightly coupled to BTree.
> It would be good to refactor the code so that semantics of ORDER BY column could be more flexible.

The existence of a BTREE operator class for the type is the indicator
that (and how) the type can be ordered - that is where PostgreSQL gets
its methods for ordering most types. Although I agree that it's a
quirk, I don't mind it that much as an indicator of how a type is
ordered.
I do agree, though, that operator classes by themselves should be able
to say "hey, we support full ordered retrieval as well". Right now,
that seems to be limited to btrees, but indeed a GiST index with
btree_gist columns should be able to support the same.

> It would be great if someone could take a look at it.

I've not looked in detail at the patch, but here's some comments:

> --- a/contrib/btree_gist/btree_gist--1.6--1.7.sql
> +++ b/contrib/btree_gist/btree_gist--1.6--1.7.sql

You seem to be modifying an existing migration of a released version
of the btree_bist extension. I suggest you instead add a migration
from 1.7 to a new version 1.8, and update the control file's default
installed version.

> ORDER BY a == ORDER BY a <-> MIN_VALUE
> and
> ORDER BY a DESC == ORDER BY a <-> MAX_VALUE
>
> This allows implementing GIST ordered scans for btree_gist datatypes.
>
> This in turn makes using GIST with partitioning feasible (I have described issues with such usage in my previous e-mails - see below).

Did you take into account that GiST's internal distance function uses
floating point, and is thus only an approximation for values that
require more than 2^54 significant bits in their distance function?
For example, GiST wouldn't be guaranteed to yield correct ordering of
int8/bigint when you use `my_column <-> UINT64_MAX` because as far as
the floating point math is concerned, 0 is about as far away from
INT64_MAX as (say) 20 and -21.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Aleksander Alekseev 2023-10-30 13:03:45 Re: "38.10.10. Shared Memory and LWLocks" may require a clarification
Previous Message Anton A. Melnikov 2023-10-30 12:28:53 Re: Some performance degradation in REL_16 vs REL_15