Re: Considering additional sort specialisation functions for PG16

From: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: Considering additional sort specialisation functions for PG16
Date: 2023-01-26 10:29:25
Message-ID: CAFBsxsH=3UA+8Sp2sgUs=piYOmR2CQ-Rq5sUhCX-M3Gqv4TRog@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Aug 23, 2022 at 1:13 PM John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
wrote:
>
> On Tue, Aug 23, 2022 at 11:24 AM David Rowley <dgrowleyml(at)gmail(dot)com>
wrote:
> >
> > On Tue, 23 Aug 2022 at 15:22, John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
wrote:
> > > Did you happen to see
> > >
> > >
https://www.postgresql.org/message-id/CAFBsxsFhq8VUSkUL5YO17cFXbCPwtbbxBu%2Bd9MFrrsssfDXm3Q%40mail.gmail.com
> >
> > I missed that. It looks like a much more promising idea than what I
> > came up with. I've not looked at your code yet, but I'm interested and
> > will aim to look soon.
>
> Note that I haven't actually implemented this idea yet, just tried to
> model the effects by lobotomizing the current comparators. I think
> it's worth pursuing and will try to come back to it this cycle, but if
> you or anyone else wants to try, that's fine of course.

Coming back to this, I wanted to sketch out this idea in a bit more detail.

Have two memtuple arrays, one for first sortkey null and one for first
sortkey non-null:
- Qsort the non-null array, including whatever specialization is available.
Existing specialized comparators could ignore nulls (and their ordering)
taking less space in the binary.
- Only if there is more than one sort key, qsort the null array. Ideally at
some point we would have a method of ignoring the first sortkey (this is an
existing opportunity that applies elsewhere as well).
- To handle two arrays, grow_memtuples() would need some adjustment, as
would any callers that read the final result of an in-memory sort -- they
would need to retrieve the tuples starting with the appropriate array
depending on NULLS FIRST/LAST behavior.

I believe external merges wouldn't have to do anything different, since
when writing out the tapes, we read from the arrays in the right order.

(One could extend this idea further and have two pools of tapes for null
and non-null first sortkey, that are merged separately, in the right order.
That sounds like quite a bit more complexity than is worth, however.)

--
John Naylor
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrei Zubkov 2023-01-26 11:02:43 Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements
Previous Message Dean Rasheed 2023-01-26 10:27:58 Re: to_hex() for negative inputs