Re: Proposal to introduce a shuffle function to intarray extension

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Martin Kalcher <martin(dot)kalcher(at)aboutsource(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Proposal to introduce a shuffle function to intarray extension
Date: 2022-07-17 22:37:04
Message-ID: CA+hUKG+TPcsR-OmioTdtTHBs9k6dS0fOcgkw4YSdp_=RJhCxoQ@mail.gmail.com
Lists: pgsql-general pgsql-hackers

On Mon, Jul 18, 2022 at 4:15 AM Martin Kalcher
<martin(dot)kalcher(at)aboutsource(dot)net> wrote:
> On 17.07.22 at 08:00, Thomas Munro wrote:
> >> Actually ... is there a reason to bother with an intarray version
> >> at all, rather than going straight for an in-core anyarray function?
> >> It's not obvious to me that an int4-only version would have
> >> major performance advantages.
> >
> > Yeah, that seems like a good direction. If there is a performance
> > advantage to specialising, then perhaps we only have to specialise on
> > size, not type. Perhaps there could be a general function that
> > internally looks out for typbyval && typlen == 4, and dispatches to a
> > specialised 4-byte, and likewise for 8, if it can, and that'd already
> > be enough to cover int, bigint, float etc, without needing
> > specialisations for each type.
>
> I played around with the idea of an anyarray shuffle(). The hard part
> was to deal with arrays with variable-length elements, as they cannot
> be swapped easily in place. I solved it by creating an intermediate
> array of references to the elements. I'll attach a patch with the proof
> of concept. Unfortunately it is already about 5 times slower than the
> specialised version and I am not sure if it is worth going down that road.

Seems OK for a worst case. It must still be a lot faster than doing
it in SQL. Now I wonder what the exact requirements would be to
dispatch to a faster version that would handle int4. I haven't
studied this in detail but perhaps to dispatch to a fast shuffle for
objects of size X, the requirement would be something like typlen == X
&& align_bytes <= typlen && typlen % align_bytes == 0, where
align_bytes is typalign converted to ALIGNOF_{CHAR,SHORT,INT,DOUBLE}?
Or in English, 'the data consists of densely packed objects of fixed
size X, no padding'. Or perhaps you can work out the padded size and
use that, to catch a few more types. Then you call
array_shuffle_{2,4,8}() as appropriate, which should be as fast as
your original int[] proposal, but work also for float, date, ...?
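
To make the condition above concrete, here is a rough standalone sketch of
the dispatch idea. It is not taken from any patch: can_use_fixed_shuffle()
and shuffle_dispatch() are hypothetical names, the signatures of
array_shuffle_{2,4,8}() are guesses, align_bytes is assumed to have been
derived from typalign already, and rand() stands in for whatever random
source the real code would use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical check: does the array consist of densely packed objects of
 * fixed size x, with no padding between them?  align_bytes is assumed to be
 * typalign already converted to a byte count.
 */
bool
can_use_fixed_shuffle(int typlen, int align_bytes, int x)
{
    return typlen == x &&
           align_bytes > 0 &&
           align_bytes <= typlen &&
           typlen % align_bytes == 0;
}

/*
 * Fisher-Yates shuffle specialised on element width.  rand() % i is only a
 * placeholder: it has modulo bias and is not what a real patch should use.
 */
#define DEFINE_ARRAY_SHUFFLE(width, type)                 \
void array_shuffle_##width(type *items, size_t n)         \
{                                                         \
    for (size_t i = n; i > 1; i--)                        \
    {                                                     \
        size_t j = (size_t) rand() % i;                   \
        type tmp = items[i - 1];                          \
        items[i - 1] = items[j];                          \
        items[j] = tmp;                                   \
    }                                                     \
}

DEFINE_ARRAY_SHUFFLE(2, uint16_t)
DEFINE_ARRAY_SHUFFLE(4, uint32_t)
DEFINE_ARRAY_SHUFFLE(8, uint64_t)

/*
 * Hypothetical dispatcher: returns true if a fast path handled the data,
 * false if the caller must fall back to the generic (slower) shuffle.
 */
bool
shuffle_dispatch(void *data, size_t n, int typlen, int align_bytes)
{
    assert(data != NULL || n == 0);

    if (can_use_fixed_shuffle(typlen, align_bytes, 4))
        array_shuffle_4(data, n);
    else if (can_use_fixed_shuffle(typlen, align_bytes, 8))
        array_shuffle_8(data, n);
    else if (can_use_fixed_shuffle(typlen, align_bytes, 2))
        array_shuffle_2(data, n);
    else
        return false;
    return true;
}
```

With this shape, a varlena type (typlen == -1) or a type whose alignment
exceeds its length simply returns false and takes the generic path, while
anything that passes the check is moved around as opaque 2-, 4- or 8-byte
units regardless of its actual type.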

About your experimental patch: I haven't reviewed it properly or tried
it, but I wonder if uint32 dat_offset and uint32 size (that is, fields
half the size) would be enough, given the size limits on varlenas.
