From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
---|---|
To: | Martin Kalcher <martin(dot)kalcher(at)aboutsource(dot)net> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Proposal to introduce a shuffle function to intarray extension |
Date: | 2022-07-17 22:37:04 |
Message-ID: | CA+hUKG+TPcsR-OmioTdtTHBs9k6dS0fOcgkw4YSdp_=RJhCxoQ@mail.gmail.com |
Lists: | pgsql-general pgsql-hackers |
On Mon, Jul 18, 2022 at 4:15 AM Martin Kalcher
<martin(dot)kalcher(at)aboutsource(dot)net> wrote:
> On 17.07.22 at 08:00, Thomas Munro wrote:
> >> Actually ... is there a reason to bother with an intarray version
> >> at all, rather than going straight for an in-core anyarray function?
> >> It's not obvious to me that an int4-only version would have
> >> major performance advantages.
> >
> > Yeah, that seems like a good direction. If there is a performance
> > advantage to specialising, then perhaps we only have to specialise on
> > size, not type. Perhaps there could be a general function that
> > internally looks out for typbyval && typlen == 4, and dispatches to a
> > specialised 4-byte, and likewise for 8, if it can, and that'd already
> > be enough to cover int, bigint, float etc, without needing
> > specialisations for each type.
>
> I played around with the idea of an anyarray shuffle(). The hard part
> was dealing with arrays with variable-length elements, as they cannot
> be swapped easily in place. I solved it by creating an intermediate
> array of references to the elements. I'll attach a patch with the proof
> of concept. Unfortunately it is already about 5 times slower than the
> specialised version and I am not sure if it is worth going down that road.
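Martin's description of the variable-length case (build an intermediate array of references, shuffle those, then copy the elements out) can be sketched in standalone C roughly as follows. This is not the attached patch. `ElemRef` and `shuffle_varlen()` are hypothetical names. The element sizes are passed in explicitly instead of being derived from the array's type information, and the sketch ignores the alignment padding that a real array's data area can contain. `rand()` stands in for a proper random source; it also has a slight modulo bias.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference to one element inside a packed data area. */
typedef struct ElemRef
{
    size_t      offset;         /* byte offset of the element in src */
    size_t      size;           /* length of the element in bytes */
} ElemRef;

/*
 * Shuffle nelems variable-length elements stored back to back in src and
 * write them, in the new order, to dst.  sizes[i] is the byte length of
 * element i.  dst must be a separate buffer at least as large as src.
 * Returns 0 on success, -1 if the reference array cannot be allocated.
 */
static int
shuffle_varlen(const char *src, const size_t *sizes, size_t nelems, char *dst)
{
    ElemRef    *refs = malloc(nelems * sizeof(ElemRef));
    size_t      offset = 0;

    assert(dst != src);         /* elements are copied, not swapped in place */
    if (nelems > 0 && refs == NULL)
        return -1;

    /* Pass 1: record where each element starts and how long it is. */
    for (size_t i = 0; i < nelems; i++)
    {
        refs[i].offset = offset;
        refs[i].size = sizes[i];
        offset += sizes[i];
    }

    /* Pass 2: Fisher-Yates shuffle of the small fixed-size references. */
    for (size_t i = nelems; i > 1; i--)
    {
        size_t      j = (size_t) rand() % i;    /* 0 <= j < i */
        ElemRef     tmp = refs[i - 1];

        refs[i - 1] = refs[j];
        refs[j] = tmp;
    }

    /* Pass 3: copy the elements out in their shuffled order. */
    offset = 0;
    for (size_t i = 0; i < nelems; i++)
    {
        memcpy(dst + offset, src + refs[i].offset, refs[i].size);
        offset += refs[i].size;
    }

    free(refs);
    return 0;
}
```

Only the fixed-size references are permuted. Each element's bytes are then copied exactly once, in pass 3. That extra indirection and copy is one plausible reason a generic path like this runs slower than an in-place swap of fixed-size values.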
Seems OK for a worst case. It must still be a lot faster than doing
it in SQL. Now I wonder what the exact requirements would be to
dispatch to a faster version that would handle int4. I haven't
studied this in detail, but perhaps to dispatch to a fast shuffle for
objects of size X, the requirement would be something like typlen == X
&& align_bytes <= typlen && typlen % align_bytes == 0, where
align_bytes is typalign converted to ALIGNOF_{CHAR,SHORT,INT,DOUBLE}?
Or in English, 'the data consists of densely packed objects of fixed
size X, no padding'. Or perhaps you can work out the padded size and
use that, to catch a few more types. Then you call
array_shuffle_{2,4,8}() as appropriate, which should be as fast as
your original int[] proposal but also work for float, date, ...?
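The dispatch rule suggested above can be sketched in standalone C as follows. Everything here is hypothetical. `align_bytes_for()`, `is_dense_fixed()`, `shuffle_fixed()` and `shuffle_dispatch()` are illustrative names, not functions from PostgreSQL or from the patch. The predicate is a direct transcription of the condition in this message. C11 `_Alignof` stands in for the server's `ALIGNOF_*` macros, and `rand()` (with its slight modulo bias) stands in for a proper random source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical mapping from a PostgreSQL-style typalign code ('c', 's',
 * 'i', 'd') to a byte count; _Alignof stands in for the server's
 * ALIGNOF_{CHAR,SHORT,INT,DOUBLE} macros.
 */
static int
align_bytes_for(char typalign)
{
    switch (typalign)
    {
        case 'c':
            return (int) _Alignof(char);
        case 's':
            return (int) _Alignof(short);
        case 'i':
            return (int) _Alignof(int);
        case 'd':
            return (int) _Alignof(double);
        default:
            return -1;          /* unknown alignment code */
    }
}

/* "Densely packed objects of fixed size x, no padding." */
static bool
is_dense_fixed(int typlen, char typalign, int x)
{
    int         align_bytes = align_bytes_for(typalign);

    return align_bytes > 0 &&
        typlen == x &&
        align_bytes <= typlen &&
        typlen % align_bytes == 0;
}

/*
 * Fisher-Yates shuffle of n elements of `width` bytes each.  When called
 * with a constant width, an optimising compiler can usually turn the
 * memcpy calls into single 2-, 4- or 8-byte moves.
 */
static inline void
shuffle_fixed(char *elems, size_t n, size_t width)
{
    char        tmp[8];

    assert(width <= sizeof tmp);
    for (size_t i = n; i > 1; i--)
    {
        size_t      j = (size_t) rand() % i;

        if (j == i - 1)
            continue;           /* element stays where it is */
        memcpy(tmp, elems + (i - 1) * width, width);
        memcpy(elems + (i - 1) * width, elems + j * width, width);
        memcpy(elems + j * width, tmp, width);
    }
}

/*
 * Try a width-specialised shuffle; return false if the element layout
 * does not qualify and the caller must use the generic path.
 */
static bool
shuffle_dispatch(void *data, size_t n, int typlen, char typalign)
{
    if (is_dense_fixed(typlen, typalign, 2))
        shuffle_fixed(data, n, 2);
    else if (is_dense_fixed(typlen, typalign, 4))
        shuffle_fixed(data, n, 4);
    else if (is_dense_fixed(typlen, typalign, 8))
        shuffle_fixed(data, n, 8);
    else
        return false;
    return true;
}
```

Types that fail the test, such as varlena types (typlen -1), would fall back to a reference-based path like the one in Martin's patch. Swapping via `memcpy` rather than typed loads also avoids assuming that 8-byte elements with only 4-byte alignment are safe to access as 64-bit integers.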
About your experimental patch: I haven't reviewed it properly or tried
it, but I wonder whether uint32 dat_offset and uint32 size fields (i.e.
half-size elements) would be enough, given the limits on varlena sizes.
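If each reference only ever points inside one array's data area, 32-bit fields should suffice. A varlena datum is limited to just under 1 GB, because the 4-byte varlena header keeps 30 bits for the length. Below is a minimal sketch of such a compact reference. The names `ElemRef32`, `make_ref32()` and `MAX_VARLENA_BYTES` are hypothetical and chosen for illustration; only the field names follow the message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Largest possible varlena datum: 1 GB - 1 byte (30-bit length field).
 * Hypothetical constant, standing in for the server's own limit. */
#define MAX_VARLENA_BYTES ((uint32_t) 0x3FFFFFFF)

/* Hypothetical compact reference to one element of an array's data area. */
typedef struct ElemRef32
{
    uint32_t    dat_offset;     /* byte offset from the start of the data */
    uint32_t    size;           /* element length in bytes */
} ElemRef32;

static_assert(sizeof(ElemRef32) == 2 * sizeof(uint32_t),
              "no padding expected between two uint32_t fields");

/*
 * Fill *ref, rejecting an offset/size pair that could not fit inside one
 * varlena datum.  Returns 0 on success, -1 if out of range.
 */
static int
make_ref32(size_t offset, size_t size, ElemRef32 *ref)
{
    if (offset > MAX_VARLENA_BYTES || size > MAX_VARLENA_BYTES - offset)
        return -1;
    ref->dat_offset = (uint32_t) offset;
    ref->size = (uint32_t) size;
    return 0;
}
```

On a typical 64-bit platform, a pointer plus a `size_t` length would take 16 bytes per element. This layout takes 8, halving the size of the reference array.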
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2022-07-17 22:46:27 | Re: Proposal to introduce a shuffle function to intarray extension |
Previous Message | Gogala, Mladen | 2022-07-17 18:58:12 | Re: Oracle to Postgress Migration |