Re: Proposal to introduce a shuffle function to intarray extension

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Martin Kalcher <martin(dot)kalcher(at)aboutsource(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Proposal to introduce a shuffle function to intarray extension
Date: 2022-07-17 06:00:09
Message-ID: CA+hUKGLcjrJ9Q2OZAKHN-SQyq_m5QDv8d3WKHYq5HW8Lyozu4w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

On Sun, Jul 17, 2022 at 3:37 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I wrote:
> > On the whole, I'd vote for calling it shuffle(), and expecting that
> > we'd also use that name for any future generic version.
>
> Actually ... is there a reason to bother with an intarray version
> at all, rather than going straight for an in-core anyarray function?
> It's not obvious to me that an int4-only version would have
> major performance advantages.

Yeah, that seems like a good direction. If there is a performance
advantage to specialising, then perhaps we only have to specialise on
size, not type. Perhaps there could be a general function that
internally looks out for typbyval && typlen == 4, and dispatches to a
specialised 4-byte, and likewise for 8, if it can, and that'd already
be enough to cover int, bigint, float etc, without needing
specialisations for each type.

I went to see what Professor Lemire would have to say about all this,
expecting to find a SIMD rabbit hole to fall down for some Sunday
evening reading, but the main thing that jumped out was this article
about the modulo operation required by textbook Fisher-Yates to get a
bounded random number, the random() % n that appears in the patch. He
talks about shuffling twice as fast by using a no-division trick to
get bounded random numbers[1]. I guess you might need to use our
pg_prng_uint32() for that trick because random()'s 0..RAND_MAX might
introduce bias. Anyway, file that under go-faster ideas for later.

[1] https://lemire.me/blog/2016/06/30/fast-random-shuffling/

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Martin Kalcher 2022-07-17 09:16:28 Re: Proposal to introduce a shuffle function to intarray extension
Previous Message Tom Lane 2022-07-17 03:37:26 Re: Proposal to introduce a shuffle function to intarray extension

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2022-07-17 06:19:47 Re: Allowing REINDEX to have an optional name
Previous Message Nathan Bossart 2022-07-17 03:59:57 Re: optimize lookups in snapshot [sub]xip arrays