Re: [PATCH] Introduce array_shuffle() and array_sample()

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Martin Kalcher <martin(dot)kalcher(at)aboutsource(dot)net>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, John Naylor <john(dot)naylor(at)enterprisedb(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Introduce array_shuffle() and array_sample()
Date: 2022-07-24 08:15:22
Message-ID: alpine.DEB.2.22.394.2207240929220.1359119@pseudo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers


>> i came to the same conclusions and went with Option 1 (see patch). Mainly
>> because most code in utils/adt is organized by type and this way it is
>> clear, that this is a thin wrapper around pg_prng.
>>
>
> Small patch update. I realized the new functions should live
> array_userfuncs.c (rather than arrayfuncs.c), fixed some file headers and
> added some comments to the code.

My 0,02€ about this patch:

Use (void) when declaring no parameters in headers or in functions.

Should the exchange be skipped when i == k?

I do not see the point of having *only* inline functions in a c file. The
external functions should not be inlined?

The random and array shuffling functions share a common state. I'm
wondering whether it should ot should not be so. It seems ok, but then
ISTM that the documentation suggests implicitely that setseed applies to
random() only, which is not the case anymore after the patch.

If more samples are required than the number of elements, it does not
error out. I'm wondering whether it should.

Also, the sampling should not return its result in order when the number
of elements required is the full array, ISTM that it should be shuffled
there as well.

I must say that without significant knowledge of the array internal
implementation, the swap code looks pretty strange. ISTM that the code
would be clearer if pointers and array references style were not
intermixed.

Maybe you could add a test with a 3D array? Some sample with NULLs?

Unrelated: I notice again that (postgre)SQL does not provide a way to
generate random integers. I do not see why not. Should we provide one?

--
Fabien.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Pavel Stehule 2022-07-24 09:19:10 Re: Queries in another user's tables
Previous Message xavier 2022-07-24 07:37:54 Queries in another user's tables

Browse pgsql-hackers by date

  From Date Subject
Next Message Dmitry Dolgov 2022-07-24 10:06:36 Re: pg_stat_statements and "IN" conditions
Previous Message Zhang Mingli 2022-07-24 04:48:25 Re: optimize lookups in snapshot [sub]xip arrays