From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Martin Kalcher <martin(dot)kalcher(at)aboutsource(dot)net> |
Cc: | Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, Andres Freund <andres(at)anarazel(dot)de>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, John Naylor <john(dot)naylor(at)enterprisedb(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: [PATCH] Introduce array_shuffle() and array_sample() |
Date: | 2022-09-29 19:33:26 |
Message-ID: | 344667.1664480006@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general pgsql-hackers |
Martin Kalcher <martin(dot)kalcher(at)aboutsource(dot)net> writes:
> New patch: array_shuffle() and array_sample() use pg_global_prng_state now.
I took a closer look at the patch today. I find this behavior a bit
surprising:
+SELECT array_dims(array_sample('[-1:2][2:3]={{1,2},{3,NULL},{5,6},{7,8}}'::int[], 3));
+ array_dims
+-------------
+ [-1:1][2:3]
+(1 row)
I can buy preserving the lower bound in array_shuffle(), but
array_sample() is not preserving the first-dimension indexes of
the array, so ISTM it ought to reset the first lower bound to 1.
Some other comments:
+ Returns <parameter>n</parameter> randomly chosen elements from <parameter>array</parameter> in selection order.
What's "selection order"? And this probably shouldn't just rely
on the example to describe what happens with multi-D arrays.
Writing "elements" seems somewhere between confusing and wrong.
* Personally I think I'd pass the TypeCacheEntry pointer to array_shuffle_n,
and let it pull out what it needs. Less duplicate code that way.
* I find array_shuffle_n drastically undercommented, and what comments
it has are pretty misleading, eg
+ /* Swap all elements in item (i) with elements in item (j). */
j is *not* the index of the second item to be swapped. You could make
it so, and that might be more readable:
j = (int) pg_prng_uint64_range(&pg_global_prng_state, i, nitem - 1);
jelms = elms + j * nelm;
jnuls = nuls + j * nelm;
But if you want the code to stay as it is, this comment needs work.
* I think good style for SQL-callable C functions is to make the arguments
clear at the top:
+array_sample(PG_FUNCTION_ARGS)
+{
+ ArrayType *array = PG_GETARG_ARRAYTYPE_P(0);
+ int n = PG_GETARG_INT32(1);
+ ArrayType *result;
+ ... other declarations as needed ...
We can't quite make normal C declaration style work, but that's a poor
excuse for not making the argument list visible as directly as possible.
* I wouldn't bother with the PG_FREE_IF_COPY calls. Those are generally
only used in btree comparison functions, in which there's a need to not
leak memory.
* I wonder if we really need these to be proparallel = 'r'. If we let
a worker process execute them, they will take numbers from the worker's
pg_global_prng_state seeding not the parent process's seeding, but why
is that a problem? In the prior version where you were tying them
to the parent's drandom() sequence, proparallel = 'r' made sense;
but now I think it's unnecessary.
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2022-09-30 00:18:32 | Re: Streaming wal from primiry terminating |
Previous Message | Lahnov, Igor | 2022-09-29 13:04:08 | RE: Streaming wal from primiry terminating |
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2022-09-29 21:58:59 | Re: problems with making relfilenodes 56-bits |
Previous Message | Tom Lane | 2022-09-29 18:39:44 | Re: making relfilenodes 56 bits |