Re: [PATCH] Introduce array_shuffle() and array_sample()

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Martin Kalcher <martin(dot)kalcher(at)aboutsource(dot)net>, John Naylor <john(dot)naylor(at)enterprisedb(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Introduce array_shuffle() and array_sample()
Date: 2022-07-18 20:34:50
Message-ID: 809171.1658176490@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Mon, Jul 18, 2022 at 3:03 PM Martin Kalcher
> <martin(dot)kalcher(at)aboutsource(dot)net> wrote:
>> array_shuffle(anyarray) -> anyarray
>> array_sample(anyarray, integer) -> anyarray

> I think it's questionable whether the behavior of array_shuffle() is
> correct for a multi-dimensional array. The implemented behavior is to
> keep the dimensions as they were, but permute the elements across all
> levels at random. But there are at least two other behaviors that seem
> potentially defensible: (1) always return a 1-dimensional array, (2)
> shuffle the sub-arrays at the top-level without the possibility of
> moving elements within or between sub-arrays. What behavior we decide
> is best here should be documented.

Martin had originally proposed (2), which I rejected on the grounds
that we don't treat multi-dimensional arrays as arrays-of-arrays for
any other purpose. Maybe we should have, but that ship sailed decades
ago, and I doubt that shuffle() is the place to start changing it.

I could get behind your option (1) though, to make it clearer that
the input array's dimensionality is not of interest. Especially since,
as you say, (1) is pretty much the only sensible choice for array_sample.

> I also think you should add test cases involving multi-dimensional
> arrays, as well as arrays with non-default bounds. e.g. trying
> shuffling or sampling some values like
> '[8:10][-6:-5]={{1,2},{3,4},{5,6}}'::int[]

This'd only matter if we decide not to ignore the input's dimensionality.

regards, tom lane

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Alan Hodgson 2022-07-18 20:39:17 Re: pg_receivewal/xlog to ship wal to cloud
Previous Message Robert Haas 2022-07-18 20:27:30 Re: [PATCH] Introduce array_shuffle() and array_sample()

Browse pgsql-hackers by date

  From Date Subject
Next Message Jacob Champion 2022-07-18 20:34:52 Re: [Commitfest 2022-07] Begins Now
Previous Message Robert Haas 2022-07-18 20:28:35 Re: pg15b2: large objects lost on upgrade