Re: Proposal to introduce a shuffle function to intarray extension

From: Martin Kalcher <martin(dot)kalcher(at)aboutsource(dot)net>
To: Mladen Gogala <gogala(dot)mladen(at)gmail(dot)com>, pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Proposal to introduce a shuffle function to intarray extension
Date: 2022-07-16 20:21:54
Message-ID: 7933c4a5-0e72-889e-1be9-e3574ffe333b@aboutsource.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

Am 16.07.22 um 18:53 schrieb Mladen Gogala:
> On 7/15/22 04:36, Martin Kalcher wrote:
>> Dear list,
>>
>> i am dealing with an application that processes fairly large arrays of
>> integers. It makes heavy use of the intarray extension, which works
>> great in most cases. However, there are two requirements that cannot
>> be addressed by the extension and are rather slow with plain SQL. Both
>> can be met with shuffling:
>>
>> - Taking n random members from an integer array
>> - Splitting an array into n chunks, where each member is assigned to a
>> random chunk
>>
>> Shuffling is currently implemented by unnesting the array, ordering
>> the members by random() and aggregating them again.
>
>
> Martin, have you considered PL/Python and NumPy module?

Hey Mladen,

thank you for your advice. Unfortunately the performance of shuffling
with NumPy is about the same as with SQL.

create function numpy_shuffle(arr int[])
returns int[]
as $$
import numpy
numpy.random.shuffle(arr)
return arr
$$ language 'plpython3u';

select arr[1:3]::text || ' ... ' || arr[3999998:4000000]::text
from (
select numpy_shuffle(arr) arr from numbers
) shuffled;

-------------------------------------------------------
{674026,3306457,1727170} ... {343875,3825484,1235246}

Time: 2315.431 ms (00:02.315)

Am i doing something wrong?

Martin

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Mladen Gogala 2022-07-16 21:30:08 Re: Proposal to introduce a shuffle function to intarray extension
Previous Message Lucie Šimečková 2022-07-16 17:24:56 Proposed Translations of Updated Code of Conduct Policy

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2022-07-16 20:30:34 Re: remove reset_shared()
Previous Message Andres Freund 2022-07-16 20:17:50 Re: Use -fvisibility=hidden for shared libraries