From: | Martin Kalcher <martin(dot)kalcher(at)aboutsource(dot)net> |
---|---|
To: | Mladen Gogala <gogala(dot)mladen(at)gmail(dot)com>, pgsql-general(at)lists(dot)postgresql(dot)org |
Subject: | Re: Proposal to introduce a shuffle function to intarray extension |
Date: | 2022-07-16 20:21:54 |
Message-ID: | 7933c4a5-0e72-889e-1be9-e3574ffe333b@aboutsource.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general pgsql-hackers |
Am 16.07.22 um 18:53 schrieb Mladen Gogala:
> On 7/15/22 04:36, Martin Kalcher wrote:
>> Dear list,
>>
>> i am dealing with an application that processes fairly large arrays of
>> integers. It makes heavy use of the intarray extension, which works
>> great in most cases. However, there are two requirements that cannot
>> be addressed by the extension and are rather slow with plain SQL. Both
>> can be met with shuffling:
>>
>> - Taking n random members from an integer array
>> - Splitting an array into n chunks, where each member is assigned to a
>> random chunk
>>
>> Shuffling is currently implemented by unnesting the array, ordering
>> the members by random() and aggregating them again.
>
>
> Martin, have you considered PL/Python and NumPy module?
Hey Mladen,
thank you for your advice. Unfortunately the performance of shuffling
with NumPy is about the same as with SQL.
create function numpy_shuffle(arr int[])
returns int[]
as $$
import numpy
numpy.random.shuffle(arr)
return arr
$$ language 'plpython3u';
select arr[1:3]::text || ' ... ' || arr[3999998:4000000]::text
from (
select numpy_shuffle(arr) arr from numbers
) shuffled;
-------------------------------------------------------
{674026,3306457,1727170} ... {343875,3825484,1235246}
Time: 2315.431 ms (00:02.315)
Am i doing something wrong?
Martin
From | Date | Subject | |
---|---|---|---|
Next Message | Mladen Gogala | 2022-07-16 21:30:08 | Re: Proposal to introduce a shuffle function to intarray extension |
Previous Message | Lucie Šimečková | 2022-07-16 17:24:56 | Proposed Translations of Updated Code of Conduct Policy |
From | Date | Subject | |
---|---|---|---|
Next Message | Nathan Bossart | 2022-07-16 20:30:34 | Re: remove reset_shared() |
Previous Message | Andres Freund | 2022-07-16 20:17:50 | Re: Use -fvisibility=hidden for shared libraries |