Re: Should we optimize the `ORDER BY random() LIMIT x` case?

From: Aleksander Alekseev <aleksander(at)timescale(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Andrei Lepikhov <lepihov(at)gmail(dot)com>, Nico Williams <nico(at)cryptonector(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, Vik Fearing <vik(at)postgresfriends(dot)org>
Subject: Re: Should we optimize the `ORDER BY random() LIMIT x` case?
Date: 2025-05-20 10:46:38
Message-ID: CAJ7c6TOgEX02tofvoMaZHAO4E5pe=pZs2VNHa9m5aEpM9mwviw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Andrei,

> > ```
> > -- imagine replacing inefficient array_sample(array_agg(t), 10)
> > -- with more efficient array_sample_reservoir(t, 10)
> > SELECT (unnest(agg)).* AS k FROM
> > ( SELECT array_sample(array_agg(t), 10) AS agg FROM (
> > ... here goes the subquery ...
> > ) AS t
> > );
> > ```
> >
> > ... if only we supported such a column expansion for not registered
> > records. Currently such a query fails with:
> >
> > ```
> > ERROR: record type has not been registered
> > ```
> I know about this issue. Having resolved it in a limited number of local
> cases (like FDW push-down of row types), I still do not have a universal
> solution worth proposing upstream. Do you have any public implementation
> of the array_sample_reservoir to play with?

array_sample_reservoir() is purely a figment of my imagination at the
moment. Semantically it does the same as array_sample(array_agg(t), N)
except the fact that array_sample(..., N) requires the array to have
at least N items. You can experiment with array_sample(array_agg(...),
N) as long as the subquery returns much more than N rows.

--
Best regards,
Aleksander Alekseev

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2025-05-20 11:28:17 Re: Add comment explaining why queryid is int64 in pg_stat_statements
Previous Message shveta malik 2025-05-20 10:30:07 Re: Conflict detection for update_deleted in logical replication