Re: Should we optimize the `ORDER BY random() LIMIT x` case?

From: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>
To: Aleksander Alekseev <aleksander(at)timescale(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Should we optimize the `ORDER BY random() LIMIT x` case?
Date: 2025-05-15 02:06:38
Message-ID: CAGjGUAJf9tjVqp3TjazHAuxnne8jxkqqVXFKjtt2X=i=V-MW+Q@mail.gmail.com
Lists: pgsql-hackers

Hi Aleksander,

If we can optimize this case, that would be great. Then we wouldn't need to
pull a lot of data over to the application side and randomly pick the rows
we need there.
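
Just to make the pain point concrete, with your example schema the best one
can do today is something like the following rough sketch (the LIMIT of 10 is
just an arbitrary sample size):

```
-- Wrap the whole query and sort every row by random() just to keep a few.
-- The planner has to produce and sort the full join result first.
SELECT *
FROM (
    SELECT t.ts, t.city, t.temperature, h.humidity
    FROM temperature AS t
    LEFT JOIN LATERAL
      ( SELECT * FROM humidity
        WHERE city = t.city AND ts <= t.ts
        ORDER BY ts DESC LIMIT 1
      ) AS h ON TRUE
    WHERE t.ts < '2022-01-05'
) AS sub
ORDER BY random()
LIMIT 10;
```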

On Thu, 15 May 2025 at 07:41, Aleksander Alekseev <aleksander(at)timescale(dot)com>
wrote:

> Hi,
>
> If I didn't miss anything, currently we don't seem to support sampling
> the result of an arbitrary SELECT query efficiently.
>
> To give one specific example:
>
> ```
> CREATE TABLE temperature(
> ts TIMESTAMP NOT NULL,
> city TEXT NOT NULL,
> temperature INT NOT NULL);
>
> CREATE TABLE humidity(
> ts TIMESTAMP NOT NULL,
> city TEXT NOT NULL,
> humidity INT NOT NULL);
>
> -- imagine having much more data ...
> INSERT INTO temperature (ts, city, temperature)
> SELECT ts + (INTERVAL '60 minutes' * random()), city, 30*random()
> FROM generate_series('2022-01-01' :: TIMESTAMP,
> '2022-01-31', '1 day') AS ts,
> unnest(array['City A', 'City B']) AS city;
>
> INSERT INTO humidity (ts, city, humidity)
> SELECT ts + (INTERVAL '60 minutes' * random()), city, 100*random()
> FROM generate_series('2022-01-01' :: TIMESTAMP,
> '2022-01-31', '1 day') AS ts,
> unnest(array['City A', 'City B']) AS city;
>
> -- "AS OF" join:
> SELECT t.ts, t.city, t.temperature, h.humidity
> FROM temperature AS t
> LEFT JOIN LATERAL
> ( SELECT * FROM humidity
> WHERE city = t.city AND ts <= t.ts
> ORDER BY ts DESC LIMIT 1
> ) AS h ON TRUE
> WHERE t.ts < '2022-01-05';
> ```
>
> One can do `SELECT (the query above) ORDER BY random() LIMIT x` but
> this produces an inefficient plan. Alternatively one could create
> temporary tables using `CREATE TEMP TABLE ... AS SELECT * FROM tbl
> TABLESAMPLE BERNOULLI(20)` but this is inconvenient and would be
> suboptimal even if we supported global temporary tables.
>
> 1. Do you think there might be value in addressing this issue?
> 2. If yes, how would you suggest addressing it from the UI point of
> view - by adding a special syntax, some sort of aggregate function, or
> ...?
>
> --
> Best regards,
> Aleksander Alekseev
>
>
>

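For comparison, the temporary-table workaround mentioned in the quoted mail
would look roughly like this (an untested sketch; the table name asof_result
is just for illustration, and BERNOULLI(20) only returns approximately 20% of
the rows, so an exact sample size still needs an extra LIMIT):

```
-- Materialize the join result first, then sample the temp table.
CREATE TEMP TABLE asof_result AS
SELECT t.ts, t.city, t.temperature, h.humidity
FROM temperature AS t
LEFT JOIN LATERAL
  ( SELECT * FROM humidity
    WHERE city = t.city AND ts <= t.ts
    ORDER BY ts DESC LIMIT 1
  ) AS h ON TRUE
WHERE t.ts < '2022-01-05';

SELECT * FROM asof_result TABLESAMPLE BERNOULLI (20);
```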