Re: Feature-Request: Performance-Optimization when using limit in combination with order by on queries using union

From: Karsten P <mr(dot)mister123(at)hotmail(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Feature-Request: Performance-Optimization when using limit in combination with order by on queries using union
Date: 2025-05-08 10:57:40
Message-ID: DB7PR08MB30819C2279665D7AFBD247EFDA8BA@DB7PR08MB3081.eurprd08.prod.outlook.com
Lists: pgsql-general

Okay, forget what I've just written, sorry for that.

I've dug deeper and checked my generic example - it works perfectly, using
a Merge Append in combination with Limit.

I'm sorry I didn't check that first. It just won't work in my real-life
example: although each part of the query uses an index scan, the planner
then uses a 'normal' Append instead of a Merge Append, and I don't know why.

Here is the expected query plan, as produced for my generic example:

Limit  (cost=0.86..0.90 rows=1 width=12)
  ->  Merge Append  (cost=0.86..104314.37 rows=3435900 width=12)
        Sort Key: kpkp_orders.buchungsdatum DESC
        ->  Index Only Scan Backward using kpkp_orders_1 on kpkp_orders  (cost=0.43..34977.68 rows=1717950 width=12)
        ->  Index Only Scan Backward using kpkp_invoices_1 on kpkp_invoices  (cost=0.43..34977.68 rows=1717950 width=12)

This is fast.
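
For reference, this is roughly the setup behind my generic example; the table,
index and column names are taken from the plan above, the column types and the
rest of the schema are simplified:

-- sketch of the generic setup; names taken from the plan above,
-- types and remaining columns are simplified
create table kpkp_orders   (order_number   varchar, buchungsdatum timestamp);
create table kpkp_invoices (invoice_number varchar, buchungsdatum timestamp);

create index kpkp_orders_1   on kpkp_orders   (buchungsdatum);
create index kpkp_invoices_1 on kpkp_invoices (buchungsdatum);

-- with enough rows in both tables this gives the Merge Append plan shown above
explain
select buchungsdatum
from (
    select buchungsdatum from kpkp_orders
    union all
    select buchungsdatum from kpkp_invoices
) as documents
order by buchungsdatum desc
limit 1;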

And here is the query plan used for my real-life example:

Limit  (cost=918.40..918.41 rows=1 width=8)
  ->  Sort  (cost=918.40..919.59 rows=473 width=8)
        Sort Key: s.belegdatum DESC
        ->  Subquery Scan on s  (cost=0.56..916.04 rows=473 width=8)
              ->  Append  (cost=0.56..911.31 rows=473 width=327)
                    ->  Subquery Scan on "*SELECT* 1"  (cost=0.56..624.08 rows=318 width=187)
                          ->  Index Scan using soit_erpprozesshistorie_12 on erpprozesshistorie eph  (cost=0.56..620.10 rows=318 width=181)
                                Index Cond: ((clientid = '0'::numeric) AND (activeflag = 't'::bpchar) AND (prozessuntertyp = 2) AND (CASE WHEN ((typ = 1) OR (typ = 7)) THEN 1 ELSE CASE WHEN (typ = 3) THEN 2 ELSE CASE WHEN ((typ = 4) OR (typ = 9)) THEN 3 ELSE 4 END END END = 1) AND (CASE WHEN (prozesstyp = 1) THEN true ELSE false END = true) AND (artikelvariante_objid = '1064748816'::numeric))
                    ->  Subquery Scan on "*SELECT* 2"  (cost=0.43..284.87 rows=155 width=214)
                          ->  Index Scan using soit_umsatz_alt_08 on umsatz_alt  (cost=0.43..282.93 rows=155 width=186)
                                Index Cond: ((clientid = '0'::numeric) AND (activeflag = 't'::bpchar) AND (verkauf = true) AND (artikelvariante_objid = '1064748816'::numeric))

So my question is: under which circumstances does the query planner use or
prefer a 'Merge Append' over a plain 'Append'?
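
My own guess - and this is only an assumption on my part - is that the planner
can only use a Merge Append here if every union branch has an index path that
already returns its rows in the requested sort order. To check that I would try
something along these lines; the view name is a placeholder, the conditions are
copied from the plan above in simplified form, and belegdatum may well be a
computed expression in the real view, so please treat this as a sketch:

-- 1) penalize explicit sorts and see whether a Merge Append plan shows up at all
set enable_sort = off;
explain
select belegdatum
from documents_view          -- placeholder name for the real view
where artikelvariante_objid = 1064748816
order by belegdatum desc
limit 1;
reset enable_sort;

-- 2) check one branch on its own: can any index deliver it ordered by belegdatum?
explain
select belegdatum
from erpprozesshistorie eph
where clientid = 0
  and activeflag = 't'
  and prozessuntertyp = 2
  and artikelvariante_objid = 1064748816
order by belegdatum desc
limit 1;
-- if this plan still contains a Sort node, no index returns this branch already
-- ordered by belegdatum, which would at least be consistent with the plain
-- Append + Sort in the plan above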

Thanks in advance!

On 08.05.25 at 11:57, Karsten P wrote:
> Hi,
>
> I've already googled quite a bit but didn't find anything regarding my
> problem.
> I hope I'm at the right place here.
>
> Here is the situation (this is just a simplified example):
>
> Suppose we have two tables, let's say
>
> orders
>   - column 'order_number' -> varchar
>   - column 'order_date' -> timestamp
>
> with index on order_date
>
> and
>
> invoices
>   - column 'invoice_number' -> varchar
>   - column 'invoice_date' -> timestamp
>
> with index on invoice_date
>
> and many records in both of them.
>
> Now we have a view combining both of them:
>
> create view documents as
> (
>     select order_number as document_number, order_date as document_date from orders
>     union all select invoice_number, invoice_date from invoices
> )
>
>
> Finding the last order placed in the database is really easy:
>
>   select order_number from orders order by order_date desc limit 1
>
> This will result in an Index Scan Backward on orders.
>
> The same goes for querying invoices on their own.
>
> But when querying the view
>
>   select document_number from documents order by document_date desc limit 1
>
> seems to break down to:
>   - collect all rows from orders
>   - combine them with all rows from invoices
>   - sort all rows (descending)
>   - limit to one row
>
> With a lot of data this is quite slow.
>
> I've tested this with PG 9.6 and PG 14; it doesn't seem to make a
> difference (correct me if I'm wrong).
>
>
> So my question is: what about optimizing the query planner so that if
>
> - a query with unions of selects is executed
> - and an 'order by' in combination with 'limit' is applied to the
> complete query (not only to the subselects)
> - and there is a matching index for each select
>
> then the order by / limit part of the SQL is also applied to each
> sub-select?
>
> Currently,
>     select document_number from documents order by document_date desc limit 1
>
> is being processed as
>     select order_number as document_number, order_date as document_date from orders
>     union all select invoice_number, invoice_date from invoices
>     order by document_date desc
>     limit 1
>
> but would it be possible to let the query optimizer expand the query to
>     select order_number from (
>         (select order_number, order_date from orders
>          order by order_date desc limit 1)
>         union all
>         (select invoice_number, invoice_date from invoices
>          order by invoice_date desc limit 1)
>     ) as subselect
>     order by order_date desc
>     limit 1
>
> This would use two (or as many as there are union branches) backward
> index scans and then only has to reorder at most two rows before
> limiting to the first one.
>
> This should be significantly faster.
>
> Thanks a lot and greetings,
> Karsten
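
For completeness, here is the rewrite from my previous mail once more, spelled
out as a complete statement against the generic orders/invoices schema (an
untested sketch; the subquery alias is arbitrary):

select document_number
from (
    (select order_number as document_number, order_date as document_date
       from orders
      order by order_date desc
      limit 1)
    union all
    (select invoice_number, invoice_date
       from invoices
      order by invoice_date desc
      limit 1)
) as newest_per_source
order by document_date desc
limit 1;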
