From: | Karsten P <mr(dot)mister123(at)hotmail(dot)com> |
---|---|
To: | pgsql-general(at)lists(dot)postgresql(dot)org |
Subject: | Re: Feature-Request: Performance-Optimization when using limit in combination with order by on querys using union |
Date: | 2025-05-08 10:57:40 |
Message-ID: | DB7PR08MB30819C2279665D7AFBD247EFDA8BA@DB7PR08MB3081.eurprd08.prod.outlook.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Okay, forget what i've just written, sorry for that.
I've digged deeper and checked my generic example - it works perfectly using
a merge-append in combination with limit.
i'm sorry i didn't check that first. it just won't work in my real-life
example.
though each part of the query is using an index-scan it is than using a
'normal' append
instead of a merge-append, but i don't know why.
here is the expected query plan as used with my generic example:
Limit (cost=0.86..0.90 rows=1 width=12)
-> Merge Append (cost=0.86..104314.37 rows=3435900 width=12)
Sort Key: kpkp_orders.buchungsdatum DESC
-> Index Only Scan Backward using kpkp_orders_1 on
kpkp_orders (cost=0.43..34977.68 rows=1717950 width=12)
-> Index Only Scan Backward using kpkp_invoices_1 on
kpkp_invoices (cost=0.43..34977.68 rows=1717950 width=12)
this is fast.
and here is the query plan used on my real-life example:
Limit (cost=918.40..918.41 rows=1 width=8)
-> Sort (cost=918.40..919.59 rows=473 width=8)
Sort Key: s.belegdatum DESC
-> Subquery Scan on s (cost=0.56..916.04 rows=473 width=8)
-> Append (cost=0.56..911.31 rows=473 width=327)
-> Subquery Scan on "*SELECT* 1"
(cost=0.56..624.08 rows=318 width=187)
-> Index Scan using
soit_erpprozesshistorie_12 on erpprozesshistorie eph (cost=0.56..620.10
rows=318 width=181)
Index Cond: ((clientid = '0'::numeric)
AND (activeflag = 't'::bpchar) AND (prozessuntertyp = 2) AND (CASE WHEN
((typ = 1) OR (typ = 7)) THEN 1 ELSE CASE WHEN (typ = 3) THEN 2 ELSE
CASE WHEN ((typ = 4) OR (typ = 9)) THEN 3 ELSE 4 END END END = 1) AND
(CASE WHEN (prozesstyp = 1) THEN true ELSE false END = true) AND
(artikelvariante_objid = '1064748816'::numeric))
-> Subquery Scan on "*SELECT* 2"
(cost=0.43..284.87 rows=155 width=214)
-> Index Scan using soit_umsatz_alt_08 on
umsatz_alt (cost=0.43..282.93 rows=155 width=186)
Index Cond: ((clientid = '0'::numeric)
AND (activeflag = 't'::bpchar) AND (verkauf = true) AND
(artikelvariante_objid = '1064748816'::numeric))
so my question is: under wich circumstance does the query-planner use or
prefer the 'merge append' over 'append'?
Thanks in advance!
Am 08.05.25 um 11:57 schrieb Karsten P:
> Hi,
>
> i've already googled so far but didn't find anything regarding my
> problem..
> I hope i'm here at the right place.
>
> Following situation (this is just an simplyfied example):
>
> suppose we have two tables, lets say
>
> orders
> - column 'order_number' -> varchar
> - column 'order_date' -> timestamp
>
> with index on order_date
>
> and
>
> invoices
> - column 'invoice_number' -> varchar
> - column 'invoice_date' -> timestamp
>
> with index on invoice_date
>
> and many records in both if them.
>
> now we have a view combining both of them as
>
> create view documents as
> (
> select order_number as document_number, order_date as
> document_date from orders
> union all select invoice_number, invoice_date from invoices
> )
>
>
> finding the last order placed in the database ist really easy:
>
> select order_number from orders order by order_date desc limit 1
>
> will result in an index scan backward on orders
>
> same with invoices only...
>
> but when querying the view
>
> select document_number from documents order by document_date desc
> limit 1
>
> seems to break down to
> - collect all rows from orders
> - combine it with all rows from invoices
> - sort all rows (descending)
> - limit to one row
>
> with many data this is quite slow.
>
> I've tested this with PG9.6 and PG14, it doesn't seem to make a
> difference (correct me if i'm wrong).
>
>
> So my question is: What about optimizing the query-planner that if
>
> - a query with unions of selects is executed
> - and an 'order by' in combination with 'limit' is applied on the
> complete query (not only on subselects)
> - and there is a matching index for each select
>
> the order by and limit - part of the sql is also beeing applied on
> each sub-select ?
>
> actually
> select document_number from documents order by document_date desc
> limit 1
>
> is beeing processed as
> select order_number from orders
> union all select invoice_number from invoices
> order by document_number desc
> limit 1
>
> but would it be possible to let the query-optimizer expand the query to
> select order_number from (
> (select order_number, order_date from orders order by
> order_date desc limit 1)
> union all (select invoice_number, invoice_date from invoices
> order by invoice_date desc limit 1)
> ) as subselect
> order by order_date desc
> limit 1
>
> as this would use two (or number of unions) index-backward-scans
> and than only has to reorder at maximum two rows before limiting to
> the first of it?
>
> this should be significantly faster.
>
> thanks a lot and greetz,
> Karsten
>
>
>
From | Date | Subject | |
---|---|---|---|
Next Message | KK CHN | 2025-05-08 11:16:13 | Re: Pgbackrest failure for INCR and DIFF but not FULL backup |
Previous Message | Karsten P | 2025-05-08 09:57:09 | Feature-Request: Performance-Optimization when using limit in combination with order by on querys using union |