Quick Links

Feature-Request: Performance-Optimization when using limit in combination with order by on querys using union

From:	Karsten P <mr(dot)mister123(at)hotmail(dot)com>
To:	pgsql-general(at)lists(dot)postgresql(dot)org
Subject:	Feature-Request: Performance-Optimization when using limit in combination with order by on querys using union
Date:	2025-05-08 09:57:09
Message-ID:	DB7PR08MB3081636A27C077E3BB4DA2DCDA8BA@DB7PR08MB3081.eurprd08.prod.outlook.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

Hi,

i've already googled so far but didn't find anything regarding my problem..
I hope i'm here at the right place.

Following situation (this is just an simplyfied example):

suppose we have two tables, lets say

orders
- column 'order_number' -> varchar
- column 'order_date' -> timestamp

with index on order_date

and

invoices
- column 'invoice_number' -> varchar
- column 'invoice_date' -> timestamp

with index on invoice_date

and many records in both if them.

now we have a view combining both of them as

create view documents as
(
select order_number as document_number, order_date as document_date
from orders
union all select invoice_number, invoice_date from invoices
)

finding the last order placed in the database ist really easy:

select order_number from orders order by order_date desc limit 1

will result in an index scan backward on orders

same with invoices only...

but when querying the view

select document_number from documents order by document_date desc limit 1

seems to break down to
- collect all rows from orders
- combine it with all rows from invoices
- sort all rows (descending)
- limit to one row

with many data this is quite slow.

I've tested this with PG9.6 and PG14, it doesn't seem to make a
difference (correct me if i'm wrong).

So my question is: What about optimizing the query-planner that if

- a query with unions of selects is executed
- and an 'order by' in combination with 'limit' is applied on the
complete query (not only on subselects)
- and there is a matching index for each select

the order by and limit - part of the sql is also beeing applied on each
sub-select ?

actually
select document_number from documents order by document_date desc
limit 1

is beeing processed as
    select order_number from orders
    union all select invoice_number from invoices
    order by document_number desc
    limit 1

but would it be possible to let the query-optimizer expand the query to
    select order_number from (
        (select order_number, order_date from orders order by
order_date desc limit 1)
        union all (select invoice_number, invoice_date from invoices
order by invoice_date desc limit 1)
    ) as subselect
   order by order_date desc
   limit 1

as this would use two (or number of unions) index-backward-scans
and than only has to reorder at maximum two rows before limiting to the
first of it?

this should be significantly faster.

thanks a lot and greetz,
Karsten

Responses

Re: Feature-Request: Performance-Optimization when using limit in combination with order by on querys using union at 2025-05-08 10:57:40 from Karsten P

Browse pgsql-general by date

	From	Date	Subject
Next Message	Karsten P	2025-05-08 10:57:40	Re: Feature-Request: Performance-Optimization when using limit in combination with order by on querys using union
Previous Message	Luca Ferrari	2025-05-08 06:54:30	Re: pg_rewind problem: cannot find WAL