Re: Slow queries consisting inner selects and order bys & hack to speed up

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Ümit Öztosun <umit(at)likyabilisim(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow queries consisting inner selects and order bys & hack to speed up
Date: 2006-01-23 19:19:14
Message-ID: 7599.1138043954@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

=?ISO-8859-1?Q?=DCmit_=D6ztosun?= <umit(at)likyabilisim(dot)com> writes:
> Our application uses typical queries similar to following (very simplified):

> SELECT
> part_id,
> part_name,
> (SELECT
> SUM(amount) FROM part_movements M
> WHERE P.part_id =3D M.part_id
> ) as part_amount
> FROM parts P
> ORDER BY part_name
> LIMIT 50

> Postgres seems to compute all possible rows and then sorts the
> results, which nearly renders the paging meaningless.

Yeah. The general rule is that sorting happens after computing the
SELECT values --- this is more or less required for cases where the
ORDER BY refers to a SELECT-list item. You'd probably have better
results by writing a sub-select:

SELECT
part_id,
part_name,
(SELECT
SUM(amount) FROM part_movements M
WHERE P.part_id = M.part_id
) as part_amount
FROM
(SELECT part_id, part_name FROM parts P
WHERE whatever ORDER BY whatever LIMIT n) as P;

This will do the part_movements stuff only for rows that make it out of
the sub-select.

Another approach is to make sure the ORDER BY is always on an indexed
column; in cases where the ORDER BY is done by an indexscan instead
of a sort, calculation of the unwanted SELECT-list items does not
happen. However, this only works as long as LIMIT+OFFSET is fairly
small.

Lastly, are you on a reasonably current Postgres version (performance
complaints about anything older than 8.0 will no longer be accepted
with much grace), and are your statistics up to date? The ANALYZE
shows rowcount estimates that seem suspiciously far off:

-> Seq Scan on scf_stokkart stok (cost=0.00..142622.54 rows=25 width=503) (actual time=4.726..19324.633 rows=4947 loops=1)
Filter: (upper((durum)::text) = 'A'::text)

This is important because, if the planner realized that the SELECT-list
items were going to be evaluated 5000 times not 25, it might well choose
a different plan.

regards, tom lane

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Adnan HOTMAIL 2006-01-23 22:05:38 unsubscribe
Previous Message Frank Wiles 2006-01-23 17:25:26 Re: [PERFORMANCE] Stored Procedures