Re: Query does not return rows unless a LIMIT statement is used.

From: Richard Kut <rkut(at)intelerad(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-novice(at)postgresql(dot)org, Mathew Kanner <Mathew(dot)Kanner(at)intelerad(dot)com>
Subject: Re: Query does not return rows unless a LIMIT statement is used.
Date: 2006-02-15 16:11:36
Message-ID: 200602151111.37078.rkut@intelerad.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

Hi Tom!

Thanks for the quick response! Here are the query plans for both queries.

EXPLAIN ANALYZE SELECT *
FROM pid, pv1
WHERE pid.patient_id_internal_id = pv1.patient_id_internal_id
LIMIT 5324;

QUERY
PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..33408.19 rows=5324 width=691) (actual time=0.054..106.972
rows=5324 loops=1)
-> Nested Loop (cost=0.00..7277902.27 rows=1159822 width=691) (actual
time=0.051..99.230 rows=5324 loops=1)
-> Index Scan using patient_id_internal_id_idx on pv1
(cost=0.00..290598.21 rows=1159822 width=407) (actual time=0.020..14.995
rows=5324 loops=1)
-> Index Scan using kdx_pid on pid (cost=0.00..6.01 rows=1
width=284) (actual time=0.009..0.010 rows=1 loops=5324)
Index Cond: ((pid.patient_id_internal_id)::text =
("outer".patient_id_internal_id)::text)
Total runtime: 111.137 ms
(6 rows)

Time: 112.405 ms

EXPLAIN ANALYZE SELECT *
FROM pid, pv1
WHERE pid.patient_id_internal_id = pv1.patient_id_internal_id;

QUERY
PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=62558.39..509419.48 rows=1159822 width=691) (actual
time=3348.848..163051.064 rows=1159805 loops=1)
Hash Cond: (("outer".patient_id_internal_id)::text =
("inner".patient_id_internal_id)::text)
-> Index Scan using patient_id_internal_id_idx on pv1
(cost=0.00..290598.21 rows=1159822 width=407) (actual time=0.020..56378.131
rows=1159822 loops=1)
-> Hash (cost=49229.38..49229.38 rows=328403 width=284) (actual
time=3342.874..3342.874 rows=328403 loops=1)
-> Index Scan using kdx_pid on pid (cost=0.00..49229.38 rows=328403
width=284) (actual time=0.010..2110.172 rows=328403 loops=1)
Total runtime: 163917.488 ms
(6 rows)

Time: 164006.053 ms

I hope that this helps.

On Wednesday 15 February 2006 10:53, Tom Lane wrote:
> Richard Kut <rkut(at)intelerad(dot)com> writes:
> > Note the LIMIT 5324 statement. I empirically determined that this magic
> > number is the maximum row count before the query performance degrades
> > completely. It appears to be a sudden performance degradation. If I
> > increase the value in LIMIT beyond 5324, or if I remove the LIMIT
> > altogether, then the query performance is horrible.
>
> This is really, really hard to believe considering that you're showing
> the exact same query plan in both cases. Are you certain you didn't
> get confused about which EXPLAIN output was which? It's entirely
> possible for the planner to pick a different underlying plan depending
> on the LIMIT setting --- small LIMIT values will make it tend to prefer
> fast-start query plans, since a slow-overall plan with nil startup cost
> can beat a fast-overall plan with high startup cost if you're not
> intending to run it to completion. Up until I got to the query plans
> I thought you were describing a situation where the planner had switched
> over to a high-startup-cost plan much sooner than it should do, but your
> EXPLAIN printouts don't reflect any such thing.
>
> If it is a question of bad choice of query plan, the answer is probably
> that you need to VACUUM and/or ANALYZE the tables involved so that the
> planner has more accurate statistics to work with. Do the estimated
> row counts shown by EXPLAIN look reasonable at all?
>
> > From observation using the TOP utility, I have noticed that when the
> > query is running without the LIMIT statement, that the postmaster process
> > resident memory size is increasing at a steady rate. It would seem that
> > the query results are being gathered in memory first before being
> > displayed in PSQL. Is there a way around this?
>
> The postmaster doesn't gather up query results, but psql itself will.
> If you want incremental fetch of results, use a cursor and FETCH some
> appropriate number of rows at a time.
>
> > commit_delay = 100000
>
> This is unlikely to be a good idea. Your other parameter settings look
> reasonable though.
>
> regards, tom lane

--
Regards,

Richard Kut
Database Administrator
Research & Development
Intelerad Medical Systems Inc.
460 Ste-Catherine West, Suite 210
Montreal, Quebec, Canada H3B 1A7
Tel: 514.931.6222 x7733
Fax: 514.931.4653
rkut(at)intelerad(dot)com
www.intelerad.com

This email or any attachments may contain confidential or legally
privileged information intended for the sole use of the addressees. Any
use, redistribution, disclosure, or reproduction of this information,
except as intended, is prohibited. If you received this
email in error, please notify the sender and remove all copies of the
message, including any attachments.

In response to

Responses

Browse pgsql-novice by date

  From Date Subject
Next Message Tom Lane 2006-02-15 16:21:55 Re: Query does not return rows unless a LIMIT statement is used.
Previous Message Tom Lane 2006-02-15 15:53:11 Re: Query does not return rows unless a LIMIT statement is used.