Query does not return rows unless a LIMIT statement is used.

From: Richard Kut <rkut(at)intelerad(dot)com>
To: pgsql-novice(at)postgresql(dot)org
Cc: Mathew Kanner <Mathew(dot)Kanner(at)intelerad(dot)com>
Subject: Query does not return rows unless a LIMIT statement is used.
Date: 2006-02-15 15:06:49
Message-ID: 200602151006.50419.rkut@intelerad.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

Hello!

This is my first time using Postgres, and my first posting to this mailing
list.

I am evaluating the performance of Postgres version 8.1.2 as a possible
replacement for our aging Sybase ASE version 11 database. However, I have run
into what could be a show-stopper problem.

I am having a problem with a query taking a very long time to run. So long,
in fact, that I have yet to see that query return any rows.

I will apologize now for what may become a very long message, but I want to
be sure that I pass along all of the pertinent details. Here we go.

I am using Red Hat Enterprise Linux 3.0 on a box with 1.5 GB of RAM. The CPU
is an Intel Xeon 2.4 GHz hyper-threaded chip with 512 KB cache.

I am trying to do a simple join on indexed columns between two tables. Table
PID has 328,403 records, and table PV1 has 1,159,822 records. The query which
I am using is:

SELECT *
FROM pid, pv1
WHERE pid.patient_id_internal_id = pv1.patient_id_internal_id
LIMIT 5324;

For table PID, the column patient_id_internal_id is defined as character
varying(20). The index on this column is defined as:
"kdx_pid" btree (patient_id_internal_id), tablespace "hl7_segments_ihe_idx"

For table PV1, the column patient_id_internal_id is defined as character
varying(20). The index on this column is defined as:
"patient_id_internal_id_idx" btree (patient_id_internal_id), tablespace
"hl7_segments_ihe_idx"

Both tables live in the tablespace hl7_segments_ihe_data.

Note the LIMIT 5324 statement. I empirically determined that this magic
number is the maximum row count before the query performance degrades
completely. It appears to be a sudden performance degradation. If I increase
the value in LIMIT beyond 5324, or if I remove the LIMIT altogether, then the
query performance is horrible.

Here is the query plan with the LIMIT statement:

QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..1607.27 rows=5324 width=691)
-> Merge Join (cost=0.00..350141.38 rows=1159822 width=691)
Merge Cond: (("outer".patient_id_internal_id)::text =
("inner".patient_id_internal_id)::text)
-> Index Scan using kdx_pid on pid (cost=0.00..49229.38 rows=328403
width=284)
-> Index Scan using patient_id_internal_id_idx on pv1
(cost=0.00..290598.21 rows=1159822 width=407)
(5 rows)

Time: 1.472 ms

Now here is the query plan without the LIMIT statement:

QUERY PLAN
---------------------------------------------------------------------------------------------------------
Merge Join (cost=0.00..350141.38 rows=1159822 width=691)
Merge Cond: (("outer".patient_id_internal_id)::text =
("inner".patient_id_internal_id)::text)
-> Index Scan using kdx_pid on pid (cost=0.00..49229.38 rows=328403
width=284)
-> Index Scan using patient_id_internal_id_idx on pv1
(cost=0.00..290598.21 rows=1159822 width=407)
(4 rows)

Time: 1.432 ms

I have tried various combinations of SET enable_seqscan = off;, SET
enable_mergejoin = off;, and so forth. All to no avail. The bottom line is
still that LIMIT statement. Without it the query performance is awful.

From observation using the TOP utility, I have noticed that when the query is
running without the LIMIT statement, that the postmaster process resident
memory size is increasing at a steady rate. It would seem that the query
results are being gathered in memory first before being displayed in PSQL. Is
there a way around this?

After loading the tables with data I did perform a VACUUM ANALYZE and a
database restart prior to any querying..

I have also modified the following parameters in the postgres.conf file, and
then restarted the database:

shared_buffers = 16384
commit_delay = 100000
commit_siblings = 5
enable_seqscan = off
effective_cache_size = 100000
log_destination = 'stderr'
redirect_stderr = on
log_directory = 'pg_log'
log_truncate_on_rotation = on
log_rotation_age = 1440
log_rotation_size = 200
lc_messages = 'en_US'
lc_monetary = 'en_US'
lc_numeric = 'en_US'
lc_time = 'en_US'

What am I doing wrong? Has anyone run into something similar? Any help would
be greatly appreciated. Thank you in advance.

--
Regards,

Richard Kut
Database Administrator
Research & Development
Intelerad Medical Systems Inc.
460 Ste-Catherine West, Suite 210
Montreal, Quebec, Canada H3B 1A7
Tel: 514.931.6222 x7733
Fax: 514.931.4653
rkut(at)intelerad(dot)com
www.intelerad.com

This email or any attachments may contain confidential or legally
privileged information intended for the sole use of the addressees. Any
use, redistribution, disclosure, or reproduction of this information,
except as intended, is prohibited. If you received this
email in error, please notify the sender and remove all copies of the
message, including any attachments.

Responses

Browse pgsql-novice by date

  From Date Subject
Next Message Ketema Harris 2006-02-15 15:35:29 Indexed Views
Previous Message Luis Silva 2006-02-15 12:50:38 Re: returning Record type problem