Re: Query does not return rows unless a LIMIT statement

From: Sean Davis <sdavis2(at)mail(dot)nih(dot)gov>
To: <rkut(at)intelerad(dot)com>, <pgsql-novice(at)postgresql(dot)org>
Cc: Mathew Kanner <Mathew(dot)Kanner(at)intelerad(dot)com>
Subject: Re: Query does not return rows unless a LIMIT statement
Date: 2006-02-15 15:52:09
Message-ID: C018B859.5D65%sdavis2@mail.nih.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

On 2/15/06 10:06 AM, "Richard Kut" <rkut(at)intelerad(dot)com> wrote:

> Hello!
>
> This is my first time using Postgres, and my first posting to this mailing
> list.
>
> I am evaluating the performance of Postgres version 8.1.2 as a possible
> replacement for our aging Sybase ASE version 11 database. However, I have run
> into what could be a show-stopper problem.
>
> I am having a problem with a query taking a very long time to run. So long,
> in fact, that I have yet to see that query return any rows.
>
> I will apologize now for what may become a very long message, but I want to
> be sure that I pass along all of the pertinent details. Here we go.
>
> I am using Red Hat Enterprise Linux 3.0 on a box with 1.5 GB of RAM. The CPU
> is an Intel Xeon 2.4 GHz hyper-threaded chip with 512 KB cache.
>
> I am trying to do a simple join on indexed columns between two tables. Table
> PID has 328,403 records, and table PV1 has 1,159,822 records. The query which
> I am using is:
>
> SELECT *
> FROM pid, pv1
> WHERE pid.patient_id_internal_id = pv1.patient_id_internal_id
> LIMIT 5324;
>
> For table PID, the column patient_id_internal_id is defined as character
> varying(20). The index on this column is defined as:
> "kdx_pid" btree (patient_id_internal_id), tablespace "hl7_segments_ihe_idx"
>
> For table PV1, the column patient_id_internal_id is defined as character
> varying(20). The index on this column is defined as:
> "patient_id_internal_id_idx" btree (patient_id_internal_id), tablespace
> "hl7_segments_ihe_idx"
>
> Both tables live in the tablespace hl7_segments_ihe_data.
>
> Note the LIMIT 5324 statement. I empirically determined that this magic
> number is the maximum row count before the query performance degrades
> completely. It appears to be a sudden performance degradation. If I increase
> the value in LIMIT beyond 5324, or if I remove the LIMIT altogether, then the
> query performance is horrible.

Can you send EXPLAIN ANALYZE output? That has timing and "real" numbers of
rows, etc.

> Here is the query plan with the LIMIT statement:
>
> QUERY PLAN
> ------------------------------------------------------------------------------
> ---------------------------------
> Limit (cost=0.00..1607.27 rows=5324 width=691)
> -> Merge Join (cost=0.00..350141.38 rows=1159822 width=691)
> Merge Cond: (("outer".patient_id_internal_id)::text =
> ("inner".patient_id_internal_id)::text)
> -> Index Scan using kdx_pid on pid (cost=0.00..49229.38 rows=328403
> width=284)
> -> Index Scan using patient_id_internal_id_idx on pv1
> (cost=0.00..290598.21 rows=1159822 width=407)
> (5 rows)
>
> Time: 1.472 ms
>
> Now here is the query plan without the LIMIT statement:
>
> QUERY PLAN
> ------------------------------------------------------------------------------
> ---------------------------
> Merge Join (cost=0.00..350141.38 rows=1159822 width=691)
> Merge Cond: (("outer".patient_id_internal_id)::text =
> ("inner".patient_id_internal_id)::text)
> -> Index Scan using kdx_pid on pid (cost=0.00..49229.38 rows=328403
> width=284)
> -> Index Scan using patient_id_internal_id_idx on pv1
> (cost=0.00..290598.21 rows=1159822 width=407)
> (4 rows)
>
> Time: 1.432 ms
>
> I have tried various combinations of SET enable_seqscan = off;, SET
> enable_mergejoin = off;, and so forth. All to no avail. The bottom line is
> still that LIMIT statement. Without it the query performance is awful.
>
> From observation using the TOP utility, I have noticed that when the query is
> running without the LIMIT statement, that the postmaster process resident
> memory size is increasing at a steady rate. It would seem that the query
> results are being gathered in memory first before being displayed in PSQL. Is
> there a way around this?

That is what is going on, I think.

> After loading the tables with data I did perform a VACUUM ANALYZE and a
> database restart prior to any querying..

In response to

Browse pgsql-novice by date

  From Date Subject
Next Message Tom Lane 2006-02-15 15:53:11 Re: Query does not return rows unless a LIMIT statement is used.
Previous Message Sean Davis 2006-02-15 15:48:07 Re: Indexed Views