Re: Any better plan for this query?..

From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: Dimitri <dimitrik(dot)fr(at)gmail(dot)com>, PostgreSQL Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Any better plan for this query?..
Date: 2009-05-06 12:58:55
Message-ID: 20090506125855.GG23405@it.is.rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wed, May 06, 2009 at 04:01:03PM +0800, Craig Ringer wrote:
> Dimitri wrote:
>> Hi,
>> any idea if there is a more optimal execution plan possible for this
>> query:
>> select S.REF as stref, S.NAME as stnm, H.HORDER as hord, H.BEGIN_DATE as
>> hbeg,
>> H.END_DATE as hend, H.NOTE as hnote
>> from HISTORY H, STAT S
>> where S.REF = H.REF_STAT
>> and H.REF_OBJECT = '0000000001'
>> order by H.HORDER ;
>
> OK, so you're taking a simple:
>
> history INNER JOIN stat ON (stat.ref = history.ref_stat)
>
> then filtering for records with a particular value of history.ref_object
> and finally performing a sort.
>
> If I'm reading it right, the plan below does a sequential scan on the
> `stat' table. The stat table only has 1000 rows, so this isn't necessarily
> an unreasonable choice even if there is an appropriate index and even if
> not many of the rows will be needed.
>
> It then does an index scan of the history table looking for tuples with
> ref_object = '0000000001' (text match). It hash joins the hashed results of
> the initial seq scan to the results of the index scan, and sorts the
> result.
>
> To me, that looks pretty reasonable. You might be able to avoid the hash
> join in favour of a nested loop scan of stat_ref_idx (looping over records
> from history.ref_stat where ref_object = '00000000001') by providing a
> composite index on HISTORY(ref_stat, ref_object). I'm really not too sure,
> though; plan optimization isn't my thing, I'm just seeing if I can offer a
> few ideas.
>
>> Table definitions:
>
> While not strictly necessary, it's a *REALLY* good idea to define a
> suitable PRIMARY KEY.
>
> Also, the `CHAR(n)' data type is evil. E.V.I.L. Use `varchar(n)' for
> bounded-length values, or `text' for unbounded fields, unless you REALLY
> want the crazy behaviour of `CHAR(n)'.
>
> I'm a little bit puzzled about why you seem to be doing lots of things with
> integer values stored in text strings, but that probably doesn't matter too
> much for the issue at hand.
>
>> NOTE: The same query runs 2 times faster on MySQL.
>
> With InnoDB tables and proper transactional safety? Or using scary MyISAM
> tables and a "just pray" approach to data integrity? If you're using MyISAM
> tables I'm not surprised; MySQL with MyISAM is stunningly fast, but
> oh-my-god dangerous.
>
> --
> Craig Ringer
>
I just thought I would ask. Are you using the query cache in MySQL?
If that is on, that could be the difference. Another thing to check,
try issuing the selects concurrently: 2 at a time, 5 at a time, 10
at a time... and see if that has an effect on timing. In many of the
benchmarks, MySQL will out perform PostgreSQL for very low numbers of
clients. Once you are using more than a handful, PostgreSQL pulls
ahead. Also, is this a completely static table? i.e. no updates or
inserts. How is the performance with those happening? This should
help you get a clearer picture of the performance.

My two cents.
Ken

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Kenneth Marshall 2009-05-06 13:05:10 Re: Any better plan for this query?..
Previous Message Richard Huxton 2009-05-06 12:53:45 Re: Any better plan for this query?..