Re: Any better plan for this query?..

From: Dimitri <dimitrik(dot)fr(at)gmail(dot)com>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: PostgreSQL Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Any better plan for this query?..
Date: 2009-05-06 08:14:38
Message-ID: 5482c80a0905060114u1a3f52fbv3a05ad7c88e13101@mail.gmail.com
Lists: pgsql-performance

Hi Craig,

yes, you detailed the problem very well! :-)
All those CHAR columns are there just for historical reasons :-) and
also they may contain anything, not only numbers, which is why..
Also, all the data inside are fixed-length, so VARCHAR will not save
space; what kind of performance issue should we expect with CHAR vs.
VARCHAR if all the data have a fixed length?..
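(A hedged aside for readers: one concrete difference, independent of storage size, is that CHAR(n) blank-pads its values and treats trailing blanks as insignificant in comparisons, which can produce surprising matches. A minimal PostgreSQL illustration, assuming nothing beyond standard type behaviour:)

```sql
-- CHAR(n) blank-pads to length n, and comparisons between CHAR values
-- ignore trailing blanks:
SELECT 'abc'::char(5) = 'abc  '::char(5);   -- true

-- Casting CHAR to text strips the padding again:
SELECT length('abc'::char(5)::text);        -- 3
```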

Any way to force a nested loop without an additional index?..
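(A hedged sketch for experimentation: PostgreSQL has no per-query hints, but the planner can be nudged for a single transaction by disabling the competing join methods. The settings below are real PostgreSQL GUCs; the query is the one from the thread, and this is a diagnostic technique, not a recommended permanent setting:)

```sql
-- Discourage hash and merge joins for this transaction only, so the
-- planner falls back to a nested loop; SET LOCAL reverts automatically
-- at COMMIT/ROLLBACK.
BEGIN;
SET LOCAL enable_hashjoin = off;
SET LOCAL enable_mergejoin = off;

EXPLAIN ANALYZE
SELECT S.REF, S.NAME, H.HORDER, H.BEGIN_DATE, H.END_DATE, H.NOTE
  FROM HISTORY H
  JOIN STAT S ON S.REF = H.REF_STAT
 WHERE H.REF_OBJECT = '0000000001'
 ORDER BY H.HORDER;

ROLLBACK;
```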

It's 2 times faster on InnoDB, and as it's just a SELECT query there's
no need to go into transaction details :-)

Rgds,
-Dimitri

On 5/6/09, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au> wrote:
> Dimitri wrote:
>> Hi,
>>
>> any idea if there is a more optimal execution plan possible for this
>> query:
>>
>> select S.REF as stref, S.NAME as stnm, H.HORDER as hord, H.BEGIN_DATE as
>> hbeg,
>> H.END_DATE as hend, H.NOTE as hnote
>> from HISTORY H, STAT S
>> where S.REF = H.REF_STAT
>> and H.REF_OBJECT = '0000000001'
>> order by H.HORDER ;
>
> OK, so you're taking a simple:
>
> history INNER JOIN stat ON (stat.ref = history.ref_stat)
>
> then filtering for records with a particular value of history.ref_object
> and finally performing a sort.
>
> If I'm reading it right, the plan below does a sequential scan on the
> `stat' table. The stat table only has 1000 rows, so this isn't
> necessarily an unreasonable choice even if there is an appropriate index
> and even if not many of the rows will be needed.
>
> It then does an index scan of the history table looking for tuples with
> ref_object = '0000000001' (text match). It hash joins the hashed results
> of the initial seq scan to the results of the index scan, and sorts the
> result.
>
> To me, that looks pretty reasonable. You might be able to avoid the hash
> join in favour of a nested loop scan of stat_ref_idx (looping over
> records from history.ref_stat where ref_object = '0000000001') by
> providing a composite index on HISTORY(ref_stat, ref_object). I'm really
> not too sure, though; plan optimization isn't my thing, I'm just seeing
> if I can offer a few ideas.
>
>> Table definitions:
>
> While not strictly necessary, it's a *REALLY* good idea to define a
> suitable PRIMARY KEY.
>
> Also, the `CHAR(n)' data type is evil. E.V.I.L. Use `varchar(n)' for
> bounded-length values, or `text' for unbounded fields, unless you REALLY
> want the crazy behaviour of `CHAR(n)'.
>
> I'm a little bit puzzled about why you seem to be doing lots of things
> with integer values stored in text strings, but that probably doesn't
> matter too much for the issue at hand.
>
>> NOTE: The same query runs 2 times faster on MySQL.
>
> With InnoDB tables and proper transactional safety? Or using scary
> MyISAM tables and a "just pray" approach to data integrity? If you're
> using MyISAM tables I'm not surprised; MySQL with MyISAM is stunningly
> fast, but oh-my-god dangerous.
>
> --
> Craig Ringer
>
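Craig's composite-index suggestion above can be sketched as follows (the index name is invented for illustration; re-running EXPLAIN ANALYZE afterwards shows whether the planner actually picks it up):

```sql
-- Composite index on HISTORY(ref_stat, ref_object), as suggested above;
-- the index name is hypothetical.
CREATE INDEX history_refstat_refobj_idx
    ON HISTORY (REF_STAT, REF_OBJECT);

-- Refresh planner statistics, then re-check the chosen plan.
ANALYZE HISTORY;
```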
