Re: hashjoin chosen over 1000x faster plan

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: hashjoin chosen over 1000x faster plan
Date: 2007-10-10 17:05:55
Message-ID: 1192035955.4233.306.camel@ebony.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wed, 2007-10-10 at 09:15 -0500, Kevin Grittner wrote:
> >>> On Wed, Oct 10, 2007 at 1:31 AM, in message
> <1191997904(dot)4233(dot)125(dot)camel(at)ebony(dot)site>, Simon Riggs <simon(at)2ndquadrant(dot)com>
> wrote:
> > On Tue, 2007-10-09 at 15:09 -0500, Kevin Grittner wrote:
> >
> >> I have a situation where a query is running much slower than I would
> >> expect. The ANALYZE showed that it is hashing some information which
> >> is rarely needed. When I set enable_hashjoin = off for the
> >> connection the query run in 1/1000 the time.
> >
> > Can you confirm the two queries give identical outputs?
>
> I checked; the output is identical.
>
> > It isn't clear
> > to me why the second sort is (never executed) in your second plan, which
> > I would only expect to see for an inner merge join.
>
> I assume that is because there were no rows to sort. The
> CaseTypeHistEvent view is only needed if there is a link to an event
> which reopens the charge after it is disposed. This only happens for
> about 1% of the Charge records.

So CHST.EventType is mostly NULL? So the good news is that the default
plan works best when it does actually find a match. So for 1% of cases
you will have an execution time of about 1s, <1ms for the others if you
fiddle with the planner methods.

The planner thinks every row will find a match, yet the actual number is
only 1%. Hmmm, same section of code as last week.

Basically the planner doesn't ever optimise for the possibility of the
never-executed case because even a single row returned would destroy
that assumption.

If we had an Option node in there, we could run the first part of the
plan before deciding whether to do an MJ or an HJ. Doing that would
avoid doing 2 sorts and return even quicker in the common case (about
80% time) without being slower in the slowest.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Josh Trutwin 2007-10-10 17:25:31 Re: SQL Monitoring
Previous Message Scott Marlowe 2007-10-10 15:20:02 Re: Shared Buffer setting in postgresql.conf