Re: Erronous sort used in query plan

From: Shane Ambler <pgsql(at)007Marketing(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Erronous sort used in query plan
Date: 2007-01-07 21:04:44
Message-ID: 45A1606C.1080306@007Marketing.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tom Lane wrote:
> Shane Ambler <pgsql(at)007Marketing(dot)com> writes:
>> I am putting together searches on the catalog info and came up with a
>> select that was rather slow and I noticed that in the explain analyze
>> there is a sort step on one of the left joins which I don't think
>> belongs there.
>
> Well, it's certainly necessary in context because it's preparing the
> data for the merge join immediately above it. The correct question
> is why is the thing using a merge join here, when a hash join would be
> cheaper?
>
> I dug through this and found out that the hash join is estimated as
> cheaper, right up till the last step of cost_hashjoin:
>
> /*
> * Bias against putting larger relation on inside. We don't want an
> * absolute prohibition, though, since larger relation might have better
> * bucketsize --- and we can't trust the size estimates unreservedly,
> * anyway. Instead, inflate the run cost by the square root of the size
> * ratio. (Why square root? No real good reason, but it seems
> * reasonable...)
> *
> * Note: before 7.4 we implemented this by inflating startup cost; but if
> * there's a disable_cost component in the input paths' startup cost, that
> * unfairly penalizes the hash. Probably it'd be better to keep track of
> * disable penalty separately from cost.
> */
> if (innerbytes > outerbytes && outerbytes > 0)
> run_cost *= sqrt(innerbytes / outerbytes);
>
> In this example, the data volume from the join of everything else is
> estimated as less than what needs to be fetched from pg_proc, and so
> this bias kicks in, and the cost estimate roughly doubles.
> Unfortunately, because it's a LEFT JOIN, we'll never consider hashjoin
> in the other direction and so the hash loses out to the mergejoin.
>
> It seems clear to me that we ought not impose a bias unless the join
> type is such that both directions of hashing are feasible. I wonder
> also if the bias is too large ... but there's not really evidence for
> or against that in this example. The point is that this code implicitly
> assumes both directions will be tried, and they won't.
>

I think that the selected sort (or at least the merge join) is incorrect
- the column sorted (or both actually) is linking the current record in
pg_operator with the oid in the pg_proc - it will only return one row.

If one of the pg_type joins is changed, it then sorts on the oid of
pg_operator as the foreign table - again this will only return one row.

I would think that the foreign oid would indicate to the planner that it
will only find one foreign row to link with.

I can see that the error I made created a funny (probably useless
actually) link that would throw things out, but I would expect it to
create bad planning for the two joins that are in error not a
non-related one to one join. If a sort/merge join was created from this
and used to satisfy this join I would accept that as part of what I
unintentionally requested but instead it generates a sort/merge join on
a join that links one current record to one foreign record and has
nothing in common with the joins in error.

--

Shane Ambler
pgSQL(at)007Marketing(dot)com

Get Sheeky @ http://Sheeky.Biz

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2007-01-07 21:12:59 Re: --with-libxml build failures on OS X
Previous Message Gregory Stark 2007-01-07 19:43:30 Full page writes