Quick Links

Re: Erronous sort used in query plan

From:	Shane Ambler <pgsql(at)007Marketing(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Erronous sort used in query plan
Date:	2007-01-07 21:04:44
Message-ID:	45A1606C.1080306@007Marketing.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Tom Lane wrote:
> Shane Ambler <pgsql(at)007Marketing(dot)com> writes:
>> I am putting together searches on the catalog info and came up with a
>> select that was rather slow and I noticed that in the explain analyze
>> there is a sort step on one of the left joins which I don't think
>> belongs there.
>
> Well, it's certainly necessary in context because it's preparing the
> data for the merge join immediately above it. The correct question
> is why is the thing using a merge join here, when a hash join would be
> cheaper?
>
> I dug through this and found out that the hash join is estimated as
> cheaper, right up till the last step of cost_hashjoin:
>
> /*
> * Bias against putting larger relation on inside. We don't want an
> * absolute prohibition, though, since larger relation might have better
> * bucketsize --- and we can't trust the size estimates unreservedly,
> * anyway. Instead, inflate the run cost by the square root of the size
> * ratio. (Why square root? No real good reason, but it seems
> * reasonable...)
> *
> * Note: before 7.4 we implemented this by inflating startup cost; but if
> * there's a disable_cost component in the input paths' startup cost, that
> * unfairly penalizes the hash. Probably it'd be better to keep track of
> * disable penalty separately from cost.
> */
> if (innerbytes > outerbytes && outerbytes > 0)
> run_cost *= sqrt(innerbytes / outerbytes);
>
> In this example, the data volume from the join of everything else is
> estimated as less than what needs to be fetched from pg_proc, and so
> this bias kicks in, and the cost estimate roughly doubles.
> Unfortunately, because it's a LEFT JOIN, we'll never consider hashjoin
> in the other direction and so the hash loses out to the mergejoin.
>
> It seems clear to me that we ought not impose a bias unless the join
> type is such that both directions of hashing are feasible. I wonder
> also if the bias is too large ... but there's not really evidence for
> or against that in this example. The point is that this code implicitly
> assumes both directions will be tried, and they won't.
>

I think that the selected sort (or at least the merge join) is incorrect
- the column sorted (or both actually) is linking the current record in
pg_operator with the oid in the pg_proc - it will only return one row.

If one of the pg_type joins is changed, it then sorts on the oid of
pg_operator as the foreign table - again this will only return one row.

I would think that the foreign oid would indicate to the planner that it
will only find one foreign row to link with.

I can see that the error I made created a funny (probably useless
actually) link that would throw things out, but I would expect it to
create bad planning for the two joins that are in error not a
non-related one to one join. If a sort/merge join was created from this
and used to satisfy this join I would accept that as part of what I
unintentionally requested but instead it generates a sort/merge join on
a join that links one current record to one foreign record and has
nothing in common with the joins in error.

Shane Ambler
pgSQL(at)007Marketing(dot)com

Get Sheeky @ http://Sheeky.Biz

In response to

Re: Erronous sort used in query plan at 2007-01-07 18:28:58 from Tom Lane

Responses

Re: Erronous sort used in query plan at 2007-01-08 01:51:22 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Peter Eisentraut	2007-01-07 21:12:59	Re: --with-libxml build failures on OS X
Previous Message	Gregory Stark	2007-01-07 19:43:30	Full page writes