Re: Proposed patch to make mergejoin cost estimation more symmetric

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-patches(at)postgreSQL(dot)org
Subject: Re: Proposed patch to make mergejoin cost estimation more symmetric
Date: 2007-12-07 08:29:35
Message-ID: 1197016175.4255.466.camel@ebony.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-patches

On Thu, 2007-12-06 at 19:19 -0500, Tom Lane wrote:

> Since something needs to be done to that code anyway, I'm considering
> applying the attached patch. It's a bit larger than I would normally
> consider to be a good idea for an optional patch in late beta. But then
> again I wouldn't have the slightest hesitation about back-patching a
> change of this size after final release, so it seems attractive to put
> it in now.
>
> Aside from the patch, I have attached a test script that exercises merge
> join planning for some simple cases, and the plans output by the script
> in CVS HEAD with/without the patch. The cost estimates with the patch
> are in line with expectation, the estimates without, not so much.
> In particular, the existing bug can be seen at work here in that the
> sixth and eighth test cases ("big join highm on b=h order by b desc" and
> "big join high on b=h order by b desc") are given unreasonably small
> cost estimates by the unpatched code. (Note: the two sets of numbers
> vary a bit because they were working with nonidentical ANALYZE
> statistics.)

Looks good.

> Any objections to applying the patch?

None.

I do have a longer term issue that there is no information provided by
EXPLAIN to allow us to differentiate these two conditions. That makes it
harder to understand the basis of the plans and also gets everybody used
to seeing EXPLAINs that can't easily be explained, which leads to people
not reporting problems that exist. I doubt we can fix anything now, but
increased debugging/logging output is definitely required in some form.

> explain select * from big join highm on b=h order by b desc;
> QUERY PLAN
> ---------------------------------------------------------------------------------------
> Merge Join (cost=298.67..387.53 rows=1000 width=8)
> Merge Cond: (big.b = highm.h)
> -> Index Scan Backward using bigi on big (cost=0.00..3050.26 rows=100000 width=4)
> -> Index Scan Backward using highmi on highm (cost=0.00..43.25 rows=1000 width=4)
> (4 rows)

> explain select * from big join high on b=h order by b desc;
> QUERY PLAN
> ---------------------------------------------------------------------------------------
> Merge Join (cost=0.05..88.19 rows=1000 width=8)
> Merge Cond: (big.b = high.h)
> -> Index Scan Backward using bigi on big (cost=0.00..3050.26 rows=100000 width=4)
> -> Index Scan Backward using highi on high (cost=0.00..43.25 rows=1000 width=4)
> (4 rows)

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

In response to

Responses

Browse pgsql-patches by date

  From Date Subject
Next Message Gregory Stark 2007-12-07 09:15:54 Re: Proposed patch to make mergejoin cost estimation more symmetric
Previous Message Neil Conway 2007-12-07 08:16:08 Re: A minor typo fix on pg_standby docs