Skip site navigation (1) Skip section navigation (2)

Proposed patch to make mergejoin cost estimation more symmetric

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-patches(at)postgreSQL(dot)org
Subject: Proposed patch to make mergejoin cost estimation more symmetric
Date: 2007-12-07 00:19:27
Message-ID: 4810.1196986767@sss.pgh.pa.us (view raw or flat)
Thread:
Lists: pgsql-patches
There was some discussion earlier today
http://archives.postgresql.org/pgsql-performance/2007-12/msg00081.php
about how mergejoin cost estimation is not smart about the situation
where we will have to scan a lot of rows on one side of the join before
reaching the range of values found on the other side (and hence having
some chance of finding join pairs).  Ideally, that effect should be
factored into the startup cost estimate for the join.  This
consideration is the exact dual of one that we already do have code
for, namely that the merge can stop early if the range of values on
one side ends long before that of the other.  So I looked into whether
that code couldn't be extended to handle this issue too.  It turns out
that it can be, and it actually becomes more symmetrical because it's
considering both max and min values not just max.

Also, I found that there were a couple of new-in-8.3 bugs in that area
--- it's been extended to make estimates for descending-order
mergejoins, but it wasn't getting that quite right.

Since something needs to be done to that code anyway, I'm considering
applying the attached patch.  It's a bit larger than I would normally
consider to be a good idea for an optional patch in late beta.  But then
again I wouldn't have the slightest hesitation about back-patching a
change of this size after final release, so it seems attractive to put
it in now.

Aside from the patch, I have attached a test script that exercises merge
join planning for some simple cases, and the plans output by the script
in CVS HEAD with/without the patch.  The cost estimates with the patch
are in line with expectation, the estimates without, not so much.
In particular, the existing bug can be seen at work here in that the
sixth and eighth test cases ("big join highm on b=h order by b desc" and
"big join high on b=h order by b desc") are given unreasonably small
cost estimates by the unpatched code.  (Note: the two sets of numbers
vary a bit because they were working with nonidentical ANALYZE
statistics.)

Any objections to applying the patch?

			regards, tom lane

Responses

pgsql-patches by date

Next:From: FAST PostgreSQLDate: 2007-12-07 07:48:58
Subject: A minor typo fix on pg_standby docs
Previous:From: Simon RiggsDate: 2007-12-06 19:34:42
Subject: Re: Better default_statistics_target

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group