Proposed patch to make mergejoin cost estimation more symmetric

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-patches(at)postgreSQL(dot)org
Subject: Proposed patch to make mergejoin cost estimation more symmetric
Date: 2007-12-07 00:19:27
Message-ID: 4810.1196986767@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-patches

There was some discussion earlier today
http://archives.postgresql.org/pgsql-performance/2007-12/msg00081.php
about how mergejoin cost estimation is not smart about the situation
where we will have to scan a lot of rows on one side of the join before
reaching the range of values found on the other side (and hence having
some chance of finding join pairs). Ideally, that effect should be
factored into the startup cost estimate for the join. This
consideration is the exact dual of one that we already do have code
for, namely that the merge can stop early if the range of values on
one side ends long before that of the other. So I looked into whether
that code couldn't be extended to handle this issue too. It turns out
that it can be, and it actually becomes more symmetrical because it's
considering both max and min values not just max.

Also, I found that there were a couple of new-in-8.3 bugs in that area
--- it's been extended to make estimates for descending-order
mergejoins, but it wasn't getting that quite right.

Since something needs to be done to that code anyway, I'm considering
applying the attached patch. It's a bit larger than I would normally
consider to be a good idea for an optional patch in late beta. But then
again I wouldn't have the slightest hesitation about back-patching a
change of this size after final release, so it seems attractive to put
it in now.

Aside from the patch, I have attached a test script that exercises merge
join planning for some simple cases, and the plans output by the script
in CVS HEAD with/without the patch. The cost estimates with the patch
are in line with expectation, the estimates without, not so much.
In particular, the existing bug can be seen at work here in that the
sixth and eighth test cases ("big join highm on b=h order by b desc" and
"big join high on b=h order by b desc") are given unreasonably small
cost estimates by the unpatched code. (Note: the two sets of numbers
vary a bit because they were working with nonidentical ANALYZE
statistics.)

Any objections to applying the patch?

regards, tom lane

Attachment Content-Type Size
unknown_filename text/plain 32.3 KB
unknown_filename text/plain 1.1 KB
unknown_filename text/plain 3.8 KB
unknown_filename text/plain 3.8 KB

Responses

Browse pgsql-patches by date

  From Date Subject
Next Message FAST PostgreSQL 2007-12-07 07:48:58 A minor typo fix on pg_standby docs
Previous Message Simon Riggs 2007-12-06 19:34:42 Re: Better default_statistics_target