Quick Links

Proposed patch to make mergejoin cost estimation more symmetric

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	pgsql-patches(at)postgreSQL(dot)org
Subject:	Proposed patch to make mergejoin cost estimation more symmetric
Date:	2007-12-07 00:19:27
Message-ID:	4810.1196986767@sss.pgh.pa.us
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-patches

There was some discussion earlier today
http://archives.postgresql.org/pgsql-performance/2007-12/msg00081.php
about how mergejoin cost estimation is not smart about the situation
where we will have to scan a lot of rows on one side of the join before
reaching the range of values found on the other side (and hence having
some chance of finding join pairs). Ideally, that effect should be
factored into the startup cost estimate for the join. This
consideration is the exact dual of one that we already do have code
for, namely that the merge can stop early if the range of values on
one side ends long before that of the other. So I looked into whether
that code couldn't be extended to handle this issue too. It turns out
that it can be, and it actually becomes more symmetrical because it's
considering both max and min values not just max.

Also, I found that there were a couple of new-in-8.3 bugs in that area
--- it's been extended to make estimates for descending-order
mergejoins, but it wasn't getting that quite right.

Since something needs to be done to that code anyway, I'm considering
applying the attached patch. It's a bit larger than I would normally
consider to be a good idea for an optional patch in late beta. But then
again I wouldn't have the slightest hesitation about back-patching a
change of this size after final release, so it seems attractive to put
it in now.

Aside from the patch, I have attached a test script that exercises merge
join planning for some simple cases, and the plans output by the script
in CVS HEAD with/without the patch. The cost estimates with the patch
are in line with expectation, the estimates without, not so much.
In particular, the existing bug can be seen at work here in that the
sixth and eighth test cases ("big join highm on b=h order by b desc" and
"big join high on b=h order by b desc") are given unreasonably small
cost estimates by the unpatched code. (Note: the two sets of numbers
vary a bit because they were working with nonidentical ANALYZE
statistics.)

Any objections to applying the patch?

regards, tom lane

Attachment	Content-Type	Size
unknown_filename	text/plain	32.3 KB
unknown_filename	text/plain	1.1 KB
unknown_filename	text/plain	3.8 KB
unknown_filename	text/plain	3.8 KB

Responses

Re: Proposed patch to make mergejoin cost estimation more symmetric at 2007-12-07 08:29:35 from Simon Riggs
Re: Proposed patch to make mergejoin cost estimation more symmetric at 2007-12-07 09:15:54 from Gregory Stark

Browse pgsql-patches by date

	From	Date	Subject
Next Message	FAST PostgreSQL	2007-12-07 07:48:58	A minor typo fix on pg_standby docs
Previous Message	Simon Riggs	2007-12-06 19:34:42	Re: Better default_statistics_target