Re: WIP patch for parameterized inner paths

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: WIP patch for parameterized inner paths
Date: 2012-01-25 16:24:13
Message-ID: 29697.1327508653@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> Attached is a WIP patch for parameterized paths, along the
> lines we have discussed before: ...

I've made considerable progress on the TODO items I listed: indxpath.c
has been ripped apart and restructured, and I have it considering
parameterized paths for hash sub-joins. I made a deliberate policy
decision not to work very hard on exploring parameterized mergejoin
plans, because that seems to inflate the number of paths considered way
more than the relative usefulness of merge over hash joins justifies.
Also, the attached patch undoes the lobotomization of IN/EXISTS pullup
that we had to install in 8.4 as a stopgap for lack of ability to
consider the type of plan that can now be handled.

After that I started doing some performance testing, and the initial
news was bad: the planner was about 3x slower than 9.1 on a moderately
large join problem. I've spent the past several days hacking away at
that, and have got it down to about 35% slower by dint of the following
changes:

* micro-optimization of add_path(), in particular avoiding duplicate
calculations during cost comparisons.

* introducing a two-step mechanism for testing whether proposed join
paths are interesting. The patch first calculates a quick and dirty
lower bound on the cost of the join path (mostly, from the costs of
its input paths) and looks through the joinrel's pathlist to see if
there is already a path that is clearly better. If not, it proceeds
with the full cost calculation and add_path insertion. In my testing
the precheck is able to eliminate 80% or so of the proposed paths,
more than repaying the extra pathlist search.

Now this is of course cheating with both hands, in that either of these
optimization techniques could have been applied before ... but they
weren't. I think that 35% slower on large join problems is probably
acceptable, given that this is investigating a larger number of possible
solutions and hopefully often finding a better plan. I think I can
knock some more off of that by refactoring the costsize.c APIs, anyway.
Right now the first-pass and second-pass cost calculations are
independent and so there's some repeated work, which I'd like to
eliminate both for speed reasons and to get rid of duplicate code
that'd have to be kept in sync if it's left as-is.

If there are not objections, I'd like to go ahead and commit this
after one more round of polishing. There's still a lot left to do,
but what it mainly needs now is some testing on real-world planning
problems, and I don't think it's going to get that until it's been
merged in.

regards, tom lane

Attachment Content-Type Size
parameterized-paths-2.patch.gz application/octet-stream 74.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2012-01-25 16:25:46 Re: pg_trigger_depth() v3 (was: TG_DEPTH)
Previous Message Adrian Klaver 2012-01-25 16:22:26 Re: Why extract( ... from timestamp ) is not immutable?