Re: WIP patch for parameterized inner paths

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP patch for parameterized inner paths
Date: 2012-01-25 21:57:08
Message-ID: CA+TgmoawC6FkVYUs7pL_FbKOjxGgL8yqhXEn3Lx0h_QqNxJzeQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 25, 2012 at 1:24 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Also, you're assuming that the changes have no upside whatsoever, which
> I fondly hope is not the case.  Large join problems tend not to execute
> instantaneously --- so nobody is going to complain if the planner takes
> awhile longer but the resulting plan is enough better to buy that back.
> In my test cases, the planner *is* finding better plans, or at least
> ones with noticeably lower estimated costs.  It's hard to gauge how
> much that translates to in real-world savings, since I don't have
> real data loaded up.  I also think, though I've not tried to measure,
> that I've made planning cheaper for very simple queries by eliminating
> some overhead in those cases.

I had a 34-table join on one of the last applications I maintained
that planned and executed in less than 2 seconds. That was pushing
it, but I had many joins in the 10-20 table range that planned and
executed in 100-200 ms. I agree that if you are dealing with a
terabyte table - or even a gigabyte table - then the growth in
planning time will probably not bother anyone even if the planner
fails to find a better plan, and will certainly be repaid if it
does.
But on tables with only a megabyte of data, it's not nearly so
clear-cut. In an ideal world, I'd like the amount of effort we spend
planning to be somehow tied to the savings we can expect to get, and
deploy optimizations like this only in cases where we have a
reasonable expectation of that effort being repaid.

AIUI, this is mostly going to benefit cases like small LJ (big1 IJ
big2) and, of course, those cases aren't going to arise if your query
only involves small tables, or even if you have something like big IJ
small1 IJ small2 IJ small3 IJ small4 LJ small5 LJ small6 IJ small7,
which is a reasonably common pattern for me. Now, if you come back
and say, ah, well, those cases aren't the ones that are going to be
harmed by this, then maybe we should have a more detailed conversation
about where the mines are. Or maybe it is helping in more cases than
I'm thinking about at the moment.
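
For concreteness, here's a sketch of the shape I mean, with LJ as a
left join and IJ as an inner join (table and column names invented
for illustration):

    -- small LJ (big1 IJ big2): a small driving table left-joined to
    -- the join of two large tables.  A parameterized inner path lets
    -- the planner push s.id into the big1/big2 join as a nestloop
    -- parameter, probing indexes once per outer row instead of
    -- computing the full big1/big2 join first.
    SELECT s.id, b1.payload, b2.payload
    FROM small s
    LEFT JOIN (big1 b1
               JOIN big2 b2 ON b2.big1_id = b1.id)
           ON b1.small_id = s.id;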

>> To be clear, I'd love to have this feature.  But if there is a choice
>> between reducing planning time significantly for everyone and NOT
>> getting this feature, and increasing planning time significantly for
>> everyone and getting this feature, I think we will make more people
>> happy by doing the first one.
>
> We're not really talking about "are we going to accept or reject a
> specific feature".  We're talking about whether we're going to decide
> that the last two years worth of planner development were headed in
> the wrong direction and we're now going to reject that and try to
> think of some entirely new concept.  This isn't an isolated patch,
> it's the necessary next step in a multi-year development plan.  The
> fact that it's a bit slower at the moment just means there's still
> work to do.

I'm not proposing that you should never commit this. I'm proposing
that any commit by anyone that introduces a 35% performance regression
is unwise, and doubly so at the end of the release cycle. I have
every confidence that you can improve the code further over time, but
the middle of the last CommitFest is not a great time to commit code
that, by your own admission, needs a considerable amount of additional
work. Sure, there are some things that we're not going to find out
until the code goes into production, but it seems to me that you've
already uncovered a fairly major performance problem that is only
partially fixed. Once this is committed, it's not coming back out, so
we're either going to have to figure out how to fix it before we
release, or release with a regression in certain cases. If you got it
down to 10% I don't think I'd be worried, but a 35% regression that we
don't know how to fix seems like a lot.

On another note, nobody besides you has looked at the code yet, AFAIK...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
