Re: Planner producing 100% duplicate subplans when unneeded

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Daniel Grace <dgrace(at)wingsnw(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: Planner producing 100% duplicate subplans when unneeded
Date: 2010-10-05 00:55:34
Message-ID: 11094.1286240134@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Mon, Sep 27, 2010 at 5:09 PM, Daniel Grace <dgrace(at)wingsnw(dot)com> wrote:
>> Is there any chance this might be looked at in a future release?

> This is another interesting example of a case where an inlining-type
> optimization (which is effectively what's happening here, I think)
> turns out to be a negative. We had one a while back that involved
> actual function inlining, which is not quite what's happening here,
> but it's close. It doesn't seem too hard to figure out whether or not
> inlining is a win (non-trivial subexpressions should probably never
> get duplicated), but nobody's gotten around to writing the logic to
> make it work yet.

Actually this case has some history behind it. The reason that the
sub-sub-query gets duplicated is that we flatten the sub-query, resulting
in duplicating its output expressions wherever they're referenced.
Now before you say that that's stupid, please notice that this case
got disabled in 7.3 as a bug workaround, and that almost immediately
produced howls of pain:
http://archives.postgresql.org/pgsql-general/2002-12/msg00410.php
http://archives.postgresql.org/pgsql-performance/2002-12/msg00214.php
so I undid it as soon as I could:
http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=de97072e3c88e104a55b0d5c67477f1b0097c003

While just shutting off the pull-up unconditionally would merely require
reintroducing a contains_subplans() restriction in is_simple_subquery(),
I'm unwilling to do that because of the earlier complaints. And
"figuring out whether it's a win" is a lot harder than you opine above:
at the point where this decision has to be taken, we have no cost
information whatsoever, and not even any clear idea whether the subquery
outputs are referenced at all let alone multiply referenced.

My thought about the current shortest path to a solution would be to
wrap subquery-containing output expressions in PlaceHolderVars.
Per this comment elsewhere in is_simple_subquery(),

/*
* Don't pull up a subquery that has any volatile functions in its
* targetlist. Otherwise we might introduce multiple evaluations of these
* functions, if they get copied to multiple places in the upper query,
* leading to surprising results. (Note: the PlaceHolderVar mechanism
* doesn't quite guarantee single evaluation; else we could pull up anyway
* and just wrap such items in PlaceHolderVars ...)
*/

that doesn't entirely work at the moment, but it might work if we were
to rejigger the PHV mechanism a bit. This would be attractive because
the added overhead of a PHV would be nearly nil, so we could afford to
do this without making any predictions about relative costs.

As a workaround until somebody gets around to looking at that, I'd
suggest that Daniel plaster his sub-query with the good old all-purpose
optimization fence, "OFFSET 0". That will prevent flattening and thus
prevent the sub-sub-query from getting duplicated.

regards, tom lane

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Craig Ringer 2010-10-05 01:26:16 Re: Postgres 9.0 crash on win7
Previous Message Robert Haas 2010-10-04 23:33:46 Re: BUG #5690: pg_upgrade fails