Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
Date: 2018-07-20 15:26:39
Message-ID: CA+TgmoZSaKq-fYALn5jf6c_X3==Rb2s8eqLDwGpV=LNNhTXYwg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jul 20, 2018 at 8:38 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> This looks like a terrible design to me. If somebody in future
> invents a new plan type that is not projection-capable, then this
> could fail an assertion here and there won't be any simple fix. And
> if you reach here and the child targetlists somehow haven't been
> adjusted, then you won't even fail an assertion, you'll just crash (or
> maybe return wrong answers).

To elaborate on this thought a bit, it is currently the case that all
scan and join types are projection-capable, and most likely we would
make any such node types added in the future projection-capable as
well, but it still doesn't seem like a good idea to me to have
tangentially-related parts of the system depend on that in subtle
ways. Also, even today, this would fail if the subplan happened to be
a Sort, and it's not very obvious why that couldn't happen. If it
can't happen today, it's not obvious why a future code change couldn't
introduce cases where it can happen.

More broadly, the central idea of this patch is that we can set the
target list for a child rel to something other than the target list
that we expect it will eventually produce, and then at plan creation
time fix it. I think that's a bad idea. The target list affects lots
of things, like costing. If we don't insert a ConvertRowTypeExpr into
the child's target list, the costing will be wrong to the extent that
ConvertRowTypeExpr has any cost, which it does. Any other current or
future property that is computed based on the target list --
parallel-safety, width, security-level, whatever -- could also
potentially end up with a wrong value if the target list as written
does not match the target list as will actually be produced. It's
true that, in all of these areas, the ConvertRowTypeExpr isn't likely
to have a lot of impact; it probably won't have a big effect on
costing and may not actually cause a problem for the other things that
I mentioned. On the other hand, if it does cause a serious problem,
what options would we have for fixing it other than to back out your
entire patch and solve the problem some other way? The idea that
derived properties won't be correct unless they are based on the real
target list is pretty fundamental, and I think deviating from the idea
that we get the correct target list first and then compute the derived
properties afterward is likely to cause real headaches for future
developers.

In short, plan creation time is just the wrong time to change the
plan. It's just a time to translate the plan from the form needed by
the planner to the form needed by the executor. It would be fine if
the change you were making were only cosmetic, but it's not.

After looking over both patches, I think Ashutosh Bapat has basically
the right idea. What he is proposing is a localized fix that doesn't
seem to make any changes to the way things work overall. I do think
his patches need to be fixed up a bit to avoid conflating
ConvertRowtypeExpr with "converted whole-row reference." The two are
certainly not equivalent; the latter is a narrow special case. Some
of the comments could use some more wordsmithing, too, I think. Apart
from those gripes, though, I think it's solving the problem the
correct way: don't build the wrong plan and try to fix it, just build
the right plan from the beginning.

There are definitely some things not to like about this approach. In
particular, I definitely agree that treating a converted whole-row
reference specially is not very nice. It feels like it might be
substantially cleaner to be able to represent this idea as a single
node rather than a combination of ConvertRowtypeExpr and var with
varattno = 0. Perhaps in the future we ought to consider either (a)
trying to use a RowExpr for this rather than ConvertRowtypeExpr or (b)
inventing a new WholeRowExpr node that stores two RTIs, one for the
relation that's generating the whole-row reference and the other for
the relation that is controlling the column ordering of the result or
(c) allowing a Var to represent a whole-row expression that has to
produce its outputs according to the ordering of some other RTE. But
I don't think it's wise or necessary to whack that around just to fix
this bug; it is refactoring or improvement work best left to a future
release.

Also, it is definitely a real shame that we have to build attr_needed
data for each child separately. Right now, we've got to build a whole
lot of things for each child individually, and that's slow and
inefficient. We're not going to scale nicely to large partitioning
hierarchies unless we can figure out some way of minimizing the work
we do for each partition. So, the fact that Fujii-san's approach
avoids needing to compute attr_needed in some cases is very appealing.
However, in my opinion, it's nowhere near appealing enough to justify
trying to do surgery on the target list at plan-creation time. I
think we need to leave optimizing this to a future effort.
Partition-wise join/aggregate, and partitioning in general, need a lot
of optimization in a lot of places, and fortunately there are people
working on that, but our goal here should just be to fix things up
well enough that we can ship it. I don't see anything in Ashutosh's
patch that is so ugly that we can't live with it for the time being.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2018-07-20 15:30:09 Re: [bug fix] Produce a crash dump before main() on Windows
Previous Message David Rowley 2018-07-20 15:17:00 Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian