Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, Andres Freund <andres(at)anarazel(dot)de>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
Date: 2018-08-03 13:44:52
Message-ID: CA+TgmoYJ_HmMjGr-y2fg71XQ6J3S9ro6oehJmYkkoGVwZSwe7w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Aug 2, 2018 at 4:25 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I do not trust Ashutosh's patch because of the point you noted that it
> will kick in on ConvertRowtypeExprs that are not related to partitioning.

I don't remember making that point -- can you clarify?

> It's also got a rather fundamental conceptual (or at least documentation)
> error in that it says it's making pull_var_clause do certain things to
> all ConvertRowtypeExprs, but that's not what the code actually does.
> I think the tension around that has a lot to do with the fact that the
> patch went through so many revisions, and I don't have any faith that
> it's right even yet. As you mentioned upthread, this might be insoluble
> without some representational change for converted whole-row Vars.

Insoluble is a strong word....

> What I'm thinking might be a more appropriate thing, at least for
> getting v11 out the door, is to refuse to generate partitionwise
> joins when any whole-row vars are involved. (Perhaps that's not
> enough to dodge all the problems, though?)

It's enough to dodge the problem being discussed on this thread.

> Now, that's a bit of a problem for postgres_fdw, because it seems to
> insist on injecting WRVs even when the query text does not require any.
> Why is that, and can't we get rid of it? It's horrid for performance
> even without any of these other considerations. But if we fail to get
> rid of that in time for v11, it would mean that postgres_fdw can't
> participate in PWJ, which isn't great but I think we could live with it
> for now.

I don't quite know what you mean here -- postgres_fdw does use
whole-row vars for EPQ handling, which may be what you're thinking
about.

> BTW, what exactly do we think the production status of PWJ is, anyway?
> I notice that upthread it was stated that enable_partitionwise_join
> defaults to off, as indeed it still does, and we'd turn it on later
> when we'd gotten rid of some memory-hogging problems. If that hasn't
> happened yet (and I don't see any open item about considering enabling
> PWJ by default for v11), then I have exactly no hesitation about
> lobotomizing PWJ as hard as we need to to mask this problem for v11.
> I'd certainly prefer a solution along those lines to either of these
> patches.

Yeah, that hasn't been done yet. Partition-wise join probably needs a
good bit of work in a number of areas to do all the things that people
will want it to do -- see the thread on advanced partition-matching,
which is an improvement but still doesn't cover every case that could
come up. In terms of memory usage, I think that we need some
discussion of the approach that should be taken. I see a couple of
possible things we could do, not necessarily mutually exclusive.

1. We could try to avoid translating all of the parent's data
structures for every child RelOptInfo, either by rejiggering things so
that the child doesn't need all of that data, or by making it able to
use the parent's copy of the data.

2. We could try to recycle memory more quickly. For example, if we're
planning a partition-wise join of A-B-C-D, first plan paths for
A1-B1-C1-D1 (and all proper subsets of those rels), then throw away
most of the memory, then plan paths for A2-B2-C2-D2, etc.

3. We could generate paths for on "representative" children (e.g. the
biggest ones) and use that to estimate the cost of the partition-wise
plan. If that plan is selected, then go back and figure out real
paths for the other children.

All of these things help in different ways, and Ashutosh had code for
some version of all of them at various points, but the problem is
quite difficult. If you have three tables with 1000 partitions each,
then without partition-wise join you need 3000 (partitions) + 3
(baserels) + 3 (level-2 joinrels) + 1 (level-3 joinrel) RelOptInfos.
If you do a partition-wise join, then you get 3000 level-2
child-joinrels and 1000 level-3 child joinrels. Those RelOptInfo
structures, and the paths attached to them, cannot help but take up
memory. Perhaps that's OK, and we ought to just say "well, if you
want better plans, you're going to have to allow more memory for
planning". If not, we have to decide how hard we want to work in
which areas and how good the result needs to be in order to have this
turned on by difficult.

Honestly, I'm pretty impressed that we have added not one but two
members to the RelOptKind enum without as little collateral damage as
there has been. This issue is annoying and the discussion has gone on
longer than anyone would probably prefer, but it's really pretty minor
in the grand scheme of things, at least IMHO.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-08-03 14:07:01 Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.
Previous Message Simon Riggs 2018-08-03 13:41:05 Re: Standby trying "restore_command" before local WAL