Re: making update/delete of inheritance trees scale better

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Langote <amitlangote09(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: making update/delete of inheritance trees scale better
Date: 2021-03-23 16:46:40
Message-ID: CA+TgmoZ98JHuMPvT7NNOz9-CZP9VMY_sR-fk1MGJGE_rM7yuWg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 3, 2021 at 9:39 AM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> Just noticed that a test added by the recent 927f453a941 fails due to
> 0002. We no longer allow moving a row into a postgres_fdw partition
> if it is among the UPDATE's result relations, whereas previously we
> would if the UPDATE on that partition is already finished.
>
> To fix, I've adjusted the test case. Attached updated version.

I spent some time studying this patch this morning. As far as I can
see, 0001 is a relatively faithful implementation of the design Tom
proposed back in early 2019. I think it would be nice to either get
this committed or else decide that we don't want it and what we're
going to try to do instead, because we can't make UPDATEs and DELETEs
stop sucking with partitioned tables until we settle on some solution
to the problem that is inheritance_planner(), and that strikes me as
an *extremely* important problem. Lots of people are trying to use
partitioning in PostgreSQL, and they don't like finding out that, in
many cases, it makes things slower rather than faster. Neither this
nor any other patch is going to solve that problem in general, because
there are *lots* of things that haven't been well-optimized for
partitioning yet. But, this is a pretty important case that we should
really try to do something about.

So, that said, here are some random comments:

- I think it would be interesting to repeat your earlier benchmarks
using -Mprepared. One question I have is whether we're saving overhead
here during planning at the price of execution-time overhead, or
whether we're saving during both planning and execution.

- Until I went back and found your earlier remarks on this thread, I
was confused as to why you were replacing a JunkFilter with a
ProjectionInfo. I think it would be good to try to add some more
explicit comments about that design choice, perhaps as header comments
for ExecGetUpdateNewTuple, or maybe there's a better place. I'm still
not sure why we need to do the same thing for the insert case, which
seems to just be about removing junk columns. At least in the non-JIT
case, it seems to me that ExecJunkFilter() should be cheaper than
ExecProject(). Is it different enough to matter? Does the fact that we
can JIT the ExecProject() work make it actually faster? These are
things I don't know.

- There's a comment which you didn't write but just moved which I find
to be quite confusing. It says "For UPDATE/DELETE, find the
appropriate junk attr now. Typically, this will be a 'ctid' or
'wholerow' attribute, but in the case of a foreign data wrapper it
might be a set of junk attributes sufficient to identify the remote
row." But, the block of code which follows caters only to the 'ctid'
and 'wholerow' cases, not anything else. Perhaps that's explained by
the comment a bit further down which says "When there is a row-level
trigger, there should be a wholerow attribute," but if the point is
that this code is only reached when there's a row-level trigger,
that's far from obvious. It *looks* like something we'd reach for ANY
insert or UPDATE case. Maybe it's not your job to do anything about
this confusion, but I thought it was worth highlighting.

- The comment for filter_junk_tlist_entries(), needless to say, is of
the highest quality, but would it make any sense to think about having
an option for expand_tlist() to omit the junk entries itself, to avoid
extra work? I'm unclear whether we want that behavior in all UPDATE
cases or only some of them, because preproces_targetlist() has a call
to expand_tlist() to set parse->onConflict->onConflictSet that does
not call filter_junk_tlist_entries() on the result. Does this patch
need to make any changes to the handling of ON CONFLICT .. UPDATE? It
looks to me like right now it doesn't, but I don't know whether that's
an oversight or intentional.

- The output changes in the main regression test suite are pretty easy
to understand: we're just seeing columns that no longer need to get
fed through the execution get dropped. The changes in the postgres_fdw
results are harder to understand. In general, it appears that what's
happening is that we're no longer outputting the non-updated columns
individually -- which makes sense -- but now we're outputting a
whole-row var that wasn't there before, e.g.:

- Output: foreign_tbl.a, (foreign_tbl.b + 15), foreign_tbl.ctid
+ Output: (foreign_tbl.b + 15), foreign_tbl.ctid, foreign_tbl.*

Since this is postgres_fdw, we can re-fetch the row using CTID, so
it's not clear to me why we need foreign_tbl.* when we didn't before.
Maybe the comments need improvement.

- Specifically, I think the comments in preptlist.c need some work.
You've edited the top-of-file comment, but I don't think it's really
accurate. The first sentence says "For INSERT and UPDATE, the
targetlist must contain an entry for each attribute of the target
relation in the correct order," but I don't think that's really true
any more. It's certainly not what you see in the EXPLAIN output. The
paragraph goes on to explain that UPDATE has a second target list, but
(a) that seems to contradict the first sentence and (b) that second
target list isn't what you see when you run EXPLAIN. Also, there's no
mention of what happens for FDWs here, but it's evidently different,
as per the previous review comment.

- The comments atop fix_join_expr() should be updated. Maybe you can
just adjust the wording for case #2.

OK, that's all I've got based on a first read-through. Thanks for your
work on this.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message James Coleman 2021-03-23 16:48:32 Re: Nicer error when connecting to standby with hot_standby=off
Previous Message Tom Lane 2021-03-23 16:34:01 Re: Nicer error when connecting to standby with hot_standby=off