Re: making update/delete of inheritance trees scale better

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Amit Langote <amitlangote09(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: making update/delete of inheritance trees scale better
Date: 2021-03-27 16:30:07
Message-ID: 2009402.1616862607@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Amit Langote <amitlangote09(at)gmail(dot)com> writes:
> Attached updated version of the patch. I have forgotten to mention in
> my recent posts on this thread one thing about 0001 that I had
> mentioned upthread back in June. That it currently fails a test in
> postgres_fdw's suite due to a bug of cross-partition updates that I
> decided at the time to pursue in another thread:
> https://www.postgresql.org/message-id/CA%2BHiwqE_UK1jTSNrjb8mpTdivzd3dum6mK--xqKq0Y9VmfwWQA%40mail.gmail.com

Yeah, I ran into that too. I think we need not try to fix it in HEAD;
we aren't likely to commit 0001 and 0002 separately. We need some fix
for the back branches, but that would better be discussed in the other
thread. (Note that the version of 0001 I attach below shows the actual
output of the postgres_fdw test, including a failure from said bug.)

I wanted to give a data dump of where I am. I've reviewed and
nontrivially modified 0001 and the executor parts of 0002, and
I'm fairly happy with the state of that much of the code now.
(Note that 0002 below contains some cosmetic fixes, such as comments,
that logically belong in 0001, but I didn't bother to tidy that up
since I'm not seeing these as separate commits anyway.)

The planner, however, still needs a lot of work. There's a serious
functional problem, in that UPDATEs across partition trees having
more than one foreign table fail with

ERROR: junk column "wholerow" of child relation 5 conflicts with parent junk column with same name

(cf. multiupdate.sql test case attached). I think we could get around
that by requiring "wholerow" junk attrs to have vartype RECORDOID instead
of the particular table's rowtype, which might also remove the need for
some of the vartype translation hacking in 0002. But I haven't tried yet.

More abstractly, I really dislike the "fake variable" design, primarily
the aspect that you made the fake variables look like real columns of
the parent table with attnums just beyond the last real one. I think
this is just a recipe for obscuring bugs, since it means you have to
lobotomize a lot of bad-attnum error checks. The alternative I'm
considering is to invent a separate RTE that holds all the junk columns.
Haven't tried that yet either.

The situation in postgres_fdw is not great either, specifically this
change:

@@ -2054,8 +2055,7 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
*/
if (plan && plan->operation == CMD_UPDATE &&
(resultRelInfo->ri_usesFdwDirectModify ||
- resultRelInfo->ri_FdwState) &&
- resultRelInfo > mtstate->resultRelInfo + mtstate->mt_whichplan)
+ resultRelInfo->ri_FdwState))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot route tuples into foreign table to be updated \"%s\"",

which is what forced you to remove or lobotomize several regression
test cases. Now admittedly, that just moves the state of play for
cross-partition updates into postgres_fdw partitions from "works
sometimes" to "works never". But I don't like the idea that we'll
be taking away actual functionality.

I have a blue-sky idea for fixing that properly, which is to disable FDW
direct updates when there is a possibility of a cross-partition update,
instead doing it the old way with a remote cursor reading the source rows
for later UPDATEs. (If anyone complains that this is too slow, my answer
is "it can be arbitrarily fast when it doesn't have to give the right
answer". Failing on cross-partition updates isn't acceptable.) The point
is that once we have issued DECLARE CURSOR, the cursor's view of the
source data is static so it doesn't matter if we insert new rows into the
remote table. The hard part of that is to make sure that the DECLARE
CURSOR gets issued before any updates from other partitions can arrive,
which I think means we'd need to issue it during plan tree startup not at
first fetch from the ForeignScan node. Maybe that happens already, or
maybe we'd need a new/repurposed FDW API call. I've not researched it.

Anyway, I'd really like to get this done for v14, so I'm going to buckle
down and try to fix the core-planner issues I mentioned. It'd be nice
if somebody could look at fixing the postgres_fdw problem in parallel
with that.

regards, tom lane

Attachment Content-Type Size
v12-0001-Overhaul-how-updates-compute-new-tuple.patch text/x-diff 85.7 KB
v12-0002-Revise-how-inherited-update-delete-are-handled.patch text/x-diff 218.7 KB
multiupdate.sql text/plain 1.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2021-03-27 17:06:11 Re: truncating timestamps on arbitrary intervals
Previous Message John Naylor 2021-03-27 15:26:47 Re: non-HOT update not looking at FSM for large tuple update