Re: making update/delete of inheritance trees scale better

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making update/delete of inheritance trees scale better
Date: 2020-06-02 04:15:24
Message-ID: CA+HiwqH-2sq-3Zq-CtuWjfRSyrGPXJBf1nCKKvTHuGVyfQ1OYA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

So, I think I have a patch that seems to work, but not all the way,
more on which below.

Here is the commit message in the attached patch.

===
Subject: [PATCH] Overhaul UPDATE's targetlist processing

Instead of emitting the full tuple matching the target table's tuple
descriptor, make the plan emit only the attributes that are assigned
values in the SET clause, plus row-identity junk attributes as before.
This allows us to avoid making a separate plan for each target
relation in the inheritance case, because the only reason it is so
currently is to account for the fact that each target relations may
have a set of attributes that is different from others. Having only
one plan suffices, because the set of assigned attributes must be same
in all the result relations.

While the plan will now produce only the assigned attributes and
row-identity junk attributes, other columns' values are filled by
refetching the old tuple. To that end, there will be a targetlist for
each target relation to compute the full tuple, that is, by combining
the values from the plan tuple and the old tuple, but they are passed
separately in the ModifyTable node.

Implementation notes:

* In the inheritance case, as the same plan produces tuples to be
updated from multiple result relations, the tuples now need to also
identity which table they come from, so an additional junk attribute
"tableoid" is present in that case.

* Considering that the inheritance set may contain foreign tables that
require a different (set of) row-identity junk attribute(s), the plan
needs to emit multiple distinct junk attributes. When transposed to a
child scan node, this targetlist emits a non-NULL value for the junk
attribute that's valid for the child relation and NULL for others.

* Executor and FDW execution APIs can no longer assume any specific
order in which the result relations will be processed. For each
tuple to be updated/deleted, result relation is selected by looking it
up in a hash table using the "tableoid" value as the key.

* Since the plan does not emit values for all the attributes, FDW APIs
may not assume that the individual column values in the TupleTableSlot
containing the plan tuple are accessible by their attribute numbers.

TODO:

* Reconsider having only one plan!
* Update FDW handler docs to reflect the API changes
===

Regarding the first TODO, it is to address the limitation that FDWs
will no longer be able push the *whole* child UPDATE/DELETE query down
to the remote server, including any joins, which is allowed at the
moment via PlanDirectModify API. The API seems to have been designed
with an assumption that the child scan/join node is the top-level
plan, but that's no longer the case. If we consider bypassing the
Append and allow ModifyTable to access the child scan/join nodes
directly, maybe we can allow that. I haven't updated the expected
output of postgres_fdw regression tests for now pending this.

A couple of things in the patch that I feel slightly uneasy about:

* Result relations are now appendrel children in the planner.
Normally, any wholerow Vars in the child relation's reltarget->exprs
get a ConvertRowType added on top to convert it back to the parent's
reltype, because that's what the client expects in the SELECT case.
In the result relation case, the executor expects to see child
wholerow Vars themselves, not their parent versions.

* FDW's ExecFoeignUpdate() API expects that the NEW tuple passed to it
match the target foreign table reltype, so that it can access the
target attributes in the tuple by attribute numbers. Considering that
the plan no longer builds the full tuple itself, I made the executor
satisfy that expectation by filling the missing attributes' values
using the target table's wholerow attribute. That is, we now *always*
fetch the wholerow attributes for UPDATE, not just when there are
row-level triggers that need it. I think that's unfortunate. Maybe,
the correct way is asking the FDWs to translate (setrefs.c style) the
target attribute numbers appropriately to access the plan's output
tuple.

I will add the patch to the next CF. I haven't yet fully checked the
performance considerations of the new approach, but will do so in the
coming days.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
v1-0001-Overhaul-UPDATE-s-targetlist-processing.patch application/octet-stream 185.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2020-06-02 04:24:56 Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762
Previous Message Masahiko Sawada 2020-06-02 03:40:11 Small doc improvement about spilled txn tracking