Re: making update/delete of inheritance trees scale better

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making update/delete of inheritance trees scale better
Date: 2021-02-04 09:32:50
Message-ID: CA+HiwqEChyg8cx74EsY-SMtGdYEoQZrzAwSDDpNF7S5e_JJP2w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 27, 2021 at 4:42 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Oct 30, 2020 at 6:26 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> > Yeah, you need to access the old tuple to update its t_ctid, but
> > accessing it twice is still more expensive than accessing it once. Maybe
> > you could optimize it somewhat by keeping the buffer pinned or
> > something. Or push the responsibility down to the table AM, passing the
> > AM only the modified columns, and let the AM figure out how to deal with
> > the columns that were not modified, hoping that it can do something smart.
>
> Just as a point of possible interest, back when I was working on
> zheap, I sort of wanted to take this in the opposite direction. In
> effect, a zheap tuple has system columns that don't exist for a heap
> tuple, and you can't do an update or delete without knowing what the
> values for those columns are, so zheap had to just refetch the tuple,
> but that sucked in comparisons with the existing heap, which didn't
> have to do the refetch.

So would zheap refetch a tuple using the "ctid" column in the plan's
output tuple and then use some other columns from the fetched tuple to
actually do the update?

> At the time, I thought maybe the right idea
> would be to extend things so that a table AM could specify an
> arbitrary set of system columns that needed to be bubbled up to the
> point where the update or delete happens, but that seemed really
> complicated to implement and I never tried.

Currently, FDWs can specify tuple-identifying system columns, which
are added to the query's targetlist when rewriteTargetListUD() calls
the AddForeignUpdateTargets() API.

In rewriteTargetListUD(), one can see that the planner assumes that
all local tables, irrespective of their AM, use a "ctid" column to
identify their tuples:

if (target_relation->rd_rel->relkind == RELKIND_RELATION ||
target_relation->rd_rel->relkind == RELKIND_MATVIEW ||
target_relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
{
/*
* Emit CTID so that executor can find the row to update or delete.
*/
var = makeVar(parsetree->resultRelation,
SelfItemPointerAttributeNumber,
TIDOID,
-1,
InvalidOid,
0);

attrname = "ctid";
}
else if (target_relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
{
/*
* Let the foreign table's FDW add whatever junk TLEs it wants.
*/
FdwRoutine *fdwroutine;

fdwroutine = GetFdwRoutineForRelation(target_relation, false);

if (fdwroutine->AddForeignUpdateTargets != NULL)
fdwroutine->AddForeignUpdateTargets(parsetree, target_rte,
target_relation);

Maybe the first block could likewise ask the table AM if it prefers to
add a custom set of system columns or just add "ctid" otherwise?

> Here it seems like we're
> thinking of going the other way, and just always doing the refetch.

To be clear, the new refetch in ExecModifyTable() is to fill in the
unchanged columns in the new tuple. If we rejigger the
table_tuple_update() API to receive a partial tuple (essentially
what's in 'planSlot' passed to ExecUpdate) as opposed to the full
tuple, we wouldn't need the refetch.

We'd need to teach a few other executor routines, such as
ExecWithCheckOptions(), ExecConstraints(), etc. to live with a partial
tuple but maybe that's doable with some effort. We could even
optimize away evaluating any constraints if none of the constrained
columns are unchanged.

--
Amit Langote
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2021-02-04 09:32:52 Re: Single transaction in the tablesync worker?
Previous Message Wang, Shenhao 2021-02-04 09:25:00 parse mistake in ecpg connect string