Re: ModifyTable overheads in generic plans

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, David Rowley <dgrowleyml(at)gmail(dot)com>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>
Subject: Re: ModifyTable overheads in generic plans
Date: 2021-04-04 14:34:32
Message-ID: CA+HiwqH6NvE2yQQvTW5GaWkD5dkzdqdLOVBvYHP7yARntdMheQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Apr 4, 2021 at 10:20 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Amit Langote <amitlangote09(at)gmail(dot)com> writes:
> > On Thu, Apr 1, 2021 at 3:12 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> Amit Langote <amitlangote09(at)gmail(dot)com> writes:
> > [ v14-0002-Initialize-result-relation-information-lazily.patch ]
> >> Needs YA rebase over 86dc90056.
>
> > Done.
>
> I spent some time looking this over.

Thanks.

> There are bits of it we can
> adopt without too much trouble, but I'm afraid that 0001 (delay
> FDW BeginDirectModify until the first actual update) is a nonstarter,
> which makes the main idea of delaying ExecInitResultRelation unworkable.
>
> My fear about 0001 is that it will destroy any hope of direct updates
> on different remote partitions executing with consistent semantics
> (i.e. compatible snapshots), because some row updates triggered by the
> local query may have already happened before a given partition gets to
> start its remote query. Maybe we can work around that, but I do not
> want to commit a major restructuring that assumes we can dodge this
> problem when we don't yet even have a fix for cross-partition updates
> that does rely on the assumption of synchronous startup.

Hmm, okay, I can understand the concern.

> In some desultory performance testing here, it seemed like a
> significant part of the cost is ExecOpenIndices, and I don't see
> a reason offhand why we could not delay/skip that. I also concur
> with delaying construction of ri_ChildToRootMap and the
> partition_tuple_routing data structures, since many queries will
> never need those at all.

As I mentioned in [1], creating ri_projectNew can be expensive too,
especially as column count (and partition count for the generic plan
case) grows. I think we should have an static inline
initialize-on-first-access accessor function for that field too.

Actually, I remember considering having such accessor functions (all
static inline) for ri_WithCheckOptionExprs, ri_projectReturning,
ri_onConflictArbiterIndexes, and ri_onConflict (needed by ON CONFLICT
UPDATE) as well, prompted by Heikki's comments earlier in the
discussion. I also remember, before even writing this patch, not
liking that WCO and RETURNING expressions are initialized in their own
separate loops, rather than being part of the earlier loop that says:

/*
* Do additional per-result-relation initialization.
*/
for (i = 0; i < nrels; i++)
{

I guess ri_RowIdAttNo initialization can go into the same loop.

> > * PartitionTupleRouting.subplan_resultrel_htab is removed in favor
> > of using ModifyTableState.mt_resultOidHash to look up an UPDATE
> > result relation by OID.
>
> Hmm, that sounds promising too, though I didn't look at the details.
>
> Anyway, I think the way to proceed for now is to grab the low-hanging
> fruit of things that clearly won't change any semantics. But tail end
> of the dev cycle is no time to be making really fundamental changes
> in how FDW direct modify works.

I agree.

--
Amit Langote
EDB: http://www.enterprisedb.com

[1] https://www.postgresql.org/message-id/CA+HiwqHLUNhMxy46Mrb04VJpN=HUdm9bD7xdZ6f5h2o4imX79g@mail.gmail.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2021-04-04 14:56:36 Re: TRUNCATE on foreign table
Previous Message Julien Rouhaud 2021-04-04 14:18:50 Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?