Re: ModifyTable overheads in generic plans

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, David Rowley <dgrowleyml(at)gmail(dot)com>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>
Subject: Re: ModifyTable overheads in generic plans
Date: 2021-04-07 08:18:21
Message-ID: CA+HiwqEcawatEaUh1uTbZMEZTJeLzbroRTz9_X9Z5CFjTWJkhw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Apr 7, 2021 at 8:24 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Amit Langote <amitlangote09(at)gmail(dot)com> writes:
> > On Mon, Apr 5, 2021 at 1:43 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> OK. Do you want to pull out the bits of the patch that we can still
> >> do without postponing BeginDirectModify?
>
> > I ended up with the attached, whereby ExecInitResultRelation() is now
> > performed for all relations before calling ExecInitNode() on the
> > subplan. As mentioned, I moved other per-result-rel initializations
> > into the same loop that does ExecInitResultRelation(), while moving
> > code related to some initializations into initialize-on-first-access
> > accessor functions for the concerned fields. I chose to do that for
> > ri_WIthCheckOptionExprs, ri_projectReturning, and ri_projectNew.
>
> I pushed the parts of this that I thought were safe and productive.

Thank you.

+/*
+ * ExecInitInsertProjection
+ * Do one-time initialization of projection data for INSERT tuples.
+ *
+ * INSERT queries may need a projection to filter out junk attrs in the tlist.
+ *
+ * This is "one-time" for any given result rel, but we might touch
+ * more than one result rel in the course of a partitioned INSERT.

I don't think we need this last bit for INSERT, because the result
rels for leaf partitions will never have to go through
ExecInitInsertProjection(). Leaf partitions are never directly fed
tuples that ExecModifyTable() extracts out of the subplan, because
those tuples will have gone through the root target table's projection
before being passed to tuple routing. So, if INSERTs will ever need a
projection, only the partitioned table being inserted into will need
to have one built for.

Also, I think we should update the commentary around ri_projectNew a
bit to make it clear that noplace beside ExecGet{Insert|Update}Tuple
should be touching it and the associated slots.

+ * This is "one-time" for any given result rel, but we might touch more than
+ * one result rel in the course of a partitioned UPDATE, and each one needs
+ * its own projection due to possible column order variation.

Minor quibble, but should we write it as "...in the course of an
inherited UPDATE"?

Attached patch contains these changes.

> The business about moving the subplan tree initialization to after
> calling FDWs' BeginForeignModify functions seems to me to be a
> nonstarter. Existing FDWs are going to expect their scan initializations
> to have been done first. I'm surprised that postgres_fdw seemed to
> need only a one-line fix; it could have been far worse. The amount of
> trouble that could cause is absolutely not worth it to remove one loop
> over the result relations.

Okay, that sounds fair. After all, we write this about 'mtstate' in
the description of BeginForeignModify(), which I had failed to notice:

"mtstate is the overall state of the ModifyTable plan node being
executed; global data about the plan and execution state is available
via this structure."

> I also could not get excited about postponing initialization of RETURNING
> or WITH CHECK OPTIONS expressions. I grant that that can be helpful
> when those features are used, but I doubt that RETURNING is used that
> heavily, and WITH CHECK OPTIONS is surely next door to nonexistent
> in performance-critical queries. If the feature isn't used, the cost
> of the existing code is about zero. So I couldn't see that it was worth
> the amount of code thrashing and risk of new bugs involved.

Okay.

> The bit you
> noted about EXPLAIN missing a subplan is pretty scary in this connection;
> I'm not at all sure that that's just cosmetic.

Yeah, this and...

> (Having said that, I'm wondering if there are bugs in these cases for
> cross-partition updates that target a previously-not-used partition.
> So we might have things to fix anyway.)

...this would need to be looked at a bit more closely, which I'll try
to do sometime later this week.

> Anyway, looking at the test case you posted at the very top of this
> thread, I was getting this with HEAD on Friday:
>
> nparts TPS
> 0 12152
> 10 8672
> 100 2753
> 1000 314
>
> and after the two patches I just pushed, it looks like:
>
> 0 12105
> 10 9928
> 100 5433
> 1000 938
>
> So while there's certainly work left to do, that's not bad for
> some low-hanging-fruit grabbing.

Yes, certainly.

I reran my usual benchmark and got the following numbers, this time
comparing v13.2 against the latest HEAD:

nparts 10cols 20cols 40cols

v13.2

64 3231 2747 2217
128 1528 1269 1121
256 709 652 491
1024 96 78 67

v14dev HEAD

64 14835 14360 14563
128 9469 9601 9490
256 5523 5383 5268
1024 1482 1415 1366

Clearly, we've made some very good progress here. Thanks.

--
Amit Langote
EDB: http://www.enterprisedb.com

Attachment Content-Type Size
projectNew-comment-fixes.patch application/octet-stream 2.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Seino Yuki 2021-04-07 08:26:21 Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements
Previous Message Masahiko Sawada 2021-04-07 07:50:17 Re: New IndexAM API controlling index vacuum strategies