Re: POC: postgres_fdw insert batching

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)fujitsu(dot)com>
Subject: Re: POC: postgres_fdw insert batching
Date: 2021-02-17 08:51:12
Message-ID: CA+HiwqGyr_WKQ+-eR+hEHUwm4Frm==spQrdZtZU1S8WKT3YBaw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Feb 17, 2021 at 5:46 PM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> On Wed, Feb 17, 2021 at 12:04 AM Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> > On 2/16/21 10:25 AM, Amit Langote wrote:
> > > On Tue, Feb 16, 2021 at 1:36 AM Tomas Vondra
> > >> Thanks for the patch, it seems fine to me.
> > >
> > > Thanks for checking.
> > >
> > >> I wonder it the commit
> > >> message needs some tweaks, though. At the moment it says:
> > >>
> > >> Prevent FDW insert batching during cross-partition updates
> > >>
> > >> but what the patch seems to be doing is simply initializing the info
> > >> only for CMD_INSERT operations. Which does the trick, but it affects
> > >> everything, i.e. all updates, no? Not just cross-partition updates.
> > >
> > > You're right. Please check the message in the updated patch.
> >
> > Thanks. I'm not sure I understand what "FDW may not be able to handle
> > both the original update operation and the batched insert operation
> > being performed at the same time" means. I mean, if we translate the
> > UPDATE into DELETE+INSERT, then we don't run both the update and insert
> > at the same time, right? What exactly is the problem with allowing
> > batching for inserts in cross-partition updates?
>
> Sorry, I hadn't shared enough details of my investigations when I
> originally ran into this. Such as that I had considered implementing
> the use of batching for these inserts too but had given up.
>
> Now that you mention it, I think I gave a less convincing reason for
> why we should avoid doing it at all. Maybe it would have been more
> right to say that it is the core code, not necessarily the FDWs, that
> currently fails to deal with the use of batching by the insert
> component of a cross-partition update. Those failures could be
> addressed as I'll describe below.
>
> For postgres_fdw, postgresGetForeignModifyBatchSize() could be taught
> to simply use the PgFdwModifyTable that is installed to handle the
> insert component of a cross-partition update (one can get that one via
> aux_fmstate field of the original PgFdwModifyState). However, even
> though that's fine for postgres_fdw to do, what worries (had worried)
> me is that it also results in scribbling on ri_BatchSize that the core
> code may see to determine what to do with a particular tuple, and I
> just have to hope that nodeModifyTable.c doesn't end up doing anything
> unwarranted with the original update based on seeing a non-zero
> ri_BatchSize. AFAICS, we are fine on that front.
>
> That said, there are some deficiencies in the code that have to be
> addressed before we can let postgres_fdw do as mentioned above. For
> example, the code in ExecModifyTable() that runs after breaking out of
> the loop to insert any remaining batched tuples appears to miss the
> tuples batched by such inserts. Apparently, that is because the
> ResultRelInfos used by those inserts are not present in
> es_tuple_routing_result_relations. Turns out I had forgotten that
> execPartition.c doesn't add the ResultRelInfos to that list if they
> are made by ExecInitModifyTable() for the original update operation
> and simply reused by ExecFindPartition() when tuples were routed to
> those partitions. It can be "fixed" by reverting to the original
> design in Tsunakawa-san's patch where the tuple routing result
> relations were obtained from the PartitionTupleRouting data structure,
> which fortunately stores all tuple routing result relations. (Sorry,
> I gave wrong advice in [1] in retrospect.)
>
> > On a closer look, it seems the problem actually lies in a small
> > inconsistency between create_foreign_modify and ExecInitRoutingInfo. The
> > former only set batch_size for CMD_INSERT while the latter called the
> > BatchSize() for all operations, expecting >= 1 result. So we may either
> > relax create_foreign_modify and set batch_size for all DML, or make
> > ExecInitRoutingInfo stricter (which is what the patches here do).
>
> I think we should be fine if we make
> postgresGetForeignModifyBatchSize() use the correct PgFdwModifyState
> as described above. We can be sure that we are not mixing the
> information used by the batched insert with that of the original
> unbatched update.
>
> > Is there a reason not to do the first thing, allowing batching of
> > inserts during cross-partition updates? I tried to do that, but it
> > dawned on me that we can't mix batched and un-batched operations, e.g.
> > DELETE + INSERT, because that'd break the order of execution, leading to
> > bogus results in case the same row is modified repeatedly, etc.
>
> Actually, postgres_fdw only supports moving a row into a partition (as
> part of a cross-partition update that is) if it has already finished
> performing any updates on it. So there is no worry of rows that are
> moved into a partition subsequently getting updated due to the
> original command.
>
> The attached patch implements the changes necessary to make these
> inserts use batching too.
>
> [1] https://www.postgresql.org/message-id/CA%2BHiwqEbnhwVJMsukTP-S9Kv1ynC7Da3yuqSPZC0Y7oWWOwoHQ%40mail.gmail.com

Oops, I had mistakenly not hit "Reply All". Attaching the patch again.

--
Amit Langote
EDB: http://www.enterprisedb.com

Attachment Content-Type Size
v5-0001-Allow-batching-of-inserts-to-occur-in-some-cases.patch application/octet-stream 13.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Takashi Menjo 2021-02-17 09:02:42 Re: [PoC] Non-volatile WAL buffer
Previous Message Amit Langote 2021-02-17 08:48:08 Re: A reloption for partitioned tables - parallel_workers