Re: POC: postgres_fdw insert batching

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)fujitsu(dot)com>
Subject: Re: POC: postgres_fdw insert batching
Date: 2021-02-17 19:36:36
Message-ID: 6929d485-2d2a-da46-3681-4a400a3d794f@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2/17/21 9:51 AM, Amit Langote wrote:
> On Wed, Feb 17, 2021 at 5:46 PM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
>> On Wed, Feb 17, 2021 at 12:04 AM Tomas Vondra
>> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>>> On 2/16/21 10:25 AM, Amit Langote wrote:
>>>> On Tue, Feb 16, 2021 at 1:36 AM Tomas Vondra
>>>>> Thanks for the patch, it seems fine to me.
>>>>
>>>> Thanks for checking.
>>>>
>>>>> I wonder it the commit
>>>>> message needs some tweaks, though. At the moment it says:
>>>>>
>>>>> Prevent FDW insert batching during cross-partition updates
>>>>>
>>>>> but what the patch seems to be doing is simply initializing the info
>>>>> only for CMD_INSERT operations. Which does the trick, but it affects
>>>>> everything, i.e. all updates, no? Not just cross-partition updates.
>>>>
>>>> You're right. Please check the message in the updated patch.
>>>
>>> Thanks. I'm not sure I understand what "FDW may not be able to handle
>>> both the original update operation and the batched insert operation
>>> being performed at the same time" means. I mean, if we translate the
>>> UPDATE into DELETE+INSERT, then we don't run both the update and insert
>>> at the same time, right? What exactly is the problem with allowing
>>> batching for inserts in cross-partition updates?
>>
>> Sorry, I hadn't shared enough details of my investigations when I
>> originally ran into this. Such as that I had considered implementing
>> the use of batching for these inserts too but had given up.
>>
>> Now that you mention it, I think I gave a less convincing reason for
>> why we should avoid doing it at all. Maybe it would have been more
>> right to say that it is the core code, not necessarily the FDWs, that
>> currently fails to deal with the use of batching by the insert
>> component of a cross-partition update. Those failures could be
>> addressed as I'll describe below.
>>
>> For postgres_fdw, postgresGetForeignModifyBatchSize() could be taught
>> to simply use the PgFdwModifyTable that is installed to handle the
>> insert component of a cross-partition update (one can get that one via
>> aux_fmstate field of the original PgFdwModifyState). However, even
>> though that's fine for postgres_fdw to do, what worries (had worried)
>> me is that it also results in scribbling on ri_BatchSize that the core
>> code may see to determine what to do with a particular tuple, and I
>> just have to hope that nodeModifyTable.c doesn't end up doing anything
>> unwarranted with the original update based on seeing a non-zero
>> ri_BatchSize. AFAICS, we are fine on that front.
>>
>> That said, there are some deficiencies in the code that have to be
>> addressed before we can let postgres_fdw do as mentioned above. For
>> example, the code in ExecModifyTable() that runs after breaking out of
>> the loop to insert any remaining batched tuples appears to miss the
>> tuples batched by such inserts. Apparently, that is because the
>> ResultRelInfos used by those inserts are not present in
>> es_tuple_routing_result_relations. Turns out I had forgotten that
>> execPartition.c doesn't add the ResultRelInfos to that list if they
>> are made by ExecInitModifyTable() for the original update operation
>> and simply reused by ExecFindPartition() when tuples were routed to
>> those partitions. It can be "fixed" by reverting to the original
>> design in Tsunakawa-san's patch where the tuple routing result
>> relations were obtained from the PartitionTupleRouting data structure,
>> which fortunately stores all tuple routing result relations. (Sorry,
>> I gave wrong advice in [1] in retrospect.)
>>
>>> On a closer look, it seems the problem actually lies in a small
>>> inconsistency between create_foreign_modify and ExecInitRoutingInfo. The
>>> former only set batch_size for CMD_INSERT while the latter called the
>>> BatchSize() for all operations, expecting >= 1 result. So we may either
>>> relax create_foreign_modify and set batch_size for all DML, or make
>>> ExecInitRoutingInfo stricter (which is what the patches here do).
>>
>> I think we should be fine if we make
>> postgresGetForeignModifyBatchSize() use the correct PgFdwModifyState
>> as described above. We can be sure that we are not mixing the
>> information used by the batched insert with that of the original
>> unbatched update.
>>
>>> Is there a reason not to do the first thing, allowing batching of
>>> inserts during cross-partition updates? I tried to do that, but it
>>> dawned on me that we can't mix batched and un-batched operations, e.g.
>>> DELETE + INSERT, because that'd break the order of execution, leading to
>>> bogus results in case the same row is modified repeatedly, etc.
>>
>> Actually, postgres_fdw only supports moving a row into a partition (as
>> part of a cross-partition update that is) if it has already finished
>> performing any updates on it. So there is no worry of rows that are
>> moved into a partition subsequently getting updated due to the
>> original command.
>>
>> The attached patch implements the changes necessary to make these
>> inserts use batching too.
>>
>> [1] https://www.postgresql.org/message-id/CA%2BHiwqEbnhwVJMsukTP-S9Kv1ynC7Da3yuqSPZC0Y7oWWOwoHQ%40mail.gmail.com
>
> Oops, I had mistakenly not hit "Reply All". Attaching the patch again.
>

Thanks. The patch seems reasonable, but it's a bit too large for a fix,
so I'll go ahead and push one of the previous fixes restricting batching
to plain INSERT commands. But this seems useful, so I suggest adding it
to the next commitfest.

One thing that surprised me is that we only move the rows *to* the
foreign partition, not from it (even on pg13, i.e. before the batching
etc.). I mean, using the example you posted earlier, with one foreign
and one local partition, consider this:

delete from p;
insert into p values (2);

test=# select * from p2;
a
---
2
(1 row)

test=# update p set a = 1;
UPDATE 1

test=# select * from p1;
a
---
1
(1 row)

OK, so it was moved to the foreign partition, which is for rows with
value in (1). So far so good. Let's do another update:

test=# update p set a = 2;
UPDATE 1
test=# select * from p1;
a
---
2
(1 row)

So now the foreign partition contains value (2), which is however wrong
with respect to the partitioning rules - this should be in p2, the local
partition. This however causes pretty annoying issue:

test=# explain analyze select * from p where a = 2;

QUERY PLAN
---------------------------------------------------------------
Seq Scan on p2 p (cost=0.00..41.88 rows=13 width=4)
(actual time=0.024..0.028 rows=0 loops=1)
Filter: (a = 2)
Planning Time: 0.355 ms
Execution Time: 0.089 ms
(4 rows)

That is, we fail to find the row, because we eliminate the partition.

Now, maybe this is expected, but it seems like a rather mind-boggling
violation of POLA principle. I've checked if postgres_fdw mentions this
somewhere, but all I see about row movement is this:

Note also that postgres_fdw supports row movement invoked by UPDATE
statements executed on partitioned tables, but it currently does not
handle the case where a remote partition chosen to insert a moved row
into is also an UPDATE target partition that will be updated later.

and if I understand that correctly, that's about something different.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joe Conway 2021-02-17 20:14:07 Re: documentation fix for SET ROLE
Previous Message David G. Johnston 2021-02-17 19:12:47 Re: documentation fix for SET ROLE