Re: Allow batched insert during cross-partition updates

From: gkokolatos(at)pm(dot)me
To: Amit Langote <amitlangote09(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)fujitsu(dot)com>
Subject: Re: Allow batched insert during cross-partition updates
Date: 2021-03-12 10:59:19
Message-ID: los9wb7ge4GSExGQxQmLYOhfIFxRUCOTVM9RbFhXnScVtBWWrheAauC9oVo9mpYfeV7wFOn2a-ccPG1ktOMb42sp_tWzTey912lPPGu3-EU=@pm.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, March 12, 2021 3:45 AM, Amit Langote <amitlangote09(at)gmail(dot)com> wrote:

> On Thu, Mar 11, 2021 at 8:36 PM gkokolatos(at)pm(dot)me wrote:
>
> > On Thursday, March 11, 2021 9:42 AM, Amit Langote amitlangote09(at)gmail(dot)com wrote:
> >
> > > On Wed, Mar 10, 2021 at 9:30 PM Georgios gkokolatos(at)protonmail(dot)com wrote:
> > >
> > > > I continued looking a bit at the patch, yet I am either failing to see fix or I am
> > > > looking at the wrong thing. Please find attached a small repro of what my expectetions
> > > > were.
> > > > As you can see in the repro, I would expect the
> > > > UPDATE local_root_remote_partitions SET a = 2;
> > > > to move the tuples to remote_partition_2 on the same transaction.
> > > > However this is not the case, with or without the patch.
> > > > Is my expectation of this patch wrong?
> > >
> > > I think yes. We currently don't have the feature you are looking for
> > > -- moving tuples from one remote partition to another remote
> > > partition. This patch is not for adding that feature.
> >
> > Thank you for correcting me.
> >
> > > What we do support however is moving rows from a local partition to a
> > > remote partition and that involves performing an INSERT on the latter.
> > > This patch is for teaching those INSERTs to use batched mode if
> > > allowed, which is currently prohibited. So with this patch, if an
> > > UPDATE moves 10 rows from a local partition to a remote partition,
> > > then they will be inserted with a single INSERT command containing all
> > > 10 rows, instead of 10 separate INSERT commands.
> >
> > So, if I understand correctly then in my previously attached repro I
> > should have written instead:
> >
> > CREATE TABLE local_root_remote_partitions (a int) PARTITION BY LIST ( a );
> > CREATE TABLE
> > local_root_local_partition_1
> > PARTITION OF
> > local_root_remote_partitions FOR VALUES IN (1);
> >
> > CREATE FOREIGN TABLE
> > local_root_remote_partition_2
> > PARTITION OF
> > local_root_remote_partitions FOR VALUES IN (2)
> > SERVER
> > remote_server
> > OPTIONS (
> > table_name 'remote_partition_2',
> > batch_size '10'
> > );
> >
> > INSERT INTO local_root_remote_partitions VALUES (1), (1);
> > -- Everything should be on local_root_local_partition_1 and on the same transaction
> > SELECT ctid, xmin, xmax, cmax, tableoid::regclass, a FROM local_root_remote_partitions;
> >
> > UPDATE local_root_remote_partitions SET a = 2;
> > -- Everything should be on remote_partition_2 and on the same transaction
> > SELECT ctid, xmin, xmax, cmax, tableoid::regclass, a FROM local_root_remote_partitions;
> >
> >
> > I am guessing that I am still wrong because the UPDATE operation above will
> > fail due to the restrictions imposed in postgresBeginForeignInsert regarding
> > UPDATES.
>
> Yeah, for the move to work without hitting the restriction you
> mention, you will need to write the UPDATE query such that
> local_root_remote_partition_2 is not updated. For example, as
> follows:
>
> UPDATE local_root_remote_partitions SET a = 2 WHERE a <> 2;
>
> With this query, the remote partition is not one of the result
> relations to be updated, so is able to escape that restriction.

Excellent. Thank you for the explanation and patience.

>
> > Would it be too much to ask for the addition of a test case that will
> > demonstrate the change of behaviour found in patch.
>
> Hmm, I don't think there's a way to display whether the INSERT done on
> the remote partition as a part of an (tuple-moving) UPDATE used
> batching or not. That's because that INSERT's state is hidden from
> EXPLAIN. Maybe we should change EXPLAIN to make it show such hidden
> INSERT's state (especially its batch size) under the original UPDATE
> node, but I am not sure.

Yeah, there does not seem to be a way for explain to do show that information
with the current code.

>
> By the way, the test case added by commit 927f453a94106 does exercise
> the code added by this patch, but as I said in the last paragraph, we
> can't verify that by inspecting EXPLAIN output.

I never doubted that. However, there is a difference. The current patch
changes the query to be executed in the remote from:

INSERT INTO <snip> VALUES ($1);
to:
INSERT INTO <snip> VALUES ($1), ($2) ... ($n);

When this patch gets in, it would be very helpful to know that subsequent
code changes will not cause regressions. So I was wondering if there is
a way to craft a test case that would break for the code in 927f453a94106
yet succeed with the current patch.

I attach version 2 of my small reproduction. I am under the impression that
in this version, examining the value of cmin in the remote table should
give an indication of whether the remote received a multiple insert queries
with a single value, or a single insert query with multiple values.

Or is this a wrong assumption of mine?

Cheers,
//Georgios

>
>
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Amit Langote
> EDB: http://www.enterprisedb.com

Attachment Content-Type Size
v2-repro.sql application/sql 1.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2021-03-12 11:11:03 Re: unrecognized configuration parameter "plpgsql.check_asserts"
Previous Message Walker 2021-03-12 10:53:26 unrecognized configuration parameter "plpgsql.check_asserts"