Re: In logical replication concurrent update of partition key creates a duplicate record on standby.

From: amul sul <sulamul(at)gmail(dot)com>
To: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: In logical replication concurrent update of partition key creates a duplicate record on standby.
Date: 2018-02-07 10:12:27
Message-ID: CAAJ_b94W+k9diXDq+X3POaEFqN1BdNuW9bU5mwGnmz8gJGx+9g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Feb 7, 2018 at 3:03 PM, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
> On 7 February 2018 at 13:53, amul sul <sulamul(at)gmail(dot)com> wrote:
>> Hi,
>>
>> If an update of partition key involves tuple movement from one partition to
>> another partition then there will be a separate delete on one partition and
>> insert on the other partition made.
>>
>> In the logical replication if an update performed on the master and standby at
>> the same moment, then replication worker tries to replicate delete + insert
>> operation on standby. While replying master changes on standby for the delete
>> operation worker will log "concurrent update, retrying" message (because the
>> update on standby has already deleted) and move forward to reply the next
>> insert operation. Standby update also did the same delete+insert is as part of
>> the update of partition key in a result there will be two records inserted on
>> standby.
>
> A quick thinking on how to resolve this makes me wonder if we can
> manage to pass some information through logical decoding that the
> delete is part of a partition key update. This is analogous to how we
> set some information locally in the tuple by setting
> tp.t_data->t_ctid.ip_blkid to InvalidBlockNumber.
>

+1, also if worker failed to reply delete operation on standby then
we need to decide what will be the next step, should we skip follow
insert operation or error out or something else.

> I guess, at the node 2 where this issue reproduces, this issue can
> also be reproduced if there is a non-partition-key UPDATE going on,
> and the tuple gets deleted as part of the replaying of partition-key
> update ? This UPDATE will skip the update, thinking that the tuple is
> deleted. This is similar to what's happening now in case of local
> concurrent updates, for which the fix is being worked upon.

Yes, you are correct, at node 2 the reported issue is also reproducible without
the update of partition key.

== NODE 2 ==

postgres=# update foo set b='node2_update' where a=1;
UPDATE 1
postgres=# select * from foo;
a | b
---+--------------
1 | node2_update
(1 row)

< -- continued replication worker -->

postgres=# 2018-02-07 15:26:53.323 IST [86449] LOG: concurrent update, retrying

postgres=# select tableoid::regclass, * from foo;
tableoid | a | b
----------+---+--------------
foo1 | 1 | node2_update
foo2 | 2 | node1_update
(2 rows)

Regards,
Amul Sul

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2018-02-07 10:36:37 Re: non-bulk inserts and tuple routing
Previous Message Magnus Hagander 2018-02-07 09:59:40 Re: git instructions