Logical Replication vs. 2PC

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Markus Wanner <markus(dot)wanner(at)enterprisedb(dot)com>, osumi(dot)takamichi(at)fujitsu(dot)com
Subject: Logical Replication vs. 2PC
Date: 2021-03-18 09:45:52
Message-ID: CAA4eK1+opiV4aFTmWWUF9h_32=HfPOW9vZASHarT0UA5oBrtGw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

While reviewing/testing subscriber-side work for $SUBJECT [1], I
noticed a problem that seems to need a broader discussion, so started
this thread. We can get prepare for the same GID more than once for
the cases where we have defined multiple subscriptions for
publications on the same server and prepared transaction has
operations on tables subscribed to those subscriptions. For such
cases, one of the prepare will be successful and others will fail in
which case the server will send them again. Once the commit prepared
is done for the first one, the next prepare will be successful. Now,
this is not ideal but will work.

However, if the user has setup synchronous_standby_names for all the
subscriptions then we won't be able to proceed because the prepare on
publisher will wait for all the subscriptions to ack and the
subscriptions are waiting for the first prepare to finish. See an
example below for such a situation. I think this can also happen if we
get any key violation while applying the changes on the subscriber,
but for that, we can ask the user to remove the violating key on the
subscriber as that is what we suggest now also for commits. Similarly,
say the user has already prepared the transaction with the same GID on
subscriber-node, then also we can get into a similar situation but for
that, we can ask the user to commit such a GID.

We can think of appending some unique identifier (like subid) with GID
but that won't work for cascaded standby setup (where the prepares on
subscriber will be again sent to another subscriber) as the GID can
become too long. So that might not be a good solution, maybe we can
optimize it in some way that we append only when there is a GID clash.
The other thing we could do is to ask the user to temporarily disable
the subscription and change synchronous_standby_settings on the
publisher node. Any better ideas?

Example of the above scenario, you can see this problem after applying
the patches at [1].

Publisher
=================
CREATE TABLE mytbl(id SERIAL PRIMARY KEY, somedata int, text varchar(120));

BEGIN;
INSERT INTO mytbl(somedata, text) VALUES (1, 1);
INSERT INTO mytbl(somedata, text) VALUES (1, 2);
COMMIT;

CREATE PUBLICATION mypub FOR TABLE mytbl;

CREATE TABLE mytbl1(id SERIAL PRIMARY KEY, somedata int, text varchar(120));

BEGIN;
INSERT INTO mytbl1(somedata, text) VALUES (1, 1);
INSERT INTO mytbl1(somedata, text) VALUES (1, 2);
COMMIT;

CREATE PUBLICATION mypub1 FOR TABLE mytbl1;

Subscriber
=============
CREATE TABLE mytbl(id SERIAL PRIMARY KEY, somedata int, text varchar(120));

CREATE SUBSCRIPTION mysub
CONNECTION 'host=localhost port=5432 dbname=postgres'
PUBLICATION mypub WITH(two_phase = on);

CREATE TABLE mytbl1(id SERIAL PRIMARY KEY, somedata int, text
varchar(120)); CREATE SUBSCRIPTION mysub1
CONNECTION 'host=localhost port=5432 dbname=postgres'
PUBLICATION mypub1 WITH(two_phase = on);

Now, set synchronous_standby_names = 'FIRST 2 (mysub, mysub1)' on the
publisher in postgresql.conf and restart both publisher and
subscriber, actually restart is not required as
synchronous_standby_names is a SIGHUP parameter.

Publisher
=============
BEGIN;
Insert into mytbl values(17,1,18);
Insert into mytbl1 values(17,1,18);
Prepare Transaction 'foo';

Now, this Prepare transaction will wait forever because on subscriber
we are getting "ERROR: transaction identifier "foo" is already in
use" which means it is waiting for a publisher to send commit prepared
for first apply worker and publisher is waiting for both the
subscriptions to send ack. This is happening because the prepared
transaction on publisher operates on tables of both subscriptions.

In short, on the subscriber, both the apply workers (corresponding to
two subscriptions) are getting the same prepare transaction GID,
leading to an error on the subscriber and making the publisher wait
forever.

Thoughts?

[1] - https://www.postgresql.org/message-id/CAHut%2BPv3X7YH_nDEjH1ZJf5U6M6DHHtEjevu7PY5Dv5071jQ4A%40mail.gmail.com

--
With Regards,
Amit Kapila.

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Nancarrow 2021-03-18 09:48:26 Re: Avoid CommandCounterIncrement in RI trigger when INSERT INTO referencing table
Previous Message Amit Langote 2021-03-18 09:42:35 Re: crash during cascaded foreign key update