Re: Data loss on logical replication, 12.12 to 14.5, ALTER SUBSCRIPTION

From: Michail Nikolaev <michail(dot)nikolaev(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Data loss on logical replication, 12.12 to 14.5, ALTER SUBSCRIPTION
Date: 2023-01-03 15:20:11
Message-ID: CANtu0oh0_e76NO7XqWkrCPXT7zWaXGM-t_cQwbygxZzZftFMig@mail.gmail.com
Lists: pgsql-hackers

> Does that by any chance mean you are using a non-community version of
> Postgres which has some other changes?

It is a managed Postgres service in the general cloud. Such providers
usually apply some minor custom patches.
The only one I know of forbids canceling a query while it is waiting
for the synchronous replication acknowledgement.

> It is possible but ideally, in that case, the client should request
> such a transaction again.

I am not sure I get you here.

I'll try to explain what I mean:

The patch I'm referring to does not allow canceling a query while it
is waiting for the ACK of its COMMIT message under synchronous
replication.
If the synchronous standby is down, the query and its connection stay
stuck until the server is restarted (or until the standby becomes
available again to process the ACK).
Tuples changed by such a hanging transaction are not visible to other
transactions. All of this is done to prevent other sessions from seeing
spurious tuples in case of a network split.
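
To make the scenario concrete, here is a toy model in Python. It is
only a sketch of the behavior described above, not PostgreSQL code and
not the provider's patch; the class, its fields, and its methods are
all made up for illustration. It assumes the commit record is already
in the local WAL while the backend waits for the standby's ACK.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPrimary:
    """Toy model only; every name here is hypothetical."""
    standby_up: bool = True
    wal: list = field(default_factory=list)     # commit records flushed locally
    waiting: set = field(default_factory=set)   # xids waiting for the standby ACK
    visible: set = field(default_factory=set)   # xids whose tuples others can see

    def commit(self, xid):
        # The commit record reaches the local WAL before any ACK arrives.
        self.wal.append(xid)
        self.waiting.add(xid)
        self.deliver_acks()

    def deliver_acks(self):
        # ACKs only arrive while the synchronous standby is reachable.
        if self.standby_up:
            self.visible |= self.waiting
            self.waiting.clear()

    def cancel(self, xid):
        # Models the patch: a backend waiting for the ACK cannot be
        # canceled, so the request is refused (False).
        return xid not in self.waiting


primary = ToyPrimary(standby_up=False)
primary.commit(100)

# Standby is down: the transaction is in the WAL but stuck and invisible.
stuck = 100 in primary.waiting and 100 not in primary.visible
cancel_allowed = primary.cancel(100)

# Standby comes back: the ACK is processed and the tuples become visible.
primary.standby_up = True
primary.deliver_acks()
print(stuck, cancel_allowed, 100 in primary.visible)
```

The state I am interested in is the middle one: the commit is in the
local WAL, but the transaction is neither acknowledged nor visible.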

So it seems we had exactly such a situation during that incident,
caused by the downtime of our synchronous standby (before the server
restart).
My thought is only that such transactions (waiting for the ACK of
their COMMIT) might be handled incorrectly by the logical replication
engine.

Michail.
