Re: Synchronous commit behavior during network outage

From: Ondřej Žižka <ondrej(dot)zizka(at)stratox(dot)cz>
To: SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>
Cc: Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Synchronous commit behavior during network outage
Date: 2021-04-21 19:03:16
Message-ID: 07739e7f-9f9d-cf47-8ba8-155ca90dbe18@stratox.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

> You can monitor the pg_stat_activity for the SYNC_REP_WAIT_FLUSH wait
types to detect this.

I tried to see this this wait_event_type Client or IPC and wait_event
Client_Read or SyncRep. In which situation I can see the
SYNC_REP_WAIT_FLUSH value?

> You should consider these as in doubt transactions and the client
should retry. Again, this can happen in a normal server crash case too.
For example, a transaction committed on the server and before sending
the acknowledgement crashed.  *The client should know how to handle
these cases.*

I have just a light knowledge of the in-doubt transaction. Need to study
more about it, but in real world the client is mostly 'stupid' and does
expect only COMMIT or ROLLBACK. Nothing between.

> There is a third problem that I didn't talk about in this thread
where the async clients (including logical decoding and replication
clients) can get ahead of the new primary and there is no easier way to
undo those changes. For this problem, we need to implement some protocol
in the WAL sender where it sends the log to the consumer only up to the
flush LSN of the standby/quorum replicas. This is something I am working
on right now.

We setup and architecture where are 4 nodes and Patroni as a cluster
manager. Two nodes are sync an each sync node has 1 async. In case
something like this happen (e.g. network to sync replica fails and user
press the CTRL+C), the async replica receives the transaction and apply
it. If the outage is longer than some time (30s by default), management
software checks the LSN and create a new sync replica from the ASYNC
replica.

Ondrej

You should consider these as in doubt transactions and the client should
retry. Again, this can happen in a normal server crash case too. For
example, a transaction committed on the server and before sending the
acknowledgement crashed.  The client should know how to handle these cases
On 21/04/2021 09:20, SATYANARAYANA NARLAPURAM wrote:
>
> This can be an option for us in our case. But there also needs to
> be a process how to detect these "stuck commits" and how to
> invalidate/remove them, because in reality, if the app/user would
> not see the change in the database, it/he/she will try to
> insert/delete it again. If it just stuck without management, it
> will create a queue which can cause, that in the queue there will
> be 2 similar inserts/deletes which can again cause issues (like
> with the primary key I mentioned before).
>
>
>  This shouldn't be a problem as the previous transaction is still
> holding the locks and the new transaction is blocked behind this.
> Outside of the sync replication, this can happen today too with
> glitches/timeouts/ retries between the client and the server. Am I
> missing something?
>
>
> So the process should be in this case:
>
> - DBA receives information, that write operations stuck (DBA in
> coordination with the infrastructure team disconnects all clients
> and prevent new ones to create a new connection).
>
> You can monitor the pg_stat_activity for the SYNC_REP_WAIT_FLUSH wait
> types to detect this.
>
> - DBA will recognize, that there is an issue in communication
> between the primary and the sync replica (caused the issue with
> the propagation of commits)
> - DBA will see that there are some commits that are in the "stuck
> state"
> - DBA removes these stuck commits. Note: Because the client never
> received a confirmation about the successful commit -> changes in
> the DB client tried to perform can't be considered as successful.
>
>
> You should consider these as in doubt transactions and the client
> should retry. Again, this can happen in a normal server crash case
> too. For example, a transaction committed on the server and before
> sending the acknowledgement crashed.  The client should know how to
> handle these cases.
>
> - DBA and infrastructure team restore the communication between
> server nodes to be able to propagate commits from the primary node
> to sync replica.
> - DBA and infrastructure team allows new connections to the database
>
> This approach would require external monitoring and alerting, but
> I would say, that this is an acceptable solution. Would your patch
> be able to perform that?
>
> My patch handles ignoring the cancel events. I ended up keeping the
> other logic (blocking super user connections in the
> client_authentication_hook.
>
> There is a third problem that I didn't talk about in this thread where
> the async clients (including logical decoding and replication clients)
> can get ahead of the new primary and there is no easier way to undo
> those changes. For this problem, we need to implement some protocol in
> the WAL sender where it sends the log to the consumer only up to the
> flush LSN of the standby/quorum replicas. This is something I am
> working on right now.
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2021-04-21 19:10:09 Re: posgres 12 bug (partitioned table)
Previous Message Tom Lane 2021-04-21 18:43:30 Re: when the startup process doesn't