Re: Synchronous commit behavior during network outage

From: Ondřej Žižka <ondrej(dot)zizka(at)stratox(dot)cz>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Synchronous commit behavior during network outage
Date: 2021-05-20 15:40:36
Message-ID: 70850d05-ec84-d6bc-10b5-3b23d3f36925@stratox.cz
Lists: pgsql-hackers

On 06/05/2021 06:09, Andrey Borodin wrote:
> I could not understand your reasoning about 2 and 4 nodes. Can you please clarify a bit how a 4-node setup can help prevent visibility of committed-locally-but-canceled transactions?
Hello Andrey,

The initial request (for us) was to have a geo cluster with 2 locations
where it would be possible to have 2 sync replicas even if one location
fails. This means having 2 nodes in every location (4 in total). If one
location fails completely (broken network connection), Patroni will
choose the working location (a 5-node etcd cluster spread over 3
locations ensures this).
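
The etcd majority rule behind that statement can be sketched as follows. This
is only an illustration: the 2 + 2 + 1 placement of etcd members and the
location names are assumptions, since the message only says "5 node etcd in
3 locations".

```python
def has_quorum(total_members: int, reachable_members: int) -> bool:
    """Return True if the reachable members form a strict majority."""
    return reachable_members >= total_members // 2 + 1


# Assumed placement (not stated in the message): 2 + 2 + 1 etcd members.
ETCD_MEMBERS_PER_LOCATION = {"location_1": 2, "location_2": 2, "witness": 1}


def quorum_after_losing(location: str) -> bool:
    """Check whether etcd keeps a majority when one whole location is lost."""
    total = sum(ETCD_MEMBERS_PER_LOCATION.values())
    reachable = total - ETCD_MEMBERS_PER_LOCATION[location]
    return has_quorum(total, reachable)


if __name__ == "__main__":
    for name in ETCD_MEMBERS_PER_LOCATION:
        print(name, quorum_after_losing(name))
```

Under this assumed placement, losing any single location leaves at least 3 of
5 members reachable, so the surviving side can still hold the leader key.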

In the initial state, there is 1 sync replica in each location and one
async replica in each location, which uses the sync replica in its own
location as its source.
Let's have the following initial situation:
1) Nodes pg11 and pg12 are in one location; nodes pg21 and pg22 are in
another location.
2) Nodes pg11 and pg21 are in synchronous replication with each other.
3) Node pg12 is an async replica from pg11
4) Node pg22 is an async replica from pg21
5) Master is pg11.

When the committed-locally-but-canceled situation happens and the
problem is only with node pg21 (not with the network between nodes), the
async replica pg12 will receive the commit from pg11 right after the
local commit on pg11, even if the cancellation happens. So the commit
will be present on both pg11 and pg12. If pg11 then fails, the
transaction already exists on pg12, and this node will be selected as
the new leader (latest LSN).
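
The "latest LSN" choice can be sketched roughly like this. It is a simplified
illustration, not Patroni's actual election code; the function names and the
LSN values are hypothetical, and only the node names come from the scenario
above.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '0/3000148' to an integer byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def pick_new_leader(lsn_by_node: dict[str, str]) -> str:
    """Return the surviving node whose WAL position is furthest ahead."""
    return max(lsn_by_node, key=lambda node: parse_lsn(lsn_by_node[node]))


# Hypothetical positions after pg11 fails: pg12 already received the
# canceled-but-locally-committed transaction, while pg21 and pg22 did not.
survivors = {"pg12": "0/3000148", "pg21": "0/3000060", "pg22": "0/3000060"}

if __name__ == "__main__":
    print(pick_new_leader(survivors))
```

Because LSNs are hexadecimal, they have to be compared as numbers; a plain
string comparison of the textual form would order them incorrectly.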

There is a window between the local commit and the moment it is sent to
the async replica during which we can lose data, but I expect this to be
milliseconds (maybe less).

This will not prevent visibility, but it will ensure that the data is
not lost. In that case, the data can be visible on the leader even if it
is not present on the sync replica, because the async replica ensures
the continuity of data persistence.

I hope I explained it understandably.

Regards
Ondrej
