Re: logical decoding and replication of sequences, take 2

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Subject: Re: logical decoding and replication of sequences, take 2
Date: 2023-05-18 14:23:52
Message-ID: CAExHW5twU2CiCrWk-wsX5cQf4=JjCj3F-wwGOfUbQ4APMoe+-g@mail.gmail.com
Lists: pgsql-hackers

Hi,
Sorry for jumping into this thread late.

I started experimenting with the functionality. Some of this may have
been discussed earlier, but given that the thread has been running for
so long and has gone through several changes, revalidating the
functionality is useful.

I considered the following aspects:
Changes to the sequence on subscriber
-----------------------------------------------------
1. Since this is logical decoding, the logical replica is writable. So
the logically replicated sequence can be manipulated on the subscriber
as well. This implementation consolidates the changes on the subscriber
and the publisher rather than replicating the publisher state as is.
That's good. See the example command sequence below:
a. publisher calls nextval() - this sets the sequence state on
publisher as (1, 32, t) which is replicated to the subscriber.
b. subscriber calls nextval() once - this sets the sequence state on
subscriber as (34, 32, t)
c. subscriber calls nextval() 32 times - on-disk state of sequence
doesn't change on subscriber
d. subscriber calls nextval() 33 times - this sets the sequence state
on subscriber as (99, 0, t)
e. publisher calls nextval() 32 times - this sets the sequence state
on publisher as (33, 0, t)

The on-disk state on the publisher at the end of e. is replicated to
the subscriber, but the subscriber doesn't apply it. The state there is
still (99, 0, t). I think this is closer to how logical replication of
sequences should look. This is also good enough as long as we expect
the replication of sequences to be used for failover and switchover.
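
The states in steps a. to e. can be reproduced with a small model of
how nextval() pre-logs values. This is only a simplified sketch, not
the server code. It assumes an ascending sequence with increment 1,
cache 1, no bounds and no checkpoint handling, a prefetch of 32 values
(SEQ_LOG_VALS in sequence.c), and that the subscriber receives the
WAL-logged last_value rather than the on-page one, which is what step
b. suggests. SeqState, apply_logged and replay_steps are names made up
for this illustration.

```python
from dataclasses import dataclass

SEQ_LOG_VALS = 32  # assumed number of values pre-logged per WAL record


@dataclass
class SeqState:
    last_value: int = 1
    log_cnt: int = 0
    is_called: bool = False
    wal_last_value: int = 1  # last_value written in the latest WAL record

    def nextval(self) -> int:
        if not self.is_called:
            # First call returns the start value and pre-logs 32 more.
            self.is_called = True
            self.log_cnt = SEQ_LOG_VALS
            self.wal_last_value = self.last_value + SEQ_LOG_VALS
            return self.last_value
        if self.log_cnt < 1:
            # Pre-logged values are used up: log this value plus 32 more.
            self.log_cnt = SEQ_LOG_VALS + 1
            self.wal_last_value = self.last_value + self.log_cnt
        self.last_value += 1
        self.log_cnt -= 1
        return self.last_value

    def on_page(self):
        return (self.last_value, self.log_cnt, self.is_called)


def apply_logged(pub: SeqState) -> SeqState:
    # Assumption: the subscriber starts from the WAL-logged value.
    return SeqState(pub.wal_last_value, 0, True, pub.wal_last_value)


def replay_steps():
    states = {}
    pub = SeqState()
    pub.nextval()                       # step a
    states["a"] = pub.on_page()
    sub = apply_logged(pub)
    sub.nextval()                       # step b
    states["b"] = sub.on_page()
    logged_before_c = sub.wal_last_value
    for _ in range(32):                 # step c
        sub.nextval()
    # No new WAL record is needed while pre-logged values remain.
    states["c_no_new_wal_record"] = sub.wal_last_value == logged_before_c
    for _ in range(33):                 # step d
        sub.nextval()
    states["d"] = sub.on_page()
    for _ in range(32):                 # step e
        pub.nextval()
    states["e"] = pub.on_page()
    return states
```

The model does not cover what the subscriber does with the state
replicated at the end of e.; it only reproduces the tuples listed
above.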

But it might not help if we want to consolidate the INSERTs that use
nextval(). If we were to treat sequences as accumulating the
increments, we might be able to resolve the conflicts by adjusting the
column values to account for the increments made on the subscriber.
IIUC, conflict resolution is not part of built-in logical replication,
so we may not want to go this route. But it is worth considering.
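
One possible reading of "accumulating the increments", as a rough
sketch only. Nothing like this exists in the patch or in built-in
logical replication. It assumes increment 1, a common starting point
`base` known to both nodes, and a single point in time at which the
counts are compared. Both function names are hypothetical.

```python
def merged_position(base: int, pub_increments: int, sub_increments: int) -> int:
    # The sequence is treated as a counter: every nextval() on either
    # node adds one increment on top of the common base.
    return base + pub_increments + sub_increments


def remap_incoming(value: int, sub_increments: int) -> int:
    # Shift a value generated on the publisher past the values the
    # subscriber has already handed out locally.
    return value + sub_increments


# Both nodes start from base 100. The publisher hands out 101..103,
# the subscriber hands out 101..102, so the raw values collide.
base, pub_n, sub_n = 100, 3, 2
local_values = [base + i for i in range(1, sub_n + 1)]
incoming = [remap_incoming(base + i, sub_n) for i in range(1, pub_n + 1)]
```

With these numbers the remapped publisher values no longer overlap
with the subscriber's own, and the last one equals the merged
position.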

Implementation agnostic decoded change
--------------------------------------------------------
The current method of decoding and replicating sequences is tied to
the implementation - it replicates the sequence row as is. If the
implementation changes in the future, we might need to revise the
decoded representation of a sequence. I think only nextval() matters
for a sequence. So as long as we are replicating enough information to
calculate the nextval we should be good. The current implementation
does that by replicating last_value and is_called. is_called can be
consolidated into last_value itself. The implemented protocol thus
requires two extra values to be replicated. Those can be ignored right
now. But they might pose a problem in the future, if some downstream
starts using them. We will be forced to provide fake but sane values
even if a future upstream implementation does not produce those
values. Of course we can't predict the future implementation well
enough to decide what an implementation-independent format would be.
E.g. if a pluggable storage were to be used to implement sequences, or
if we come around to implementing distributed sequences, their shape
can't be predicted right now. So a change in protocol seems to be
unavoidable whatever we do. But starting with the bare minimum might
save us from larger troubles. I think it's better to just replicate
the nextval() and craft the representation on the subscriber so that
it produces that nextval().
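
To illustrate what "just replicate the nextval()" could mean, here is
a minimal sketch. It assumes increment 1 by default and ignores
bounds and cycling. SeqRow, upcoming_value and craft_row are
hypothetical names, not part of the patch or the protocol. The crafted
row has the same effect as setval(seq, value, false) on the
subscriber.

```python
from typing import NamedTuple


class SeqRow(NamedTuple):
    last_value: int
    log_cnt: int
    is_called: bool


def upcoming_value(row: SeqRow, increment: int = 1) -> int:
    # The one number a downstream needs: what the next nextval() returns.
    # is_called only decides whether last_value itself is still unused.
    return row.last_value + increment if row.is_called else row.last_value


def craft_row(next_value: int) -> SeqRow:
    # Subscriber-side representation that yields exactly next_value on
    # the next call, with is_called folded into last_value.
    return SeqRow(next_value, 0, False)


# Row as WAL-logged by the publisher after its first nextval().
upstream = SeqRow(33, 0, True)
replicated = upcoming_value(upstream)
local = craft_row(replicated)
```

Here log_cnt never enters the computation, which is the sense in which
it is an extra value on the wire.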

Primary key sequences
-----------------------------------
I have not experimented with this. But I think we will need to add the
sequences associated with the primary keys to the publications
publishing the owner tables. Otherwise, we will have problems with
failover. And it needs to be done automatically since a. the names of
these sequences are generated automatically, and b. publications with
FOR ALL TABLES will add tables automatically and start replicating the
changes. Users may not be able to intercept the replication activity
to make sure the associated sequences are also added to the
publication.
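
The automatic step could look roughly like this. It is a sketch over
plain dictionaries, not catalog code. The table and sequence names are
invented, and on a real server the ownership mapping would have to
come from the catalogs (for example via pg_get_serial_sequence()).

```python
def sequences_to_publish(published_tables, owned_sequences):
    # Collect every sequence owned by a column of a published table.
    result = set()
    for table in published_tables:
        result.update(owned_sequences.get(table, ()))
    return sorted(result)


# Hypothetical catalog snapshot: table -> sequences owned by its columns.
owned = {
    "public.orders": ["public.orders_id_seq"],
    "public.customers": ["public.customers_id_seq"],
    "public.audit_log": [],
}

# Tables picked up by a FOR ALL TABLES style publication.
published = ["public.orders", "public.customers", "public.audit_log"]
to_add = sequences_to_publish(published, owned)
```

The point is only that the set of sequences is derived from the set of
published tables, so the user never has to name them.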

--
Best Wishes,
Ashutosh Bapat
