Re: logical decoding and replication of sequences

From: Hannu Krosing <hannuk(at)google(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>
Subject: Re: logical decoding and replication of sequences
Date: 2021-09-25 20:05:43
Message-ID: CAMT0RQQeDR51xs8zTa25YpfKB1B34nS-Q4hhsRPznVsjMB_P1w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Just a note for some design decisions

> 1) By default, sequences are treated non-transactionally, i.e. sent to the output plugin right away.

If our aim is just to make sure that all user-visible data in
*transactional* tables is consistent with sequence state then one
very much simplified approach to this could be to track the results of
nextval() calls in a transaction at COMMIT put the latest sequence
value in WAL (or just track the sequences affected and put the latest
sequence state in WAL at commit which needs extra read of sequence but
protects against race conditions with parallel transactions which get
rolled back later)

This avoids sending redundant changes for multiple nextval() calls
(like loading a million-row table with sequence-generated id column)

And one can argue that we can safely ignore anything in ROLLBACKED
sequences. This is assuming that even if we did advance the sequence
paste the last value sent by the latest COMMITTED transaction it does
not matter for database consistency.

It can matter if customers just call nextval() in rolled-back
transactions and somehow expect these values to be replicated based on
reasoning along "sequences are not transactional - so rollbacks should
not matter" .

Or we may get away with most in-detail sequence tracking on the source
if we just keep track of the xmin of the sequence and send the
sequence info over at commit if it == current_transaction_id ?

-----
Hannu Krosing
Google Cloud - We have a long list of planned contributions and we are hiring.
Contact me if interested.

On Fri, Sep 24, 2021 at 9:16 PM Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> Hi,
>
> On 9/23/21 12:27 PM, Peter Eisentraut wrote:
> > On 30.07.21 20:26, Tomas Vondra wrote:
> >> Here's a an updated version of this patch - rebased to current master,
> >> and fixing some of the issues raised in Peter's review.
> >
> > This patch needs an update, as various conflicts have arisen now.
> >
> > As was discussed before, it might be better to present a separate patch
> > for just the logical decoding part for now, since the replication and
> > DDL stuff has the potential to conflict heavily with other patches being
> > discussed right now. It looks like cutting this patch in two should be
> > doable easily.
> >
>
> Attached is the rebased patch, split into three parts:
>
> 1) basic decoding infrastructure (decoding, reorderbuffer etc.)
> 2) support for sequences in test_decoding
> 3) support for sequences in built-in replication (catalogs, syntax, ...)
>
> The last part is the largest one - I'm sure we'll have discussions about
> the grammar, adding sequences automatically, etc. But as you said, let's
> focus on the first part, which deals with the required decoding stuff.
>
> I've added a couple comments, explaining how we track sequences, why we
> need the XID in nextval() etc. I've also added streaming support.
>
>
> > I looked through the test cases in test_decoding again. It all looks
> > pretty sensible. If anyone can think of any other tricky or dubious
> > cases, we can add them there. It's easiest to discuss these things with
> > concrete test cases rather than in theory.
> >
> > One slightly curious issue is that this can make sequence values go
> > backwards, when seen by the logical decoding consumer, like in the test
> > case:
> >
> > + BEGIN
> > + sequence: public.test_sequence transactional: 1 created: 1 last_value:
> > 1, log_cnt: 0 is_called: 0
> > + COMMIT
> > + sequence: public.test_sequence transactional: 0 created: 0 last_value:
> > 33, log_cnt: 0 is_called: 1
> > + BEGIN
> > + sequence: public.test_sequence transactional: 1 created: 1 last_value:
> > 4, log_cnt: 0 is_called: 1
> > + COMMIT
> > + sequence: public.test_sequence transactional: 0 created: 0 last_value:
> > 334, log_cnt: 0 is_called: 1
> >
> > I suppose that's okay, since it's not really the intention that someone
> > is concurrently consuming sequence values on the subscriber. Maybe
> > something for the future. Fixing that would require changing the way
> > transactional sequence DDL updates these values, so it's not directly
> > the job of the decoding to address this.
> >
>
> Yeah, that's due to how ALTER SEQUENCE does things, and I agree redoing
> that seems well out of scope for this patch. What seems a bit suspicious
> is that some of the ALTER SEQUENCE changes have "created: 1" - it's
> probably correct, though, because those ALTER SEQUENCE statements can be
> rolled-back, so we see it as if a new sequence is created. The flag name
> might be a bit confusing, though.
>
> > A small thing I found: Maybe the text that test_decoding produces for
> > sequences can be made to look more consistent with the one for tables.
> > For example, in
> >
> > + BEGIN
> > + sequence: public.test_table_a_seq transactional: 1 created: 1
> > last_value: 1, log_cnt: 0 is_called: 0
> > + sequence: public.test_table_a_seq transactional: 1 created: 0
> > last_value: 33, log_cnt: 0 is_called: 1
> > + table public.test_table: INSERT: a[integer]:1 b[integer]:100
> > + table public.test_table: INSERT: a[integer]:2 b[integer]:200
> > + COMMIT
> >
> > note how the punctuation is different.
>
> I did tweak this a bit, hopefully it's more consistent.
>
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2021-09-25 20:08:57 Re: rand48 replacement
Previous Message Justin Pryzby 2021-09-25 19:53:22 Re: extended stats on partitioned tables