Re: logical decoding and replication of sequences, take 2

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>
Subject: Re: logical decoding and replication of sequences, take 2
Date: 2023-02-23 12:56:13
Message-ID: b09e8166-c446-241a-8de7-6ebaf27d9901@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2/22/23 18:04, Jonathan S. Katz wrote:
> On 2/22/23 5:02 AM, Tomas Vondra wrote:
>>
>> On 2/22/23 03:28, Jonathan S. Katz wrote:
>
>>> Thanks for continuing to work on this patch! I tested the latest version
>>> and have some feedback/clarifications.
>>>
>>
>> Thanks!
>
> Also I should mention I've been testing with both async/sync logical
> replication. I didn't have any specific comments on either as it seemed
> to just work and behaviors aligned with existing expectations.
>
> Generally it's been a good experience and it seems to be working. :) At
> this point I'm trying to understand the limitations and tripwires so we
> can guide users appropriately.
>

Good to hear.

>> Yes, this is due to how we WAL-log sequences. We don't log individual
>> increments, but every 32nd increment and we log the "future" sequence
>> state so that after a crash/recovery we don't generate duplicates.
>>
>> So you do nextval() and it returns 1. But into WAL we record 32. And
>> there will be no WAL records until nextval reaches 32 and needs to
>> generate another batch.
>>
>> And because logical replication relies on these WAL records, it inherits
>> this batching behavior with a "jump" on recovery/failover. IMHO it's OK,
>> it works for the "logical failover" use case and if you need gapless
>> sequences then regular sequences are not an issue anyway.
>>
>> It's possible to reduce the jump a bit by reducing the batch size (from
>> 32 to 0) so that every increment is logged. But it doesn't eliminate it
>> because of rollbacks.
>
> I generally agree. I think it's mainly something we should capture in
> the user docs that they can be a jump on the subscriber side, so people
> are not surprised.
>
> Interestingly, in systems that tend to have higher rates of failover
> (I'm thinking of a few distributed systems), this may cause int4
> sequences to exhaust numbers slightly (marginally?) more quickly. Likely
> not too big of an issue, but something to keep in mind.
>

IMHO the number of systems that would work fine with int4 sequences but
this change results in the sequences being "exhausted" too quickly is
indistinguishable from 0. I don't think this is an issue.

>>> 2. Using with origin=none with nonconflicting sequences.
>>>
>>> I modified the example in [1] to set up two schemas with non-conflicting
>>> sequences[2], e.g. on instance 1:
>>>
>>> CREATE TABLE public.room (
>>>      id int GENERATED BY DEFAULT AS IDENTITY (INCREMENT 2 START WITH 1)
>>> PRIMARY KEY,
>>>      name text NOT NULL
>>> );
>>>
>>> and instance 2:
>>>
>>> CREATE TABLE public.room (
>>>      id int GENERATED BY DEFAULT AS IDENTITY (INCREMENT 2 START WITH 2)
>>> PRIMARY KEY,
>>>      name text NOT NULL
>>> );
>>>
>>
>> Well, yeah. We don't support active-active logical replication (at least
>> not with the built-in). You can easily get into similar issues without
>> sequences.
>
> The "origin=none" feature lets you replicate tables bidirectionally.
> While it's not full "active-active", this is a starting point and a
> feature for v16. We'll definitely have users replicating data
> bidirectionally with this.
>

Well, then the users need to use some other way to generate IDs, not
local sequences. Either some sort of distributed/global sequence, UUIDs
or something like that.

>> Replicating a sequence overwrites the state of the sequence on the other
>> side, which may result in it generating duplicate values with the other
>> node, etc.
>
> I understand that we don't currently support global sequences, but I am
> concerned there may be a tripwire here in the origin=none case given
> it's fairly common to use serial/GENERATED BY to set primary keys. And
> it's fairly trivial to set them to be nonconflicting, or at least give
> the user the appearance that they are nonconflicting.
>
> From my high level understand of how sequences work, this sounds like it
> would be a lift to support the example in [1]. Or maybe the answer is
> that you can bidirectionally replicate the changes in the tables, but
> not sequences?
>

Yes, I don't think local sequences don't and can't work in such setups.

> In any case, we should update the restrictions in [2] to state: while
> sequences can be replicated, there is additional work required if you
> are bidirectionally replicating tables that use sequences, esp. if used
> in a PK or a constraint. We can provide alternatives to how a user could
> set that up, i.e. not replicates the sequences or do something like in [3].
>

I agree. I see this as mostly a documentation issue.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joe Conway 2023-02-23 13:15:54 Re: Improving inferred query column names
Previous Message Jelte Fennema 2023-02-23 12:17:33 Re: Allow logical replication to copy tables in binary format