Re: [PATCH] Fix sequence value may rollback after CREATE DATABASE TEMPLATE with WAL_LOG strategy

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Daisuke Higuchi <higuchi(dot)daisuke11(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Fix sequence value may rollback after CREATE DATABASE TEMPLATE with WAL_LOG strategy
Date: 2026-06-24 20:06:55
Message-ID: CALj2ACVsWz6FuycP=UavXnoPrrBC72rZD7x9SgZ9j7Vt03ZWuw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Mon, Jun 15, 2026 at 9:02 PM Daisuke Higuchi
<higuchi(dot)daisuke11(at)gmail(dot)com> wrote:
>
> I found that "CREATE DATABASE ... TEMPLATE" with WAL_LOG strategy can
> cause sequence values to rollback after a streaming replication failover.
>
> When creating a new database with "CREATE DATABASE ... TEMPLATE" from
> a template database that has sequences, calling nextval() on those
> sequences in the new database causes the primary to have a higher
> last_value than the standby temporarily.
> When a failover occurs and the standby is promoted in this state, the
> sequence value appears to have rollbacked on the new primary.
> I found this issue in PostgreSQL version 15, but I've confirmed that it also
> happens on the master branch.

Nice catch!

Thanks for reporting!

> I show the example as follows.
>
> In the attached patch, during "CREATE DATABASE ... TEMPLATE" with
> WAL_LOG strategy, the sequence's log_cnt is reset to 0 so that
> SEQ_LOG_VALS (32) values are fetched in advance on the next nextval()
> call. This ensures that the standby's last_value is greater than or
> equal to the primary's last_value.
> And, this patch opens sequence page using readBufferWithoutRelcache()
> without relying on relcache entries and resets log_cnt. This follows
> the same approach as ScanSourceDatabasePGClass().
> Note that this issue does not occur with FILE_COPY strategy because
> this performs a checkpoint internally and a WAL record to fetch
> SEQ_LOG_VALS in advance is always emitted on the next nextval() call.
> However, I could not apply the same approach because I understand the
> WAL_LOG strategy is designed to avoid checkpoints.

Some comments on the patch:

1/
@@ -218,6 +222,10 @@ CreateDatabaseUsingWalLog(Oid src_dboid, Oid dst_dboid,
/* Copy relation storage from source to the destination. */
CreateAndCopyRelationData(srcrlocator, dstrlocator, relinfo->permanent);

+ /* Reset log_cnt for sequences to ensure WAL on first nextval() */
+ if (relinfo->relkind == RELKIND_SEQUENCE)
+ ResetSequenceLogCnt(dstrlocator, relinfo->permanent);

What happens if we reset the log_cnt for some sequences and the CREATE
DATABASE then fails (say, while copying other remaining relations'
data pages)? We would have written reset WAL records, but the CREATE
DATABASE failed - and those WAL records now not only need to be
replayed for crash recovery but also flow through to standbys and
logical replication clients.

2/ Don't we need to gate the WAL logging in ResetSequenceLogCnt with
RelationNeedsWAL(rel)?

3/ WAL logging in ResetSequenceLogCnt and other sequence.c look mostly
similar - try to dedup it with a similar one there?

4/ Why not move ResetSequenceLogCnt closer to its friends in sequence.c?

5/ Do we need any special handling for temporary sequences?

6/ Instead of a new file, why not add the test case to existing
src/test/recovery/t/034_create_database.pl?

--
Bharath Rupireddy
Amazon Web Services: https://aws.amazon.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Christophe Pettus 2026-06-24 20:07:12 Re: Get rid of "Section.N.N.N" on DOCs
Previous Message Christophe Pettus 2026-06-24 20:02:18 Re: uuidv7 improperly accepts dates before 1970-01-01