Re: Sequence's value can be rollback after a crashed recovery.

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Jeremy Schneider <schneider(at)ardentperf(dot)com>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Sequence's value can be rollback after a crashed recovery.
Date: 2021-11-23 21:12:17
Message-ID: 54833.1637701937@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> writes:
> I see Tom speculated we may not flush WAL if a transaction only does
> nextval() in that other thread, but I don't think that's true. AFAICS if
> the nextval() call writes stuff to WAL, the RecordTransactionCommit will
> have wrote_xlog=true and valid XID. And so it'll do the usual usual
> XLogFlush() etc.

Yeah. I didn't look at the code during last year's discussion, but now
I have, and I see that if nextval_internal() decides it needs to make
a WAL entry, it is careful to ensure the xact has an XID:

/*
* If something needs to be WAL logged, acquire an xid, so this
* transaction's commit will trigger a WAL flush and wait for syncrep.
* It's sufficient to ensure the toplevel transaction has an xid, no need
* to assign xids subxacts, that'll already trigger an appropriate wait.
* (Have to do that here, so we're outside the critical section)
*/
if (logit && RelationNeedsWAL(seqrel))
GetTopTransactionId();

So that eliminates my worry that RecordTransactionCommit would decide
it doesn't need to do anything. If there was a WAL record, we will
flush it at commit (and not before).

As you say, this is exactly as durable as any other DML operation.
So I don't feel a need to change the code behavior.

The problematic situation seems to be where an application gets
a nextval() result and uses it for some persistent outside-the-DB
state, without having made sure that the nextval() was committed.
You could say that that's the same rookie error as relying on the
persistence of any other uncommitted DML ... except that at [1]
we say

To avoid blocking concurrent transactions that obtain numbers from the
same sequence, a nextval operation is never rolled back; that is, once
a value has been fetched it is considered used and will not be
returned again. This is true even if the surrounding transaction later
aborts, or if the calling query ends up not using the value.

It's not so unreasonable to read that as promising persistence over
crashes as well as xact aborts. So I think we need to improve the docs
here. A minimal fix would be to leave the existing text alone and add a
separate para to the <caution> block, along the lines of

However, the above statements do not apply if the database cluster
crashes before committing the transaction containing the nextval
operation. In that case the sequence advance might not have made its
way to persistent storage, so that it is uncertain whether the same
value can be returned again after the cluster restarts. If you wish
to use a nextval result for persistent outside-the-database purposes,
make sure that the nextval has been committed before doing so.

I wonder though if we shouldn't try to improve the existing text.
The phrasing "never rolled back" seems like it's too easily
misinterpreted. Maybe rewrite the <caution> block like

To avoid blocking concurrent transactions that obtain numbers from the
same sequence, the value obtained by nextval is not reclaimed for
re-use if the calling transaction later aborts. This means that
transaction aborts or database crashes can result in gaps in the
sequence of assigned values. That can happen without a transaction
abort, too.
-- this text is unchanged: --
For example an INSERT with an ON CONFLICT clause will compute the
to-be-inserted tuple, including doing any required nextval calls,
before detecting any conflict that would cause it to follow the ON
CONFLICT rule instead. Such cases will leave unused “holes” in the
sequence of assigned values. Thus, PostgreSQL sequence objects cannot
be used to obtain “gapless” sequences.

Likewise, any sequence state changes made by setval are not undone if
the transaction rolls back.
-- end unchanged text --

If the database cluster crashes before committing the transaction
containing a nextval operation, the sequence advance might not yet
have made its way to persistent storage, so that it is uncertain
whether the same value can be returned again after the cluster
restarts. This is harmless for usage of the nextval value within
that transaction, since its other effects will not be visible either.
However, if you wish to use a nextval result for persistent
outside-the-database purposes, make sure that the nextval operation
has been committed before doing so.

Thoughts?

regards, tom lane

[1] https://www.postgresql.org/docs/current/functions-sequence.html

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jacob Champion 2021-11-23 21:37:34 Re: pg_upgrade parallelism
Previous Message Jeremy Schneider 2021-11-23 21:08:40 Re: Sequence's value can be rollback after a crashed recovery.