Re: Sequence's value can be rollback after a crashed recovery.

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Sequence's value can be rollback after a crashed recovery.
Date: 2021-11-22 07:43:23
Message-ID: CAKU4AWpat6amYsCdD614u0fwbWPNY6h5=iKt-CZmBXoD_t=Xqg@mail.gmail.com
Lists: pgsql-hackers

> > The reason is because we never flush the xlog for the nextval_internal
> > for the above case. So if
> > the system crashes, there is nothing to redo from. It can be fixed
> > with the following online change
> > code.
> >
> > @@ -810,6 +810,8 @@ nextval_internal(Oid relid, bool check_permissions)
> > recptr = XLogInsert(RM_SEQ_ID, XLOG_SEQ_LOG);
> >
> > PageSetLSN(page, recptr);
> > +
> > + XLogFlush(recptr);
> > }
> >
> >
> > If a user uses sequence value for some external systems, the
> > rollbacked value may surprise them.
> > [I didn't run into this issue in any real case, I just studied xlog /
> > sequence stuff today and found this case].
>
> I think that is a bad idea.
> It will have an intolerable performance impact on OLTP queries, doubling
> the number of I/O requests for many cases.
>

I expected the performance argument before writing this. If we look at
nextval_internal more carefully, we can see that it would not flush the xlog
on every call, even when the sequence's cache size is 1. Currently that would
happen once every 32 calls to nextval_internal in the worst case.
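
To make the arithmetic concrete, here is a rough standalone model of that
behaviour. It is only a sketch, not the code in sequence.c: SeqModel,
model_nextval and count_wal_flushes are names made up for illustration, the
constant is assumed to match SEQ_LOG_VALS, and the real function also handles
the cache size, checkpoints and other cases that are ignored here.

```c
#include <assert.h>

/* Assumed to match SEQ_LOG_VALS in sequence.c; illustration only. */
#define SEQ_LOG_VALS 32

/* Hypothetical, heavily simplified stand-in for the sequence state. */
typedef struct SeqModel
{
    long last_value;   /* last value handed out */
    int  log_cnt;      /* values still covered by the last WAL record */
    long wal_flushes;  /* how many times we would have logged and flushed */
} SeqModel;

/* One nextval call with the proposed flush right after the WAL insert. */
static long
model_nextval(SeqModel *seq)
{
    if (seq->log_cnt == 0)
    {
        /* Nothing pre-logged is left: log SEQ_LOG_VALS values ahead, flush once. */
        seq->wal_flushes++;
        seq->log_cnt = SEQ_LOG_VALS;
    }
    seq->log_cnt--;
    return ++seq->last_value;
}

/* Count the flushes caused by a given number of consecutive nextval calls. */
static long
count_wal_flushes(int calls)
{
    SeqModel seq = {0, 0, 0};

    assert(calls >= 0);
    for (int i = 0; i < calls; i++)
        (void) model_nextval(&seq);
    return seq.wal_flushes;
}
```

In this model, count_wal_flushes(320) gives 10, i.e. one flush per 32 calls
rather than one per call.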

> Perhaps it would make sense to document that you should never rely on
> sequence values from an uncommitted transaction.

I am OK with this if more people think this is the solution.

--
Best Regards
Andy Fan
