Re: [BUGS] BUG #6748: sequence value may be conflict in some cases

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [BUGS] BUG #6748: sequence value may be conflict in some cases
Date: 2012-07-25 01:21:06
Message-ID: 9025.1343179266@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

I wrote:
> The tuple in buffers has log_cnt = 1, is_called = false, but the initial
> XLOG_SEQ_LOG record shows log_cnt = 0, is_called = true. So if we crash
> at this point, after recovery it looks like one nextval() has already
> been done. However, AlterSequence generates another XLOG_SEQ_LOG record
> based on what's in shared buffers, so after replay of that, we're back
> to the "original" state where it does not appear that any nextval() has
> been done.

> I'm of the opinion that this kluge needs to be removed; it's just insane
> that we're not logging the same state we leave in our buffers.

Attached is a WIP patch for that. I had to leave nextval() logging
a different state from what it puts into buffers, since without that
we can't have the optimization of one-WAL-record-per-N-nextval-calls,
which seems worth preserving. But I was able to get rid of the other
routines' doing it. I think it's all right for nextval to do it because
the only difference between what it logs and what it leaves in the
buffer is that last_value is advanced and log_cnt is zero; in particular
there is not a difference in is_called state.

For awhile I was convinced that nextval() was broken in another way,
because it does MarkBufferDirty() before changing the buffer contents,
which sure *looks* wrong; what if a checkpoint writes the
not-yet-modified buffer contents and then clears the dirty bit?
However, I now think this is safe because we hold the buffer content
lock exclusively until we've finished making the changes. A checkpoint
might observe the dirty bit set and attempt to write the page, but it
will block trying to get shared buffer content lock until we complete
the changes. So that should be all right, and I've added some comments
to explain it, but if anyone thinks it's wrong speak up!

Another thing that I think is a real bug is that AlterSequence needs
to reset log_cnt to zero any time it changes any of the sequence
generation parameters, to ensure that the new values will be applied
promptly. I've done that in the attached.

I also modified the code so that log_cnt is initialized to zero not one
when a sequence is created or reset to is_called = false. This doesn't
make any functional difference given the change to force logging a WAL
record when is_called becomes set, but I think it's a good change
anyway: it doesn't make any sense anymore to imply that one sequence
value has been pre-reserved. However, if anyone's particularly annoyed
by the first diff hunk in the regression tests, we could undo that.

Lastly, I modified read_seq_tuple (nee read_info) to return a HeapTuple
pointer to the sequence's tuple, and made use of that to make the rest
of the code simpler and less dependent on page-layout assumptions.
At this point this is only a code-beautification change; it was
functionally necessary in an earlier version of the patch, but I didn't
back it out when I got rid of the change that made it necessary.

Oh, there are also some hunks in here to add debug logging, which I was
using for testing. I'm undecided whether to remove those or clean them
up some more and wrap them in something like #ifdef SEQUENCE_DEBUG.

One other point about the regression test diffs: formerly, the typical
state seen at line 191 of sequence.out was log_cnt = 32. Now it is 31,
because the first nextval call increases log_cnt to 32 and then the
second decreases it to 31. The failure mode described in
http://archives.postgresql.org/pgsql-hackers/2008-08/msg01359.php
is no longer possible, because a checkpoint before the first nextval
call won't change its behavior. But instead, if a checkpoint happens
*between* those two nextvals, the second nextval will decide to create
a WAL entry and it will increase log_cnt to 32. So we still need the
variant expected file sequence_1.out, but now it shows log_cnt = 32
instead of the other way around.

Comments anyone?

regards, tom lane

Attachment Content-Type Size
sequence-fixes.patch text/x-patch 28.2 KB

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message ghazel 2012-07-25 01:42:47 BUG #6756: primary_conninfo is ignored if restore_command is set..
Previous Message pilum.70 2012-07-24 12:45:56 BUG #6751: usage flaws in pg_restore

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2012-07-25 02:52:54 Re: canceling autovacuum task woes
Previous Message Merlin Moncure 2012-07-25 00:06:46 Re: [patch] libpq one-row-at-a-time API