SIGQUIT and lost sequence WAL records

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org>
Subject: SIGQUIT and lost sequence WAL records
Date: 2016-02-17 13:55:16
Message-ID: CAB7nPqQ6+0LNkitj507eDBMh-OBdm_1_X-LmLOQfsjiWXe9SvA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi all,

Lately a couple of colleagues have pinged me regarding the fact that a
server signaled with SIGQUIT, or "pg_ctl stop -m immediate", may
ignore sequence WAL records. See for example the script attached
seqtest.py (simplified test case provided from my colleague Nikhil
Deshpande), that will fail while for example the following command is
run in parallel:
for i in {1..1000}; do pg_ctl restart -m immediate -w; sleep 1; done
The failure happening being for example:
AssertionError: DB returned reused sequence value prev=1118 curr=958
Here it means that the last return value of nextval() was 958, even if
the highest value ever fetched was 1118. Note that this test script
uses psycopg2, but that it does *not* commit transactions on purpose.
Also note that those WAL records do not disappear all the time.
Sometimes they are here and the sequence keeps increasing monotically
after server is restarted.

Trying to find out if there were records ignored I added some logs in
the server thanks to the patch attached. In the case of this last
failure there have been a couple of records after the last one found
at recovery:
WARNING: Logged new sequence value at 0/1505570 for nextval = 957
WARNING: Logged new sequence value at 0/15066D8 for nextval = 990
WARNING: Logged new sequence value at 0/1507828 for nextval = 1023
WARNING: Logged new sequence value at 0/1508990 for nextval = 1056
WARNING: Logged new sequence value at 0/1509AE0 for nextval = 1089
WARNING: Logged new sequence value at 0/150AC48 for nextval = 1122
And if I looked at the WAL records those were not of course not
present. That's not really surprising as recovery didn't use them to
restore the sequence up to where it should have.

func.sgml mentions that a nextval() operation is *never* rolled back
even if a transaction is aborted, though in this case this is exactly
what is happening, even if when the server is stopped the existing
transactions running are forcibly aborted.

I am lacking of fuel to look more at that today, but at this point I
wanted to get some input to see if I am missing something obvious. So,
code bug or documentation bug?
--
Michael

Attachment Content-Type Size
sequence-log-info.patch text/x-diff 509 bytes
seqtest.py text/x-python-script 2.8 KB

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2016-02-17 14:54:30 Re: SIGQUIT and lost sequence WAL records
Previous Message Teodor Sigaev 2016-02-17 12:06:26 Re: BUG #13440: unaccent does not remove all diacritics