Re: Speedup twophase transactions

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>
Subject: Re: Speedup twophase transactions
Date: 2016-04-12 12:47:40
Views: Raw Message | Whole Thread | Download mbox
Lists: pgsql-hackers

On Mon, Apr 11, 2016 at 7:16 PM, Stas Kelvich wrote:
> Michael, it looks like that you are the only one person who can reproduce that bug. I’ve tried on bunch of OS’s and didn’t observe that behaviour, also looking at your backtraces I can’t get who is holding this lock (and all of that happens before first prepare record is replayed).

Where did you try it. FWIW, I can reproduce that on Linux and OSX, and
only manually though:
1) Create a master and a streaming standby.
2) Run the following on master:
CREATE TABLE foo (a int);
3) stop -m immediate the standby
4) COMMIT PREPARED 'tx' on master
5) Restart standby, and the node will wait for a lock

> Can you investigate it more? Particularly find out who holds the lock?

OK, so if I look at this backtrace that's a standby LOCK record, but
we already know that:
frame #9: 0x0000000107600383
postgres`LockAcquireExtended(locktag=0x00007fff58a4b228, lockmode=8,
sessionLock='\x01', dontWait='\0', reportMemoryError='\0') + 2819 at
frame #10: 0x00000001075f9cd6
postgres`StandbyAcquireAccessExclusiveLock(xid=867, dbOid=16384,
relOid=16385) + 358 at standby.c:627
* frame #11: 0x00000001075fa23b
postgres`standby_redo(record=0x00007f90a9841e38) + 251 at
frame #12: 0x0000000107298d37 postgres`StartupXLOG + 9351 at xlog.c:6871

Here is the record pointer:
(lldb) p *record
(XLogReaderState) $4 = {
read_page = 0x000000010729b3c0 (postgres`XLogPageRead at xlog.c:10973)
system_identifier = 6272572355656465658
private_data = 0x00007fff58a4bf40
ReadRecPtr = 50424128
EndRecPtr = 50424176
decoded_record = 0x00007f90a9843178
main_data = 0x00007f90a9804b78 "\x01"
main_data_len = 16
main_data_bufsz = 784
record_origin = 0
blocks = {
[0] = {
in_use = '\0'
rnode = (spcNode = 1663, dbNode = 16384, relNode = 2674)
forknum = MAIN_FORKNUM
blkno = 22
flags = '\x10'
has_image = '\0'
bkp_image = 0x00007f90a984826b "\x01"
hole_offset = 892
hole_length = 2076
bimg_len = 6116
bimg_info = '\x01'
has_data = '\0'
data = 0x00007f90a98595d8 "\a"
data_len = 0
data_bufsz = 154

And in our case this corresponds to the record with LSN 0/03016940
that cannot take an exclusive LOCK:
rmgr: Transaction len (rec/tot): 784/ 813, tx: 867, lsn:
0/03016610, prev 0/030165D8, desc: PREPARE
rmgr: Standby len (rec/tot): 16/ 42, tx: 0, lsn:
0/03016940, prev 0/03016610, desc: LOCK xid 867 db 16384 rel 16385
rmgr: Standby len (rec/tot): 28/ 54, tx: 0, lsn:
0/03016970, prev 0/03016940, desc: RUNNING_XACTS nextXid 868
latestCompletedXid 866 oldestRunningXid 867; 1 xacts: 867

There are two XID locks taken before that:
rmgr: Standby len (rec/tot): 16/ 42, tx: 867, lsn:
0/03016578, prev 0/03014D40, desc: LOCK xid 867 db 16384 rel 16385
rmgr: Standby len (rec/tot): 16/ 42, tx: 0, lsn:
0/030165A8, prev 0/03016578, desc: LOCK xid 867 db 16384 rel 16385

And pg_locks on the standby is reporting that another lock has been
taken but not released:
=# select locktype, pid, mode, granted, fastpath from pg_locks where
relation = 16385;
locktype | pid | mode | granted | fastpath
relation | 68955 | AccessExclusiveLock | f | f
relation | null | AccessExclusiveLock | t | f
(2 rows)
In this case 68955 is the startup process trying to take the lock for
the LOCK record and it is not granted yet:
ioltas 68955 0.0 0.0 2593064 3228 ?? TXs 4:44PM 0:00.05
postgres: startup process recovering 000000010000000000000003

Now there is already a lock that has been taken and granted,
conflicting here... As the relation is only PREPARE'd yet and not
COMMIT PREPARED at this stage of the replay, isn't this lock taken
during the PREPARE phase, then it is not released by your new code
paths, no?

[One LOCK_DEBUG build later...]

It looks to be the case... The PREPARE phase replayed after the
standby is restarted in recovery creates a series of exclusive locks
on the table created and those are not taken on HEAD. Once those are
replayed the LOCK_STANDBY record is conflicting with it. In the case
of the failure, the COMMIT PREPARED record cannot be fetched from
master via the WAL stream so the relation never becomes visible.

Attached as two log files that are the result of a couple of runs,
those are the logs of the standby after being restarted in crash
- 2pc_master_logs.log, for HEAD.
- 2pc_patch_logs.log, with your last patch applied.
Feel free to have a look at them.

Attachment Content-Type Size
2pc_master_logs.log application/octet-stream 13.3 KB
2pc_patch_logs.log application/octet-stream 9.2 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2016-04-12 12:48:09 Re: Some other things about contrib/bloom and generic_xlog.c
Previous Message Stephen Frost 2016-04-12 12:45:39 Re: Update copyright in genericdesc.c