Re: Speedup twophase transactions

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>
Cc: David Steele <david(at)pgmasters(dot)net>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Speedup twophase transactions
Date: 2016-04-07 06:29:13
Message-ID: CAB7nPqQgwCcCiCTvGsHJyDDiFs-SXVkNc9=Yy94DVfXLm8d7Tw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Apr 6, 2016 at 6:47 PM, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru> wrote:
>> On Apr 2, 2016, at 3:14 AM, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote:
>>
>> On Fri, Apr 1, 2016 at 10:53 PM, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru> wrote:
>>> I wrote:
>>>> While testing the patch, I found a bug in the recovery conflict code
>>>> path. You can do the following to reproduce it:
>>>> 1) Start a master with a standby
>>>> 2) prepare a transaction on master
>>>> 3) Stop immediate on standby to force replay
>>>> 4) commit prepare transaction on master
>>>> 5) When starting the standby, it remains stuck here:
>>>
>>> Hm, I wasn’t able to reproduce that. Do you mean following scenario or am I missing something?
>>>
>>> (async replication)
>>>
>>> $node_master->psql('postgres', "
>>> begin;
>>> insert into t values (1);
>>> prepare transaction 'x';
>>> ");
>>> $node_slave->teardown_node;
>>> $node_master->psql('postgres',"commit prepared 'x'");
>>> $node_slave->start;
>>> $node_slave->psql('postgres',"select count(*) from pg_prepared_xacts", stdout => \$psql_out);
>>> is($psql_out, '0', "Commit prepared on master while slave is down.");
>>
>> Actually, not exactly, the transaction prepared on master created a
>> table. Sorry for the lack of precisions in my review.
>
> Sorry for delay.
>
> Actually I can’t reproduce that again, tried with following tx:

Well not for me, here are more details, with a test case attached:
* thread #1: tid = 0x50c5b, 0x00007fff93822f96
libsystem_kernel.dylib`poll + 10, queue = 'com.apple.main-thread',
stop reason = signal SIGSTOP
* frame #0: 0x00007fff93822f96 libsystem_kernel.dylib`poll + 10
frame #1: 0x00000001023cdda3
postgres`WaitEventSetWaitBlock(set=0x00007fde50858f28, cur_timeout=-1,
occurred_events=0x00007fff5dc87cf8, nevents=1) + 51 at latch.c:1102
frame #2: 0x00000001023cd786
postgres`WaitEventSetWait(set=0x00007fde50858f28, timeout=-1,
occurred_events=0x00007fff5dc87cf8, nevents=1) + 390 at latch.c:935
frame #3: 0x00000001023cd227
postgres`WaitLatchOrSocket(latch=0x000000010b9b12e4, wakeEvents=1,
sock=-1, timeout=-1) + 343 at latch.c:347
frame #4: 0x00000001023cd0ca
postgres`WaitLatch(latch=0x000000010b9b12e4, wakeEvents=1, timeout=0)
+ 42 at latch.c:302
frame #5: 0x00000001023eb306 postgres`ProcWaitForSignal + 38 at proc.c:1731
frame #6: 0x00000001023da24b
postgres`ResolveRecoveryConflictWithLock(locktag=LOCKTAG at
0x00007fff5dc87db8) + 187 at standby.c:391
frame #7: 0x00000001023ea408
postgres`ProcSleep(locallock=0x00007fde5083dac8,
lockMethodTable=0x000000010286e278) + 1128 at proc.c:1215
frame #8: 0x00000001023e25e6
postgres`WaitOnLock(locallock=0x00007fde5083dac8,
owner=0x0000000000000000) + 358 at lock.c:1703
frame #9: 0x00000001023e0cf3
postgres`LockAcquireExtended(locktag=0x00007fff5dc88208, lockmode=8,
sessionLock='\x01', dontWait='\0', reportMemoryError='\0') + 2819 at
lock.c:998
frame #10: 0x00000001023da706
postgres`StandbyAcquireAccessExclusiveLock(xid=867, dbOid=16384,
relOid=16385) + 358 at standby.c:627
frame #11: 0x00000001023dac6b
postgres`standby_redo(record=0x00007fde50841e38) + 251 at
standby.c:809

(LOCALLOCK) $1 = {
tag = {
lock = {
locktag_field1 = 16384
locktag_field2 = 16385
locktag_field3 = 0
locktag_field4 = 0
locktag_type = '\0'
locktag_lockmethodid = '\x01'
}
mode = 8
}

=# select relname, oid from pg_class where oid > 16000;
relname | oid
---------+-------
aa | 16385
(1 row)
So recovery is conflicting here. My guess is that this patch is
missing some lock cleanup.

With the test case attached in my case the COMMIT PREPARED record does
not even get replayed.
--
Michael

Attachment Content-Type Size
twophase_recovery_bug_2.pl text/x-perl 1.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Karl O. Pinc 2016-04-07 06:30:30 Re: Patch to implement pg_current_logfile() function
Previous Message Fujii Masao 2016-04-07 06:26:42 Re: Support for N synchronous standby servers - take 2