Re: Speedup twophase transactions

From: Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: David Steele <david(at)pgmasters(dot)net>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Speedup twophase transactions
Date: 2016-04-01 13:53:28
Message-ID: AC257D7F-E173-48B2-ADE4-4A674C78036B@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Apr 1, 2016, at 10:04 AM, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote:
>
> I would suggest the following name modifications, node names have been
> introduced to help tracking of each node's log:
> - Candie => master
> - Django => slave or just standby
> There is no need for complication! And each node's log file is
> prefixed by the test number. Note that in other tests there are no
> promotions, or fallbacks done, but we stick with a node name that
> represents the initial state of the cluster.

Ok, let’s reflect initial state in node names. So master and standby then.

>
>> +# Switch to synchronous replication
>> +$node_master->append_conf('postgresql.conf', qq(
>> +synchronous_standby_names = '*'
>> +));
>> +$node_master->restart;
>> Reloading would be fine.
>>
>> Good catch, done.
>
> +$node_master->psql('postgres', "select pg_reload_conf()");
>
> It would be cleaner to introduce a new routine in PostgresNode that
> calls pg_ctl reload (mentioned in the N-sync patch as well, that would
> be useful for many purposes).

Okay.

>
>> All accesses to 2pc dummies in ProcArray are covered with TwoPhaseStateLock,
>> so I thick that’s safe. Also I’ve deleted comment above that block, probably
>> it’s more confusing than descriptive.
>
> OK, you removed the use to allProcs. Though by reading again the code
> just holding TwoPhaseStateLock that's actually fine.
>
> The patch needs a small cleanup:
> $ git diff master --check
> src/test/recovery/t/006_twophase.pl:224: new blank line at EOF.
>
> 006_twophase.pl should be renamed to 007. It keeps its license to
> kill, and gains in being famous.

Huh, eventually there will be Fleming reference, instead of Tarantino one in node names.

>
> - * Scan the pg_twophase directory and setup all the required information to
> - * allow standby queries to treat prepared transactions as still active.
> - * This is never called at the end of recovery - we use
> - * RecoverPreparedTransactions() at that point.
> + * It's a caller responsibility to call MarkAsPrepared() on returned gxact.
> *
> Wouldn't it be more simple to just call MarkAsPrepared at the end of
> RecoverPreparedFromBuffer?

I did that intentionally to allow modification of gxact before unlock.

RecoverPreparedFromXLOG:
gxact = RecoverPreparedFromBuffer((char *) XLogRecGetData(record), false);
gxact->prepare_start_lsn = record->ReadRecPtr;
gxact->prepare_end_lsn = record->EndRecPtr;
MarkAsPrepared(gxact);

RecoverPreparedFromFiles:
gxact = RecoverPreparedFromBuffer(buf, forceOverwriteOK);
gxact->ondisk = true;
MarkAsPrepared(gxact);

While both function are only called during recovery, I think that it is better to write that
in a way when it possible to use it in multiprocess environment.

>
> While testing the patch, I found a bug in the recovery conflict code
> path. You can do the following to reproduce it:
> 1) Start a master with a standby
> 2) prepare a transaction on master
> 3) Stop immediate on standby to force replay
> 4) commit prepare transaction on master
> 5) When starting the standby, it remains stuck here:

Hm, I wasn’t able to reproduce that. Do you mean following scenario or am I missing something?

(async replication)

$node_master->psql('postgres', "
begin;
insert into t values (1);
prepare transaction 'x';
");
$node_slave->teardown_node;
$node_master->psql('postgres',"commit prepared 'x'");
$node_slave->start;
$node_slave->psql('postgres',"select count(*) from pg_prepared_xacts", stdout => \$psql_out);
is($psql_out, '0', "Commit prepared on master while slave is down.");

> * thread #1: tid = 0x229b4, 0x00007fff8e2d4f96
> libsystem_kernel.dylib`poll + 10, queue = 'com.apple.main-thread',
> stop reason = signal SIGSTOP
> * frame #0: 0x00007fff8e2d4f96 libsystem_kernel.dylib`poll + 10
> frame #1: 0x0000000107e5e043
> postgres`WaitEventSetWaitBlock(set=0x00007f90c20596a8, cur_timeout=-1,
> occurred_events=0x00007fff581efd28, nevents=1) + 51 at latch.c:1102
> frame #2: 0x0000000107e5da26
> postgres`WaitEventSetWait(set=0x00007f90c20596a8, timeout=-1,
> occurred_events=0x00007fff581efd28, nevents=1) + 390 at latch.c:935
> frame #3: 0x0000000107e5d4c7
> postgres`WaitLatchOrSocket(latch=0x0000000111432464, wakeEvents=1,
> sock=-1, timeout=-1) + 343 at latch.c:347
> frame #4: 0x0000000107e5d36a
> postgres`WaitLatch(latch=0x0000000111432464, wakeEvents=1, timeout=0)
> + 42 at latch.c:302
> frame #5: 0x0000000107e7b5a6 postgres`ProcWaitForSignal + 38 at proc.c:1731
> frame #6: 0x0000000107e6a4eb
> postgres`ResolveRecoveryConflictWithLock(locktag=LOCKTAG at
> 0x00007fff581efde8) + 187 at standby.c:391
> frame #7: 0x0000000107e7a6a8
> postgres`ProcSleep(locallock=0x00007f90c203dac8,
> lockMethodTable=0x00000001082f6218) + 1128 at proc.c:1215
> frame #8: 0x0000000107e72886
> postgres`WaitOnLock(locallock=0x00007f90c203dac8,
> owner=0x0000000000000000) + 358 at lock.c:1703
> frame #9: 0x0000000107e70f93
> postgres`LockAcquireExtended(locktag=0x00007fff581f0238, lockmode=8,
> sessionLock='\x01', dontWait='\0', reportMemoryError='\0') + 2819 at
> lock.c:998
> frame #10: 0x0000000107e6a9a6
> postgres`StandbyAcquireAccessExclusiveLock(xid=863, dbOid=16384,
> relOid=16385) + 358 at standby.c:627
> frame #11: 0x0000000107e6af0b
> postgres`standby_redo(record=0x00007f90c2041e38) + 251 at
> standby.c:809
> frame #12: 0x0000000107b0e227 postgres`StartupXLOG + 9351 at xlog.c:6871
> It seems that the replay on on-memory state of the PREPARE transaction
> is conflicting directly with replay.

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-04-01 14:15:45 Re: SSL indicator in psql prompt
Previous Message Aleksander Alekseev 2016-04-01 13:52:15 Re: WIP: Access method extendability