Re: Speedup twophase transactions

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Nikhil Sontakke <nikhils(at)2ndquadrant(dot)com>
Cc: David Steele <david(at)pgmasters(dot)net>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Speedup twophase transactions
Date: 2017-03-17 05:14:50
Message-ID: CAB7nPqTUx8quw_J+T_jfxhruTEEZCOd__vMyAUkjd6eMT4oC9g@mail.gmail.com
Lists: pgsql-hackers

On Thu, Mar 16, 2017 at 9:25 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Thu, Mar 16, 2017 at 7:18 PM, Nikhil Sontakke
> <nikhils(at)2ndquadrant(dot)com> wrote:
>>> + * * RecoverPreparedTransactions(), StandbyRecoverPreparedTransactions()
>>> + * and PrescanPreparedTransactions() have been modified to go through
>>> + * gxact->inredo entries that have not made to disk yet.
>>>
>>> It seems to me that there should be an initial scan of pg_twophase at
>>> the beginning of recovery, discarding on the way with a WARNING
>>> entries that are older than the checkpoint redo horizon. This should
>>> fill in shmem entries using something close to PrepareRedoAdd(), and
>>> mark those entries as inredo. Then, at the end of recovery,
>>> PrescanPreparedTransactions does not need to look at the entries in
>>> pg_twophase. The same holds for
>>> RecoverPreparedTransactions(). I think that you could get the patch
>>> much simplified this way, as any 2PC data can be fetched directly from
>>> WAL segments and there is no need to rely on scans of pg_twophase,
>>> this is replaced by scans of entries in TwoPhaseState.
>>>
>>
>> I don't think this will work. We cannot replace pg_twophase with shmem
>> entries + WAL pointers. This is because we cannot expect to have WAL entries
>> around for long running prepared queries which survive across checkpoints.
>
> But at the beginning of recovery, we can mark such entries with ondisk
> and inredo, in which case the WAL pointers stored in the shmem entries
> do not matter because the data is already on disk.

Nikhil, do you mind if I try something like that? Since we already know
the first XID at the beginning of redo via ShmemVariableCache->nextXid,
it is possible to discard 2PC files that should not be there. What
worries me is controlling the maximum number of entries in shared
memory. If there are legitimate 2PC files that were flushed to disk at
checkpoint, you could end up with more 2PC transactions than should be
possible (even if updates of max_prepared_xacts are WAL-logged).
--
Michael
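
The scan proposed above can be illustrated with a small standalone
sketch. It is not PostgreSQL code: TwoPhaseEntry, ScanResult and
scan_twophase_files() are hypothetical names, the pg_twophase directory
is modelled as a plain array of XIDs, and treating a file as "should
not be there" when its XID is at or past nextXid is an assumption made
for the example. It only shows the two points under discussion, namely
discarding such files with a WARNING and what happens when the valid
files outnumber the slots available in shared memory.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t xid_t;         /* stand-in for TransactionId */

/* Hypothetical, simplified shared-memory entry (not the real struct). */
typedef struct
{
    xid_t       xid;
    bool        ondisk;         /* state data already lives in a 2PC file */
    bool        inredo;         /* entry was registered during recovery */
} TwoPhaseEntry;

typedef struct
{
    size_t      kept;           /* entries loaded into the table */
    size_t      discarded;      /* files rejected as being from the future */
    bool        overflow;       /* a valid file did not fit in the table */
} ScanResult;

/* Wraparound-aware "a >= b" for 32-bit XIDs (modulo-2^32 comparison). */
static bool
xid_follows_or_equals(xid_t a, xid_t b)
{
    return (int32_t) (a - b) >= 0;
}

/*
 * Walk the XIDs of the 2PC files found at the start of recovery.
 * Assumption: a file whose XID is at or past next_xid cannot be valid
 * and is discarded with a warning.  Every other file gets an entry
 * marked ondisk and inredo, until max_entries slots are used up.
 */
static ScanResult
scan_twophase_files(const xid_t *file_xids, size_t nfiles, xid_t next_xid,
                    TwoPhaseEntry *table, size_t max_entries)
{
    ScanResult  res = {0, 0, false};

    for (size_t i = 0; i < nfiles; i++)
    {
        xid_t       xid = file_xids[i];

        if (xid_follows_or_equals(xid, next_xid))
        {
            fprintf(stderr, "WARNING: discarding two-phase file for XID %u\n",
                    (unsigned) xid);
            res.discarded++;
            continue;
        }

        if (res.kept >= max_entries)
        {
            /* More valid files than slots: the case that is worrying. */
            res.overflow = true;
            break;
        }

        table[res.kept].xid = xid;
        table[res.kept].ondisk = true;
        table[res.kept].inredo = true;
        res.kept++;
    }

    return res;
}
```

With files for XIDs 100, 250, 180 and 300 and next_xid set to 200, the
files for 250 and 300 are discarded. With two slots both remaining
files get an entry; with a single slot the scan stops at XID 180 and
reports an overflow, which is where a real implementation would have to
decide between failing recovery and growing the table.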
