Re: Speedup twophase transactions

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Nikhil Sontakke <nikhils(at)2ndquadrant(dot)com>
Cc: David Steele <david(at)pgmasters(dot)net>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Speedup twophase transactions
Date: 2017-03-13 06:44:53
Message-ID: CAB7nPqSjrtRQMYnFKXCnjYcO=ni2VH1C1hDcuOo+_O2qNkmtNw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Mar 11, 2017 at 7:26 PM, Nikhil Sontakke
<nikhils(at)2ndquadrant(dot)com> wrote:
> Hi David and Michael,
>> It would be great to get this thread closed out after 14 months and many
>> commits.
>
> PFA, latest patch which addresses Michael's comments.

Thanks for the new version. Let's head toward a final patch.

>> + ereport(WARNING,
>> + (errmsg("removing future two-phase state data from
>> memory \"%u\"",
>> + xid)));
>> + PrepareRedoRemove(xid);
>> + continue
>> Those are not normal (partially because unlink is atomic, but not
>> durable)... But they match the correct coding pattern regarding
>> incorrect 2PC entries... I'd really like to see those switched to a
>> FATAL with unlink() made durable for those calls.
>
> Hmm, not sure what exactly we need to do here. If you look at the prior
> checks, there we already skip on-disk entries. So, typically, the entries
> that we encounter here will be in shmem only.

As long as we don't have an alternative to offer a durable unlink,
let's do nothing then. This is as well consistent with the other code
paths handling corrupted or incorrect 2PC entries.

>> + /* Deconstruct header */
>> + hdr = (TwoPhaseFileHeader *) buf;
>> + Assert(TransactionIdEquals(hdr->xid, xid));
>> +
>> + if (TransactionIdPrecedes(xid, result))
>> + result = xid;
>> This portion is repeated three times and could be easily refactored.
>> You could just have a routine that returns the oldes transaction ID
>> used, and ignore the result for StandbyRecoverPreparedTransactions()
>> by casting a (void) to let any kind of static analyzer understand that
>> we don't care about the result for example. Handling for subxids is
>> necessary as well depending on the code path. Spliting things into a
>> could of sub-routines may be more readable as well. There are really a
>> couple of small parts that can be gathered and strengthened.
>
> Have added a new function to do this now. It reads either from disk or
> shared memory and produces error/log messages accordingly as well now.

And that's ProcessTwoPhaseBufferAndReturn in the patch.
ProcessTwoPhaseBuffer may be a better name.

>> + /*
>> + * Recreate its GXACT and dummy PGPROC
>> + */
>> + gxactnew = MarkAsPreparing(xid, gid,
>> + hdr->prepared_at,
>> + hdr->owner, hdr->database,
>> + gxact->prepare_start_lsn,
>> + gxact->prepare_end_lsn);
>> MarkAsPreparing() does not need to be extended with two new arguments.
>> RecoverPreparedTransactions() is used only at the end of recovery,
>> where it is not necessary to look at the 2PC state files from the
>> records. In this code path inredo is also set to false :)
>
> That's not true. We will have entries with inredo set at the end of recovery
> as well. Infact the MarkAsPreparing() call from
> RecoverPreparedTransactions() is the one which will remove these inredo
> entries and convert them into regular entries. We have optimized the
> recovery code path as well.
>
>> + /* Delete TwoPhaseState gxact entry and/or 2PC file. */
>> + PrepareRedoRemove(parsed.twophase_xid);
>> Both things should not be present, no? If the file is pushed to disk
>> it means that the checkpoint horizon has already moved.
>>
> PREPARE in redo, followed by a checkpoint, followed by a COMMIT/ROLLBACK. We
> can have both the bits set in this case.

Oh, I see where our thoughts don't overlap. I actually thought that
the shared memory entry and the on-disk file cannot co-exist (or if
you want a file flushed at checkpoint should have its shmem entry
removed). But you are right and I am wrong. In order to have the error
handling done properly if the maximum amount of 2PC transactions is
reached. Still....

>> - ereport(ERROR,
>> + /* It's ok to find an entry in the redo/recovery case */
>> + if (!gxact->inredo)
>> + ereport(ERROR,
>> (errcode(ERRCODE_DUPLICATE_OBJECT),
>> errmsg("transaction identifier \"%s\" is already in
>> use",
>> gid)));
>> + else
>> + {
>> + found = true;
>> + break;
>> + }
>> I would not have thought so.
>
> Since we are using the TwoPhaseState structure to track redo entries, at end
> of recovery, we will find existing entries. Please see my comments above for
> RecoverPreparedTransactions()

This is absolutely not good, because it is a direct result of the
interactions of the first loop of RecoverPreparedTransaction() with
its second loop, and because MarkAsPreparing() can finished by being
called *twice* from the same transaction. I really think that this
portion should be removed and that RecoverPreparedTransactions()
should be more careful when scanning the entries in pg_twophase by
looking up at what exists as well in shared memory, instead of doing
that in MarkAsPreparing().

Here are some more comments:

+ /*
+ * Recreate its GXACT and dummy PGPROC
+ */
+ gxactnew = MarkAsPreparing(xid, gid,
+ hdr->prepared_at,
+ hdr->owner, hdr->database,
+ gxact->prepare_start_lsn,
+ gxact->prepare_end_lsn);
+
+ Assert(gxactnew == gxact);
Here it would be better to set ondisk to false. This makes the code
more consistent with the previous loop, and the intention clear.

The first loop of RecoverPreparedTransactions() has a lot in common
with its second loop. You may want to refactor a little bit more here.

+/*
+ * PrepareRedoRemove
+ *
+ * Remove the corresponding gxact entry from TwoPhaseState. Also
+ * remove the 2PC file.
+ */
This could be a bit more expanded. The removal of the 2PC does not
happen after removing the in-memory data, it would happen if the
in-memory data is not found.

+MarkAsPreparingInRedo(TransactionId xid, const char *gid,
+ TimestampTz prepared_at, Oid owner, Oid databaseid,
+ XLogRecPtr prepare_start_lsn, XLogRecPtr prepare_end_lsn)
+{
+ GlobalTransaction gxact;
+
+ LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE);
MarkAsPreparingInRedo is internal to twophase.c. There is no need to
expose it externally and it is just used in PrepareRedoAdd so you
could just group both.

bool valid; /* TRUE if PGPROC entry is in proc array */
bool ondisk; /* TRUE if prepare state file is on disk */
+ bool inredo; /* TRUE if entry was added via xlog_redo */
We could have a set of flags here, that's the 3rd boolean of the
structure used for a status.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2017-03-13 07:11:47 Re: Changing references of password encryption to hashing
Previous Message Pavel Stehule 2017-03-13 06:31:51 Re: SQL/JSON in PostgreSQL