Re: Remove HeapTuple and Buffer dependency for predicate locking functions

From: Andres Freund <andres(at)anarazel(dot)de>
To: Ashwin Agrawal <aagrawal(at)pivotal(dot)io>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>
Subject: Re: Remove HeapTuple and Buffer dependency for predicate locking functions
Date: 2019-07-31 21:06:30
Message-ID: 20190731210630.nqhszuktygwftjty@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2019-07-31 12:37:58 -0700, Ashwin Agrawal wrote:
> On Wed, Jul 31, 2019 at 10:55 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> > Hi,
> >
> > On 2019-07-31 10:42:50 -0700, Ashwin Agrawal wrote:
> > > On Tue, Jul 30, 2019 at 2:58 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
> > wrote:
> > >
> > > > On Tue, Jun 25, 2019 at 6:02 AM Andres Freund <andres(at)anarazel(dot)de>
> > wrote:
> > > > > > - CheckForSerializableConflictOut() no more takes HeapTuple nor
> > > > > > buffer, instead just takes xid. Push heap specific parts from
> > > > > > CheckForSerializableConflictOut() into its own function
> > > > > > HeapCheckForSerializableConflictOut() which calls
> > > > > > CheckForSerializableConflictOut(). The alternative option could
> > be
> > > > > > CheckForSerializableConflictOut() take callback function and
> > > > > > callback arguments, which gets called if required after
> > performing
> > > > > > prechecks. Though currently I fell AM having its own wrapper to
> > > > > > perform AM specific task and then calling
> > > > > > CheckForSerializableConflictOut() is fine.
> > > > >
> > > > > I think it's right to move the xid handling out of
> > > > > CheckForSerializableConflictOut(). But I think we also ought to move
> > the
> > > > > subtransaction handling out of the function - e.g. zheap doesn't
> > > > > want/need that.
> > > >
> > > > Thoughts on this Ashwin?
> > > >
> > >
> > > I think the only part its doing for sub-transaction is
> > > SubTransGetTopmostTransaction(xid). If xid passed to this function is
> > > already top most transaction which is case for zheap and zedstore, then
> > > there is no downside to keeping that code here in common place.
> >
> > Well, it's far from a cheap function. It'll do unnecessary on-disk
> > lookups in many cases. I'd call that quite a downside.
> >
>
> Okay, agree, its costly function and better to avoid the call if possible.
>
> Instead of moving the handling out of the function, how do feel about
> adding boolean isTopTransactionId argument to function
> CheckForSerializableConflictOut(). The AMs, which implicitly know, only
> pass top transaction Id to this function, can pass true and avoid the
> function call to SubTransGetTopmostTransaction(xid). With this
> subtransaction code remains in generic place and AMs intending to use it
> continue to leverage the common code, plus explicitly clarifies the
> behavior as well.

Looking at the code as of master, we currently have:

- PredicateLockTuple() calls SubTransGetTopmostTransaction() to figure
out a whether the tuple has been locked by the current
transaction. That check afaict just should be
TransactionIdIsCurrentTransactionId(), without all the other
stuff that's done today.

TransactionIdIsCurrentTransactionId() imo ought to be optimized to
always check for the top level transactionid first - that's a good bet
today, but even moreso for the upcoming AMs that won't have separate
xids for subtransactions. Alternatively we shouldn't make that a
binary search for each subtrans level, but just have a small
simplehash hashtable for xids.

- CheckForSerializableConflictOut() wants to get the toplevel xid for
the tuple, because that's the one the predicate hashtable stores.

In your patch you've already moved the HTSV() call etc out of
CheckForSerializableConflictOut(). I'm somewhat inclined to think that
the SubTransGetTopmostTransaction() call ought to go along with that.
I don't really think that belongs in predicate.c, especially if
most/all new AMs don't use subtransaction ids.

The only downside is that currently the
TransactionIdEquals(xid, GetTopTransactionIdIfAny()) check
avoids the SubTransGetTopmostTransaction() check.

But again, the better fix for that seems to be to improve the generic
code. As written the check won't prevent a subtrans lookup for heap
when subtransactions are in use, and it's IME pretty common for tuples
to get looked at again in the transaction that has created them. So
I'm somewhat inclined to think that SubTransGetTopmostTransaction()
should have a fast-path for the current transaction - probably just
employing TransactionIdIsCurrentTransactionId().

I don't really see what we gain by having the subtrans handling in the
predicate code. Especially given that we've already moved the HTSV()
handling out, it seems architecturally the wrong place to me - but I
admit that that's a fuzzy argument. The relevant mapping should be one
line in the caller.

I wonder if it'd be wroth to combine the
TransactionIdIsCurrentTransactionId() calls in the heap cases that
currently do both, PredicateLockTuple() and
HeapCheckForSerializableConflictOut(). The heap_fetch() case probably
isn't commonly that hot a pathq, but heap_hot_search_buffer() is.

Minor notes:
- I don't think 'insert_xid' is necessarily great - it could also be the
updating xid etc. And while you can argue that an update is an insert
in the current heap, that's not the case for future AMs.
- to me
@@ -1621,7 +1622,8 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
if (valid)
{
ItemPointerSetOffsetNumber(tid, offnum);
- PredicateLockTuple(relation, heapTuple, snapshot);
+ PredicateLockTID(relation, &(heapTuple)->t_self, snapshot,
+ HeapTupleHeaderGetXmin(heapTuple->t_data));
if (all_dead)
*all_dead = false;
return true;

What are those parens - as placed they can't do anything. Did you
intend to write &(heapTuple->t_self)? Even that is pretty superfluous,
but it at least clarifies the precedence.

I'm also a bit confused why we don't need to pass in the offset of the
current tuple, rather than the HOT root tuple here. That's not related
to this patch. But aren't we locking the wrong tuple here, in case of
HOT?

- I wonder if CheckForSerializableConflictOutNeeded() shouldn't have a
portion of it's code as a static inline. In particular, it's a shame
that we currently perform external function calls at quite the
frequency when serializable isn't even in use.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-07-31 21:11:54 Re: pgbench - implement strict TPC-B benchmark
Previous Message Ashwin Agrawal 2019-07-31 20:59:24 Re: Remove HeapTuple and Buffer dependency for predicate locking functions