Re: Remove HeapTuple and Buffer dependency for predicate locking functions

From: Ashwin Agrawal <aagrawal(at)pivotal(dot)io>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>
Subject: Re: Remove HeapTuple and Buffer dependency for predicate locking functions
Date: 2019-08-02 23:56:22
Message-ID: CALfoeivQ6xJdwmgSfcP+_NjTh8yjgjoHqrRqpbXxG_fRaytU4Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jul 31, 2019 at 2:06 PM Andres Freund <andres(at)anarazel(dot)de> wrote:

> Looking at the code as of master, we currently have:
>

Super awesome feedback and insights, thank you!

- PredicateLockTuple() calls SubTransGetTopmostTransaction() to figure
> out a whether the tuple has been locked by the current
> transaction. That check afaict just should be
> TransactionIdIsCurrentTransactionId(), without all the other
> stuff that's done today.
>

Agree. v1-0002 patch attached does that now. Please let me know if that's
what you meant.

TransactionIdIsCurrentTransactionId() imo ought to be optimized to
> always check for the top level transactionid first - that's a good bet
> today, but even moreso for the upcoming AMs that won't have separate
> xids for subtransactions. Alternatively we shouldn't make that a
> binary search for each subtrans level, but just have a small
> simplehash hashtable for xids.
>

v1-0001 patch checks for GetTopTransactionIdIfAny() first in
TransactionIdIsCurrentTransactionId() which seems yes better in general and
more for future. That mostly meets the needs for current discussion.

The alternative of not using binary search seems bigger refactoring and
should be handled as separate optimization exercise outside of this thread.

> - CheckForSerializableConflictOut() wants to get the toplevel xid for
> the tuple, because that's the one the predicate hashtable stores.
>
> In your patch you've already moved the HTSV() call etc out of
> CheckForSerializableConflictOut(). I'm somewhat inclined to think that
> the SubTransGetTopmostTransaction() call ought to go along with that.
> I don't really think that belongs in predicate.c, especially if
> most/all new AMs don't use subtransaction ids.
>
> The only downside is that currently the
> TransactionIdEquals(xid, GetTopTransactionIdIfAny()) check
> avoids the SubTransGetTopmostTransaction() check.
>
> But again, the better fix for that seems to be to improve the generic
> code. As written the check won't prevent a subtrans lookup for heap
> when subtransactions are in use, and it's IME pretty common for tuples
> to get looked at again in the transaction that has created them. So
> I'm somewhat inclined to think that SubTransGetTopmostTransaction()
> should have a fast-path for the current transaction - probably just
> employing TransactionIdIsCurrentTransactionId().
>

That optimization, as Kuntal also mentioned, seems something which can be
done on-top afterwards on current patch.

> I don't really see what we gain by having the subtrans handling in the
> predicate code. Especially given that we've already moved the HTSV()
> handling out, it seems architecturally the wrong place to me - but I
> admit that that's a fuzzy argument. The relevant mapping should be one
> line in the caller.
>

Okay, I moved the sub transaction handling out of
CheckForSerializableConflictOut() and have it along side HTSV() now.

The reason I felt leaving subtransaction handling in generic place, was it
might be premature to thing no future AM will need it. Plus, all
serializable function api's having same expectations is easier. Like
PredicateLockTuple() can be passed top or subtransaction id and it can
handle it but with the change CheckForSerializableConflictOut() only be
feed top transaction ID. But its fine and can see the point of AM needing
it can easily get top transaction ID and feed it as heap.

> I wonder if it'd be wroth to combine the
> TransactionIdIsCurrentTransactionId() calls in the heap cases that
> currently do both, PredicateLockTuple() and
> HeapCheckForSerializableConflictOut(). The heap_fetch() case probably
> isn't commonly that hot a pathq, but heap_hot_search_buffer() is.
>

Maybe, will give thought to it separate from the current patch.

> Minor notes:
> - I don't think 'insert_xid' is necessarily great - it could also be the
> updating xid etc. And while you can argue that an update is an insert
> in the current heap, that's not the case for future AMs.
> - to me
> @@ -1621,7 +1622,8 @@ heap_hot_search_buffer(ItemPointer tid, Relation
> relation, Buffer buffer,
> if (valid)
> {
> ItemPointerSetOffsetNumber(tid, offnum);
> - PredicateLockTuple(relation, heapTuple,
> snapshot);
> + PredicateLockTID(relation,
> &(heapTuple)->t_self, snapshot,
> +
> HeapTupleHeaderGetXmin(heapTuple->t_data));
> if (all_dead)
> *all_dead = false;
> return true;
>
> What are those parens - as placed they can't do anything. Did you
> intend to write &(heapTuple->t_self)? Even that is pretty superfluous,
> but it at least clarifies the precedence.
>

Fixed. No idea what I was thinking there, mostly feel I intended to have it
as like &(heapTuple->t_self).

I'm also a bit confused why we don't need to pass in the offset of the
> current tuple, rather than the HOT root tuple here. That's not related
> to this patch. But aren't we locking the wrong tuple here, in case of
> HOT?
>

Yes, root is being locked here instead of the HOT. But I don't have full
context on the same. If we wish to fix it though, can be easily done now
with the patch by passing "tid" instead of &(heapTuple->t_self).

- I wonder if CheckForSerializableConflictOutNeeded() shouldn't have a
> portion of it's code as a static inline. In particular, it's a shame
> that we currently perform external function calls at quite the
> frequency when serializable isn't even in use.
>

I am not sure on portion of the code part? SerializationNeededForRead() is
static inline function in C file. Can't inline
CheckForSerializableConflictOutNeeded() without moving
SerializationNeededForRead() and some other variables to header file.
CheckForSerializableConflictOut() wasn't inline either, so a function call
was performed earlier as well when serializable isn't even in use.

I understand that with refactor, HeapCheckForSerializableConflictOut() is
called which calls CheckForSerializableConflictOutNeeded(). If that's the
problem, for addressing the same, I had proposed alternative way to
refactor. CheckForSerializableConflictOut() can take callback function and
void* callback argument for AM specific check instead. So, the flow would
be AM calling CheckForSerializableConflictOut() as today and only if
serializable in use will invoke the callback to check with AM if more work
should be performed or not. Essentially
HeapCheckForSerializableConflictOut() will become callback function
instead. Due to void* callback argument aspect I didn't like that solution
and felt AM performing checks and calling CheckForSerializableConflictOut()
seems more straight forward.

Attachment Content-Type Size
v1-0001-Optimize-TransactionIdIsCurrentTransactionId.patch text/x-patch 1.1 KB
v1-0002-Optimize-PredicateLockTuple.patch text/x-patch 1.7 KB
v1-0003-Remove-HeapTuple-dependency-for-predicate-locking.patch text/x-patch 25.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2019-08-03 00:13:49 Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions
Previous Message Ian Barwick 2019-08-02 23:36:13 Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions