Re: Re: How to check whether the row was modified by this transaction before?

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Vlad Arkhipov <arhipov(at)dc(dot)baikal(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: How to check whether the row was modified by this transaction before?
Date: 2012-12-06 22:47:05
Message-ID: 20121206224705.GA6515@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2012-12-06 16:26:50 -0500, Tom Lane wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> > On 2012-12-06 13:59:32 -0500, Tom Lane wrote:
> >> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> >>> On 2012-12-06 15:08:51 -0300, Alvaro Herrera wrote:
> >>> Vlad Arkhipov <arhipov(at)dc(dot)baikal(dot)ru> writes:
> >>>> In a BEFORE UPDATE trigger I need to know whether the row was previously
> >>>> modified by this transaction. Is it safe to use xmin and txid_current()
> >>>> for this purpose (xmin is 32-bit txid type but txid_current() returns
> >>>> 64-bit bigint).
>
> >>> I wonder if we shouldn't have a function txid_is_current(xid);
>
> > I think architectures with an invalidation-queue to external caches (be
> > it web-proxies or something lower-level) are quite popular. And with the
> > new NOTIFY or pgq relatively simple. Ad to those its sensible not to
> > post a single primary key more than once.
>
> It strikes me that the notion of "this row was previously modified by
> the current transaction" is squishier than it might look, and we'd do
> well to clarify it before we consider exporting anything.

You're right.

I am also wondering if we can assume that for all interesting purposes
enough context is available to determine if the row was just inserted or
updated or whether we need to make a test for HEAP_UPDATED available in
some form.

> I think there
> are three ways you might define such a function:
>
> 1. xmin is exactly equal to current (sub)transaction's XID.
>
> 2. xmin is this (sub)transaction's XID, or the XID of any subcommitted
> subtransaction of it.
>
> 3. xmin is this (sub)transaction's XID, or the XID of any subcommitted
> subtransaction, or the XID of any open parent transaction or
> subcommitted subtransaction thereof.
>
> If I've got my head screwed on straight, test #2 gives you the semantics
> that "the previous row update cannot commit unless the action you are
> about to take (with the current XID) commits". Test #3 gives you the
> semantics "the action you are about to take cannot commit unless the
> previous row update does". And test #1 doesn't have much to recommend
> it except simplicity; while it might appear to have the semantics "the
> previous row update will commit if and only if the action you are about
> to take commits", it's actually narrower than that, because the same
> could be said for already-subcommitted subtransactions.
>
> In a cache invalidation context you probably want test #2, but
> TransactionIdIsCurrentTransactionId() presently performs test #3.

I agree that 1) isn't all that interesting but after that it really gets
a bit harder...

2) would be useful to do stuff like avoiding to queue any invalidations
if the table is newly truncated, but that seems like a fragile thing to
do from userspace. A table rewrite seems to be something fundamentally
different from a truncation here because that would need more detail
that txid_is_current(xid) would give you.
Maybe something like 'pg_relation_is_new()' would be interesting, but
that seems like a different thing, and I am not sure the use-case for
that is clear enough.
From a userlevel perspective 3) seems to be enough if all you want to
test whether the relation is new, but again, you couldn't test for that
sensibly because there's no access to HEAP_UPDATED.

For which cases do you think 2) is interesting wrt. cache invalidations?

I think 3) might be more interesting for the (for me) common case where
you just want to avoid queuing duplicate invalidations. Because all
youre interested in that case is: "Can I be sure that if I commit
another invalidation has already been queued for this tuple.". And 3)
seems to gives you that.

> (We discussed this point in connection with commit 7b90469b, and I've
> been intending since then to take a closer look at all the callers of
> TransactionIdIsCurrentTransactionId to see if these semantics are in
> fact what they need. We might have some bugs associated with confusion
> on this.)

Its an easy error to make I think, so I wouldn't be too surprised
somebody made it before me. On the other hand, it seems to me that to be
dangerous you need a part of the system thats not obeying transactional
semantics like the indexes which still contain enum oids that aren't in
the catalogs anymore. So I hope there aren't too many places where
people could have made that mistake.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2012-12-06 23:28:38 -DCLOBBER_CACHE_ALWAYS shows COPY FREEZE regression problem
Previous Message MauMau 2012-12-06 22:45:14 Re: Serious problem: media recovery fails after system or PostgreSQL crash