Re: pgsql: Fix race condition in snapshot caching when 2PC is used.

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-committers(at)lists(dot)postgresql(dot)org
Subject: Re: pgsql: Fix race condition in snapshot caching when 2PC is used.
Date: 2020-08-19 00:37:13
Message-ID: 20200819003713.cm2dg7o7knqnxdny@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers

Hi,

On 2020-08-18 19:55:50 -0400, Tom Lane wrote:
> > I'm inclined to just make ClearTransaction take an exclusive lock - the
> > rest of the 2PC operations are so heavyweight that I can't imagine
> > making a difference. When I tested the locking changes in
> > ProcArrayAdd()/Remove() the more heavyweight locking wasn't at all
> > visible.
>
> I was wondering if it'd be sensible to convert that counter into an
> atomic variable. That's not real clear, but worth thinking about.

I had it that way originally, but because xactCompletionCount otherwise
needs to be modified with ProcArrayLock exclusively held, I moved it to
a plain variable. It'd be a bad deal to actually modify it atomically
unless we got something out of that ([1]), because that'd increase the
number of atomic operations for commits, something we surely don't want.

We could make the locking rules be that modifications with a shared
ProcArrayLock have to be done with an atomic op, and operations with
ProcArrayLock held exclusively may do it without an atomic op. But that
seems a bit weird.

A slightly bigger hammer that we could use would be to remove the
somewhat weird split of 2PC commit between ProcArrayClearTransaction()
and ProcArrayAdd(). As far as I can tell we could have
EndPrepare()/MarkAsPrepared() do both operations with one exclusive
ProcArrayLock acquisition.

On balance I'd change it to an exclusive lock with a comment mentioning
that it'd not be too hard to downgrade to a shared lock, should it ever
become a performance issue.

Greetings,

Andres Freund

[1] e.g. not needing to hold ProcArrayLock to check whether a cached
snapshot can be used. The complexity there is that I think it adds a
"race condition" where the global xmin horizon can go backward. It'd be
detected and "reverted", but that still could make the horizon appear to
go backwar for a concurrent horizon determination.

In response to

Responses

Browse pgsql-committers by date

  From Date Subject
Next Message Andres Freund 2020-08-19 01:22:57 prepared transaction isolation tests
Previous Message Tom Lane 2020-08-18 23:55:50 Re: pgsql: Fix race condition in snapshot caching when 2PC is used.