Re: Race condition in GetOldestActiveTransactionId()

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Race condition in GetOldestActiveTransactionId()
Date: 2017-07-13 12:52:44
Message-ID: e4c4e884-0c00-2f44-2a20-1b29a031349d@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 08/22/2016 01:46 PM, Heikki Linnakangas wrote:
> While hacking on the CSN patch, I spotted a race condition between
> GetOldestActiveTransactionId() and GetNewTransactionId().
> GetOldestActiveTransactionId() calculates the oldest XID that's still
> running, by doing:
>
> 1. Read nextXid, without a lock. This is the upper bound, if there are
> no active XIDs in the proc array.
> 2. Loop through the proc array, making note of the oldest XID.
>
> While GetNewTransactionId() does:
>
> 1. Read and Increment nextXid
> 2. Set MyPgXact->xid.
>
> It seems possible that if you call GetNewTransactionId() concurrently
> with GetOldestActiveTransactionId(), GetOldestActiveTransactionId() sees
> the new nextXid value that the concurrent GetNewTransactionId() set, but
> doesn't see the old XID in the proc array. It will return a value that
> doesn't cover the old XID, i.e. it won't consider the just-assigned XID
> as in-progress.
>
> Am I missing something? Commit c6d76d7c added a comment to
> GetOldestActiveTransactionId() explaining why it's not necessary to
> acquire XidGenLock there, but I think it missed the above race condition.
>
> GetOldestActiveTransactionId() is not performance-critical, it's only
> called when performing a checkpoint, so I think we should just bite the
> bullet and grab the lock. Per attached patch.

I did some testing, and was able to indeed construct a case where
oldestActiveXid was off-by-one in an online checkpoint record. However,
looking at how it's used, I think it happened to not have any
user-visible effect. The oldestActiveXid value of an online checkpoint
is only used to determine the point where pg_subtrans is initialized,
and the XID missed in the computation could not be a subtransaction.

Nevertheless, it's clearly a bug and the fix is simple, so I committed a
fix.

- Heikki

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Marina Polyakova 2017-07-13 14:34:14 Re: WIP Patch: Pgbench Serialization and deadlock errors
Previous Message Andrew Dunstan 2017-07-13 12:34:28 Re: pl/perl extension fails on Windows