Re: Improving connection scalability: GetSnapshotData()

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jonathan Katz <jkatz(at)postgresql(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>
Subject: Re: Improving connection scalability: GetSnapshotData()
Date: 2020-04-08 13:24:13
Message-ID: CA+TgmoaC9719CJH2RTAZC9xkebxmbf+zYJo9VgV4GJBwqA5xiw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Apr 7, 2020 at 4:27 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> The main reason is that we want to be able to cheaply check the current
> state of the variables (mostly when checking a backend's own state). We
> can't access the "dense" ones without holding a lock, but we e.g. don't
> want to make ProcArrayEndTransactionInternal() take a lock just to check
> if vacuumFlags is set.
>
> It turns out the copy is also good for performance for another reason:
> the "dense" arrays share cache lines with other backends. That's worth
> it because it lets GetSnapshotData(), by far the most frequent
> operation, touch fewer cache lines. But it also means that a backend's
> "dense" array entry is less likely to be in a local CPU cache (it gets
> evicted whenever another backend modifies the same cache line). In
> many cases, though, we don't need the shared entry at commit time and
> similar points; we just need to check whether it is set, and most of
> the time it won't be. The local entry makes that check cheap.
>
> Basically it makes sense to access the PGPROC variable when checking a
> single backend's data, especially when we have to look at the PGPROC for
> other reasons already. It makes sense to look at the "dense" arrays if
> we need to look at many / most entries, because we then benefit from the
> reduced indirection and better cross-process cacheability.

That's a good explanation. I think it should be in the comments or a
README somewhere.
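
To make the trade-off concrete, here is a toy sketch of the two-copy layout described above. It is not the actual PostgreSQL code: ToyProc, dense_flags, FLAG_IN_VACUUM, and the toy_* functions are all hypothetical names, and the locking that would protect the dense array is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_BACKENDS   8      /* hypothetical, tiny for illustration */
#define FLAG_IN_VACUUM 0x01   /* hypothetical flag bit */

/* Per-backend struct: carries a private mirror of its flag byte. */
typedef struct ToyProc
{
    int     dense_index;      /* this backend's slot in the dense array */
    uint8_t flags_copy;       /* mirror of dense_flags[dense_index] */
} ToyProc;

/* "Dense" array: one byte per backend, packed so that a scan over all
 * backends touches as few cache lines as possible. */
static uint8_t dense_flags[MAX_BACKENDS];
static ToyProc procs[MAX_BACKENDS];

void
toy_init(void)
{
    for (int i = 0; i < MAX_BACKENDS; i++)
    {
        procs[i].dense_index = i;
        procs[i].flags_copy = 0;
        dense_flags[i] = 0;
    }
}

/* Writer: updates both copies. A real implementation would hold a lock
 * while touching the dense array; that is omitted here. */
void
toy_set_flags(ToyProc *proc, uint8_t flags)
{
    assert(proc->dense_index >= 0 && proc->dense_index < MAX_BACKENDS);
    dense_flags[proc->dense_index] = flags;
    proc->flags_copy = flags;
}

/* Single-backend check: reads only the backend's own struct, so it does
 * not touch the shared, frequently invalidated dense cache line. */
bool
toy_own_flag_set(const ToyProc *proc, uint8_t flag)
{
    return (proc->flags_copy & flag) != 0;
}

/* Many-backend scan: walks the dense array with no pointer chasing. */
int
toy_count_with_flag(uint8_t flag)
{
    int count = 0;

    for (int i = 0; i < MAX_BACKENDS; i++)
        if (dense_flags[i] & flag)
            count++;
    return count;
}
```

The single-backend check and the scan read different copies of the same value, which is why the writer has to keep the two in sync.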

> How about:
> /*
> * If the current xactCompletionCount is still the same as it was at the
> * time the snapshot was built, we can be sure that rebuilding the
> * contents of the snapshot the hard way would result in the same snapshot
> * contents:
> *
> * As explained in transam/README, the set of xids considered running by
> * GetSnapshotData() cannot change while ProcArrayLock is held. Snapshot
> * contents only depend on transactions with xids and xactCompletionCount
> * is incremented whenever a transaction with an xid finishes (while
> * holding ProcArrayLock exclusively). Thus the xactCompletionCount check
> * ensures we would detect if the snapshot would have changed.
> *
> * As the snapshot contents are the same as they were before, it is safe
> * to re-enter the snapshot's xmin into the PGPROC array. None of the rows
> * visible under the snapshot could already have been removed (that'd
> * require the set of running transactions to change) and it fulfills the
> * requirement that concurrent GetSnapshotData() calls yield the same
> * xmin.
> */

That's nice and clear.
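
For readers following along, the reuse check that comment describes can be modelled with a single-threaded toy like the one below. This is only a sketch under simplifying assumptions, not the patch itself: ToySnapshot, the toy_* functions, and rebuild_calls are made-up names, there is no ProcArrayLock, and re-entering xmin into the PGPROC array is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_RUNNING 16          /* hypothetical limit for the toy */

typedef struct ToySnapshot
{
    bool     valid;             /* has this snapshot been built yet? */
    uint64_t completion_count;  /* counter value when it was built */
    uint32_t xmax;              /* xids >= xmax are treated as running */
    int      nxip;              /* number of entries in xip[] */
    uint32_t xip[MAX_RUNNING];  /* running xids below xmax */
} ToySnapshot;

/* Stand-ins for shared state (no locking in this toy). */
static uint64_t xact_completion_count = 1;
static uint32_t latest_completed_xid = 0;
static uint32_t running[MAX_RUNNING];
static int      nrunning = 0;
static int      rebuild_calls = 0;  /* counts slow-path rebuilds */

/* A transaction acquires an xid. The counter is NOT bumped: xids that
 * start after a snapshot was built are >= its xmax, so the snapshot
 * already treats them as running. */
void
toy_begin_xact(uint32_t xid)
{
    assert(nrunning < MAX_RUNNING);
    running[nrunning++] = xid;
}

/* A transaction with an xid finishes: this is the only event that can
 * change snapshot contents, so it bumps the counter. */
void
toy_end_xact(uint32_t xid)
{
    for (int i = 0; i < nrunning; i++)
    {
        if (running[i] == xid)
        {
            running[i] = running[--nrunning];
            if (xid > latest_completed_xid)
                latest_completed_xid = xid;
            xact_completion_count++;
            return;
        }
    }
}

void
toy_get_snapshot(ToySnapshot *snap)
{
    /* Fast path: no xid-bearing transaction finished since the snapshot
     * was built, so a rebuild would produce identical contents. */
    if (snap->valid && snap->completion_count == xact_completion_count)
        return;

    /* Slow path: rebuild from the list of running transactions. */
    rebuild_calls++;
    snap->xmax = latest_completed_xid + 1;
    snap->nxip = 0;
    for (int i = 0; i < nrunning; i++)
        if (running[i] < snap->xmax)
            snap->xip[snap->nxip++] = running[i];
    snap->completion_count = xact_completion_count;
    snap->valid = true;
}
```

In this model, starting a new transaction between two toy_get_snapshot() calls leaves the cached snapshot reusable, while finishing one forces a rebuild on the next call.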

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
