Re: POC: Cache data in GetSnapshotData()

From: Andres Freund <andres(at)anarazel(dot)de>
To: Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: Cache data in GetSnapshotData()
Date: 2017-08-02 20:46:41
Message-ID: 20170802204641.qspe7kiqhpm4uczj@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I think this patch should have a "cover letter" explaining what it's
trying to achieve, how it does so and why it's safe/correct. I think
it'd also be good to try to show some of the worst cases of this patch
(i.e. where the caching only adds overhead, but never gets used), and
some of the best cases (where the cache gets used quite often, and
reduces contention significantly).

I wonder if there's a way to avoid copying the snapshot into the cache
in situations where we're barely ever going to need it. But right now
the only way I can think of is to look at the length of the
ProcArrayLock wait queue and count the exclusive locks therein - which'd
be expensive and intrusive...

On 2017-08-02 15:56:21 +0530, Mithun Cy wrote:
> diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
> index a7e8cf2..4d7debe 100644
> --- a/src/backend/storage/ipc/procarray.c
> +++ b/src/backend/storage/ipc/procarray.c
> @@ -366,11 +366,13 @@ ProcArrayRemove(PGPROC *proc, TransactionId latestXid)
> (arrayP->numProcs - index - 1) * sizeof(int));
> arrayP->pgprocnos[arrayP->numProcs - 1] = -1; /* for debugging */
> arrayP->numProcs--;
> + ProcGlobal->is_snapshot_valid = false;
> LWLockRelease(ProcArrayLock);
> return;
> }
> }
>
> + ProcGlobal->is_snapshot_valid = false;
> /* Oops */
> LWLockRelease(ProcArrayLock);

Why do we need to do anything here? And if we need to, why not in
ProcArrayAdd?

> @@ -1552,6 +1557,8 @@ GetSnapshotData(Snapshot snapshot)
> errmsg("out of memory")));
> }
>
> + snapshot->takenDuringRecovery = RecoveryInProgress();
> +
> /*
> * It is sufficient to get shared lock on ProcArrayLock, even if we are
> * going to set MyPgXact->xmin.
> @@ -1566,100 +1573,200 @@ GetSnapshotData(Snapshot snapshot)
> /* initialize xmin calculation with xmax */
> globalxmin = xmin = xmax;
>
> - snapshot->takenDuringRecovery = RecoveryInProgress();
> -

This movement of code seems fairly unrelated?

> if (!snapshot->takenDuringRecovery)
> {

Do we really need to restrict this to !recovery snapshots? It'd be nicer
if we could generalize it - I don't immediately see anything !recovery
specific in the logic here.

> - int *pgprocnos = arrayP->pgprocnos;
> - int numProcs;
> + if (ProcGlobal->is_snapshot_valid)
> + {
> + Snapshot csnap = &ProcGlobal->cached_snapshot;
> + TransactionId *saved_xip;
> + TransactionId *saved_subxip;
>
> - /*
> - * Spin over procArray checking xid, xmin, and subxids. The goal is
> - * to gather all active xids, find the lowest xmin, and try to record
> - * subxids.
> - */
> - numProcs = arrayP->numProcs;
> - for (index = 0; index < numProcs; index++)
> + saved_xip = snapshot->xip;
> + saved_subxip = snapshot->subxip;
> +
> + memcpy(snapshot, csnap, sizeof(SnapshotData));
> +
> + snapshot->xip = saved_xip;
> + snapshot->subxip = saved_subxip;
> +
> + memcpy(snapshot->xip, csnap->xip,
> + sizeof(TransactionId) * csnap->xcnt);
> + memcpy(snapshot->subxip, csnap->subxip,
> + sizeof(TransactionId) * csnap->subxcnt);
> + globalxmin = ProcGlobal->cached_snapshot_globalxmin;
> + xmin = csnap->xmin;
> +
> + count = csnap->xcnt;
> + subcount = csnap->subxcnt;
> + suboverflowed = csnap->suboverflowed;
> +
> + Assert(TransactionIdIsValid(globalxmin));
> + Assert(TransactionIdIsValid(xmin));
> + }

Can we move this to a separate function? In fact, could you check how
much effort it'd be to split cached, !recovery, recovery cases into
three static inline helper functions? This is starting to be hard to
read, the indentation added in this patch doesn't help... Consider doing
recovery, !recovery cases in a preliminary separate patch.

> + * Let only one of the caller cache the computed snapshot, and
> + * others can continue as before.
> */
> - if (!suboverflowed)
> + if (!ProcGlobal->is_snapshot_valid &&
> + LWLockConditionalAcquire(&ProcGlobal->CachedSnapshotLock,
> + LW_EXCLUSIVE))
> {

I'd also move this to a helper function (the bit inside the lock).

> +
> + /*
> + * The memory barrier has be to be placed here to ensure that
> + * is_snapshot_valid is set only after snapshot is cached.
> + */
> + pg_write_barrier();
> + ProcGlobal->is_snapshot_valid = true;
> + LWLockRelease(&ProcGlobal->CachedSnapshotLock);

LWLockRelease() is a full memory barrier, so this shouldn't be required.

> /*
> diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
> index 7dbaa81..7d06904 100644
> --- a/src/include/storage/proc.h
> +++ b/src/include/storage/proc.h
> @@ -16,6 +16,7 @@
>
> #include "access/xlogdefs.h"
> #include "lib/ilist.h"
> +#include "utils/snapshot.h"
> #include "storage/latch.h"
> #include "storage/lock.h"
> #include "storage/pg_sema.h"
> @@ -253,6 +254,22 @@ typedef struct PROC_HDR
> int startupProcPid;
> /* Buffer id of the buffer that Startup process waits for pin on, or -1 */
> int startupBufferPinWaitBufId;
> +
> + /*
> + * In GetSnapshotData we can reuse the previously snapshot computed if no
> + * new transaction has committed or rolledback. Thus saving good amount of
> + * computation cycle under GetSnapshotData where we need to iterate
> + * through procArray every time we try to get the current snapshot. Below
> + * members help us to save the previously computed snapshot in global
> + * shared memory and any process which want to get current snapshot can
> + * directly copy from them if it is still valid.
> + */
> + SnapshotData cached_snapshot; /* Previously saved snapshot */
> + volatile bool is_snapshot_valid; /* is above snapshot valid */
> + TransactionId cached_snapshot_globalxmin; /* globalxmin when above

I'd rename is_snapshot_valid to 'cached_snapshot_valid' or such, it's
not clear from the name what it refers to.

> + LWLock CachedSnapshotLock; /* A try lock to make sure only one
> + * process write to cached_snapshot */
> } PROC_HDR;
>

Hm, I'd not add the lock to the struct, we normally don't do that for
the "in core" lwlocks that don't exist in a "configurable" number like
the buffer locks and such.

Regards,

Andres

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2017-08-02 20:52:01 Re: Why does logical replication launcher exit with exit code 1?
Previous Message Robert Haas 2017-08-02 20:40:38 Re: Partition-wise join for join between (declaratively) partitioned tables