Re: Logical decoding on standby

From: Andres Freund <andres(at)anarazel(dot)de>
To: Simon Riggs <simon(dot)riggs(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, Thom Brown <thom(at)linux(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Logical decoding on standby
Date: 2017-03-30 17:16:07
Message-ID: 20170330171607.n4qhlvlg4x63qh2i@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

> @@ -9633,6 +9643,12 @@ xlog_redo(XLogReaderState *record)
> SetTransactionIdLimit(checkPoint.oldestXid, checkPoint.oldestXidDB);
>
> /*
> + * There can be no concurrent writers to oldestCatalogXmin during
> + * recovery, so no need to take ProcArrayLock.
> + */
> + ShmemVariableCache->oldestCatalogXmin = checkPoint.oldestCatalogXmin;

s/writers/writes/?

> @@ -9731,6 +9748,15 @@ xlog_redo(XLogReaderState *record)
> checkPoint.oldestXid))
> SetTransactionIdLimit(checkPoint.oldestXid,
> checkPoint.oldestXidDB);
> +
> + /*
> + * There can be no concurrent writers to oldestCatalogXmin during
> + * recovery, so no need to take ProcArrayLock.
> + */
> + if (TransactionIdPrecedes(ShmemVariableCache->oldestCatalogXmin,
> + checkPoint.oldestCatalogXmin)
> + ShmemVariableCache->oldestCatalogXmin = checkPoint.oldestCatalogXmin;

dito.

> @@ -276,8 +279,21 @@ CreateInitDecodingContext(char *plugin,
>
> ReplicationSlotsComputeRequiredXmin(true);
>
> + /*
> + * If this is the first slot created on the master we won't have a
> + * persistent record of the oldest safe xid for historic snapshots yet.
> + * Force one to be recorded so that when we go to replay from this slot we
> + * know it's safe.
> + */
> + force_standby_snapshot =
> + !TransactionIdIsValid(ShmemVariableCache->oldestCatalogXmin);

s/first slot/first logical slot/. Also, the reference to master doesn't
seem right.

> LWLockRelease(ProcArrayLock);
>
> + /* Update ShmemVariableCache->oldestCatalogXmin */
> + if (force_standby_snapshot)
> + LogStandbySnapshot();

The comment and code don't quite square to me - it's far from obvious
that LogStandbySnapshot does something like that. I'd even say it's a
bad idea to have it do that.

> /*
> * tell the snapshot builder to only assemble snapshot once reaching the
> * running_xact's record with the respective xmin.
> @@ -376,6 +392,8 @@ CreateDecodingContext(XLogRecPtr start_lsn,
> start_lsn = slot->data.confirmed_flush;
> }
>
> + EnsureActiveLogicalSlotValid();

This seems like it should be in a separate patch, and seperately
reviewed. It's code that's currently unreachable (and thus untestable).

> +/*
> + * Test to see if the active logical slot is usable.
> + */
> +static void
> +EnsureActiveLogicalSlotValid(void)
> +{
> + TransactionId shmem_catalog_xmin;
> +
> + Assert(MyReplicationSlot != NULL);
> +
> + /*
> + * A logical slot can become unusable if we're doing logical decoding on a
> + * standby or using a slot created before we were promoted from standby
> + * to master.

Neither of those is currently possible...

> If the master advanced its global catalog_xmin past the
> + * threshold we need it could've removed catalog tuple versions that
> + * we'll require to start decoding at our restart_lsn.
> + *
> + * We need a barrier so that if we decode in recovery on a standby we
> + * don't allow new decoding sessions to start after redo has advanced
> + * the threshold.
> + */
> + if (RecoveryInProgress())
> + pg_memory_barrier();

I don't think this is a meaningful locking protocol. It's a bad idea to
use lock-free programming without need, especially when the concurrency
protocol isn't well defined. With what other barrier does this pair
with? What prevents the data being out of date by the time we actually
do the check?

> diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
> index cfc3fba..cdc5f95 100644
> --- a/src/backend/replication/walsender.c
> +++ b/src/backend/replication/walsender.c
> @@ -1658,6 +1658,11 @@ PhysicalConfirmReceivedLocation(XLogRecPtr lsn)
> * be energy wasted - the worst lost information can do here is give us
> * wrong information in a statistics view - we'll just potentially be more
> * conservative in removing files.
> + *
> + * We don't have to do any effective_xmin / effective_catalog_xmin testing
> + * here either, like for LogicalConfirmReceivedLocation. If we received
> + * the xmin and catalog_xmin from downstream replication slots we know they
> + * were already confirmed there,
> */
> }

This comment reads as if LogicalConfirmReceivedLocation had
justification for not touching / checking catalog_xmin. But it does.

> /*
> + * Update our knowledge of the oldest xid we can safely create historic
> + * snapshots for.
> + *
> + * There can be no concurrent writers to oldestCatalogXmin during
> + * recovery, so no need to take ProcArrayLock.

By now I think is pretty flawed logic, because there can be concurrent
readers, that need to be safe against oldestCatalogXmin advancing
concurrently.

> /*
> - * It's important *not* to include the limits set by slots here because
> + * It's important *not* to include the xmin set by slots here because
> * snapbuild.c uses oldestRunningXid to manage its xmin horizon. If those
> * were to be included here the initial value could never increase because
> * of a circular dependency where slots only increase their limits when
> * running xacts increases oldestRunningXid and running xacts only
> * increases if slots do.
> + *
> + * We can include the catalog_xmin limit here; there's no similar
> + * circularity, and we need it to log xl_running_xacts records for
> + * standbys.
> */

Those comments seem to need some more heavyhanded reconciliation.

> *
> * Return the current slot xmin limits. That's useful to be able to remove
> * data that's older than those limits.
> + *
> + * For logical replication slots' catalog_xmin, we return both the effective

This seems to need some light editing.

> /*
> * Record an enhanced snapshot of running transactions into WAL.
> *
> + * We also record the value of procArray->replication_slot_catalog_xmin
> + * obtained from GetRunningTransactionData here, so standbys know we're about
> + * to advance ShmemVariableCache->oldestCatalogXmin to its value and start
> + * removing dead catalog tuples below that threshold.

I think needs some rephrasing. We're not necessarily about to remove
catalog tuples here, nor are we necessarily advancing oldestCatalogXmin.

> +static void
> +UpdateOldestCatalogXmin(TransactionId pendingOldestCatalogXmin)
> +{
> + LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
> + if (TransactionIdPrecedes(ShmemVariableCache->oldestCatalogXmin, pendingOldestCatalogXmin)
> + || (TransactionIdIsValid(ShmemVariableCache->oldestCatalogXmin) != TransactionIdIsValid(pendingOldestCatalogXmin)))
> + ShmemVariableCache->oldestCatalogXmin = pendingOldestCatalogXmin;
> + LWLockRelease(ProcArrayLock);
> +}

Doing TransactionIdPrecedes before ensuring
ShmemVariableCache->oldestCatalogXmin is actually valid doesn't strike
me as a good idea. Generally, the expression as it stands is hard to
understand.

> diff --git a/src/include/access/transam.h b/src/include/access/transam.h
> index d25a2dd..a4ecfb7 100644
> --- a/src/include/access/transam.h
> +++ b/src/include/access/transam.h
> @@ -136,6 +136,17 @@ typedef struct VariableCacheData
> * aborted */
>
> /*
> + * This field is protected by ProcArrayLock except
> + * during recovery, when it's set unlocked.
> + *
> + * oldestCatalogXmin is the oldest xid it is
> + * guaranteed to be safe to create a historic
> + * snapshot for. See also
> + * procArray->replication_slot_catalog_xmin
> + */
> + TransactionId oldestCatalogXmin;

Maybe it'd be better to rephrase that do something like
"oldestCatalogXmin guarantees that no valid catalog tuples >= than it
are removed. That property is used for logical decoding.". or similar?

> /*
> * Each page of XLOG file has a header like this:
> */
> -#define XLOG_PAGE_MAGIC 0xD097 /* can be used as WAL version indicator */
> +#define XLOG_PAGE_MAGIC 0xD100 /* can be used as WAL version indicator */

We normally only advance this by one, it's not tied to the poistgres version.

I'm sorry, but to me this patch isn't ready. I'm also doubtful that it
makes a whole lot of sense on its own, without having finished the
design for decoding on a standby - it seems quite likely that we'll have
to redesign the mechanisms in here a bit for that. For 10 this seems to
do nothing but add overhead?

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2017-03-30 17:16:53 Re: Logical decoding on standby
Previous Message Tom Lane 2017-03-30 17:12:53 Re: Use \if ... \elif ... \else ... \endif to simplify alternative in psql scripting.