Re: Minimal logical decoding on standbys

From: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minimal logical decoding on standbys
Date: 2019-07-10 11:42:35
Message-ID: CAJ3gD9d4oN6ik+0ORrgVTOoO=Z3jsEGsfLLZTwtvUK1ZpF7YWQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, 10 Jul 2019 at 08:44, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> Thanks for the new version! Looks like we're making progress towards
> something committable here.
>
> I think it'd be good to split the patch into a few pieces. I'd maybe do
> that like:
> 1) WAL format changes (plus required other changes)
> 2) Recovery conflicts with slots
> 3) logical decoding on standby
> 4) tests

All right, will do that in the next patch set. For now, I have quickly
made the below changes in a single patch again (attached), in order to
get early comments, if any.

>
>
> > @@ -589,6 +590,7 @@ gistXLogPageReuse(Relation rel, BlockNumber blkno, TransactionId latestRemovedXi
> > */
> >
> > /* XLOG stuff */
> > + xlrec_reuse.onCatalogTable = RelationIsAccessibleInLogicalDecoding(rel);
> > xlrec_reuse.node = rel->rd_node;
> > xlrec_reuse.block = blkno;
> > xlrec_reuse.latestRemovedXid = latestRemovedXid;
>
> Hm. I think we otherwise only ever use
> RelationIsAccessibleInLogicalDecoding() on tables, not on indexes. And
> while I think this would mostly work for builtin catalog tables, it
> won't work for "user catalog tables" as RelationIsUsedAsCatalogTable()
> won't perform any useful checks for indexes.
>
> So I think we either need to look up the table, or pass it down.

Done. Passed down the heap rel.
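
Roughly, the change looks like the sketch below. This is not verified
against the v12 patch; the function body is abbreviated from gist's
gistxlog code, and the new heaprel parameter is the point of interest:

```c
/*
 * Sketch: gistXLogPageReuse() now takes the owning heap relation, so
 * the "user catalog table" check runs on the table, not the index.
 */
static void
gistXLogPageReuse(Relation rel, Relation heaprel,
                  BlockNumber blkno, TransactionId latestRemovedXid)
{
    gistxlogPageReuse xlrec_reuse;

    /*
     * The check must see the heap relation: for an index,
     * RelationIsUsedAsCatalogTable() performs no useful check.
     */
    xlrec_reuse.onCatalogTable = RelationIsAccessibleInLogicalDecoding(heaprel);
    xlrec_reuse.node = rel->rd_node;
    xlrec_reuse.block = blkno;
    xlrec_reuse.latestRemovedXid = latestRemovedXid;

    XLogBeginInsert();
    XLogRegisterData((char *) &xlrec_reuse, SizeOfGistxlogPageReuse);
    XLogInsert(RM_GIST_ID, XLOG_GIST_PAGE_REUSE);
}
```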

>
>
> > diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
> > index d768b9b..10b7857 100644
> > --- a/src/backend/access/heap/heapam.c
> > +++ b/src/backend/access/heap/heapam.c
> > @@ -7149,12 +7149,13 @@ heap_compute_xid_horizon_for_tuples(Relation rel,
> > * see comments for vacuum_log_cleanup_info().
> > */
> > XLogRecPtr
> > -log_heap_cleanup_info(RelFileNode rnode, TransactionId latestRemovedXid)
> > +log_heap_cleanup_info(Relation rel, TransactionId latestRemovedXid)
> > {
> > xl_heap_cleanup_info xlrec;
> > XLogRecPtr recptr;
> >
> > - xlrec.node = rnode;
> > + xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(rel);
> > + xlrec.node = rel->rd_node;
> > xlrec.latestRemovedXid = latestRemovedXid;
> >
> > XLogBeginInsert();
> > @@ -7190,6 +7191,7 @@ log_heap_clean(Relation reln, Buffer buffer,
> > /* Caller should not call me on a non-WAL-logged relation */
> > Assert(RelationNeedsWAL(reln));
> >
> > + xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(reln);
>
> It'd probably be a good idea to add a comment to
> RelationIsUsedAsCatalogTable() that it better never invoke anything
> performing catalog accesses. Otherwise there's quite the danger with
> recursion (some operation doing RelationIsAccessibleInLogicalDecoding(),
> that then accessing the catalog, which in turn could again need to
> perform said operation, loop).

Added comments in RelationIsUsedAsCatalogTable() as well as
RelationIsAccessibleInLogicalDecoding():

* RelationIsAccessibleInLogicalDecoding
* True if we need to log enough information to have access via
* decoding snapshot.
* This definition should not invoke anything that performs catalog
* access. Otherwise, e.g. logging a WAL entry for catalog relation may
* invoke this function, which will in turn do catalog access, which may
* in turn cause another similar WAL entry to be logged, leading to
* infinite recursion.
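
For reference, the macro being guarded lives in src/include/utils/rel.h
and looks roughly like this (reproduced from memory, not from the
patch). Every sub-check is a flag test on the Relation struct or on
already-cached reloptions, which is exactly what the new comment is
meant to preserve:

```c
#define RelationIsAccessibleInLogicalDecoding(relation) \
    (XLogLogicalInfoActive() && \
     RelationNeedsWAL(relation) && \
     (IsCatalogRelation(relation) || RelationIsUsedAsCatalogTable(relation)))
```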

> > /* Entry in pending-list of TIDs we need to revisit */
> > @@ -502,6 +503,7 @@ vacuumRedirectAndPlaceholder(Relation index, Buffer buffer)
> > OffsetNumber itemnos[MaxIndexTuplesPerPage];
> > spgxlogVacuumRedirect xlrec;
> >
> > + xlrec.onCatalogTable = get_rel_logical_catalog(index->rd_index->indrelid);
> > xlrec.nToPlaceholder = 0;
> > xlrec.newestRedirectXid = InvalidTransactionId;
>
> We should document that it is safe to do catalog acceses here, because
> spgist is never used to back catalogs. Otherwise there would be an a
> endless recursion danger here.

Comments added.

>
> Did you check how hard it we to just pass down the heap relation?

It does look hard. See my comments from an earlier reply, which I have
pasted below:

> This one seems harder, but I'm not actually sure why we make it so
> hard. It seems like we just ought to add the table to IndexVacuumInfo.

This means we have to add a heapRel assignment wherever we initialize
the IndexVacuumInfo structure, namely in lazy_vacuum_index(),
lazy_cleanup_index(), validate_index(), and analyze_rel(), and make sure
these functions have a heap rel handle. Do you think we should do this
as part of this patch?
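
If we did go that way, the struct change itself would be small;
something like the hypothetical sketch below (field list abbreviated
from src/include/access/genam.h, from memory; only heaprel is new). The
real cost is the plumbing in the callers listed above:

```c
typedef struct IndexVacuumInfo
{
    Relation    index;           /* the index being vacuumed */
    Relation    heaprel;         /* NEW: heap relation the index is on */
    bool        analyze_only;    /* ANALYZE (without any actual vacuum) */
    bool        estimated_count; /* num_heap_tuples is an estimate */
    int         message_level;   /* ereport level for progress messages */
    double      num_heap_tuples; /* tuples remaining in the heap */
    BufferAccessStrategy strategy;  /* access strategy for reads */
} IndexVacuumInfo;
```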

>
>
> > /*
> > + * Get the wal_level from the control file.
> > + */
> > +WalLevel
> > +GetActiveWalLevel(void)
> > +{
> > + return ControlFile->wal_level;
> > +}
>
> What does "Active" mean here? I assume it's supposed to indicate that it
> could be different than what's configured in postgresql.conf, for a
> replica? If so, that should be mentioned.

Done. Here are the new comments:
* Get the wal_level from the control file. For a standby, this value should be
* considered its active wal_level, because it may differ from what was
* originally configured on the standby.

>
>
> > +/*
> > * Initialization of shared memory for XLOG
> > */
> > Size
> > @@ -9843,6 +9852,19 @@ xlog_redo(XLogReaderState *record)
> > /* Update our copy of the parameters in pg_control */
> > memcpy(&xlrec, XLogRecGetData(record), sizeof(xl_parameter_change));
> >
> > + /*
> > + * Drop logical slots if we are in hot standby and master does not have
> > + * logical data.
>
> nitpick: s/master/the primary/ (mostly adding the "the", but I
> personally also prefer primary over master)
>
> s/logical data/a WAL level sufficient for logical decoding/
>
>
> > Don't bother to search for the slots if standby is
> > + * running with wal_level lower than logical, because in that case,
> > + * we would have either disallowed creation of logical slots or dropped
> > + * existing ones.
>
> s/Don't bother/No need/
> s/slots/potentially conflicting logically slots/

Done.

>
> > + if (InRecovery && InHotStandby &&
> > + xlrec.wal_level < WAL_LEVEL_LOGICAL &&
> > + wal_level >= WAL_LEVEL_LOGICAL)
> > + ResolveRecoveryConflictWithLogicalSlots(InvalidOid, InvalidTransactionId,
> > + gettext_noop("Logical decoding on standby requires wal_level >= logical on master."));
>
>
>
> > diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
> > index 151c3ef..c1bd028 100644
> > --- a/src/backend/replication/logical/decode.c
> > +++ b/src/backend/replication/logical/decode.c
> > @@ -190,11 +190,23 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
> > * can restart from there.
> > */
> > break;
> > + case XLOG_PARAMETER_CHANGE:
> > + {
> > + xl_parameter_change *xlrec =
> > + (xl_parameter_change *) XLogRecGetData(buf->record);
> > + /* Cannot proceed if master itself does not have logical data */
>
> This needs an explanation as to how this is reachable...

Done. Here are the comments:
* If wal_level on the primary is reduced to less than logical, then we
* want to prevent existing logical slots from being used.
* Existing logical slots on the standby get dropped when this WAL
* record is replayed; and further, slot creation fails when the
* wal_level is not sufficient. But all these operations are not
* synchronized, so a logical slot may creep in while the wal_level
* is being reduced. Hence this extra check.

>
>
> > + if (xlrec->wal_level < WAL_LEVEL_LOGICAL)
> > + ereport(ERROR,
> > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> > + errmsg("logical decoding on standby requires "
> > + "wal_level >= logical on master")));
> > + break;
>
> Hm, this strikes me as a not quite good enough error message (same in
> other copies of the message). Perhaps something roughly like "could not
> continue with logical decoding, the primary's wal level is now too low
> (%u)"?

Haven't changed this; there is another reply from Robert. I think what
you want to emphasize is that we can't *continue*. I am not sure why the
user can't infer that logical decoding could not continue when we say
"logical decoding on standby requires wal_level >= ...".

>
>
> > if (RecoveryInProgress())
> > - ereport(ERROR,
> > - (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > - errmsg("logical decoding cannot be used while in recovery")));
> > + {
> > + /*
> > + * This check may have race conditions, but whenever
> > + * XLOG_PARAMETER_CHANGE indicates that wal_level has changed, we
> > + * verify that there are no existing logical replication slots. And to
> > + * avoid races around creating a new slot,
> > + * CheckLogicalDecodingRequirements() is called once before creating
> > + * the slot, and once when logical decoding is initially starting up.
> > + */
> > + if (GetActiveWalLevel() < WAL_LEVEL_LOGICAL)
> > + ereport(ERROR,
> > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> > + errmsg("logical decoding on standby requires "
> > + "wal_level >= logical on master")));
> > + }
> > }
> >
> > /*
> > @@ -241,6 +240,8 @@ CreateInitDecodingContext(char *plugin,
> > LogicalDecodingContext *ctx;
> > MemoryContext old_context;
> >
> > + CheckLogicalDecodingRequirements();
> > +
>
> This should reference the above explanation.

Done.

>
>
>
>
> > /*
> > + * Permanently drop a conflicting replication slot. If it's already active by
> > + * another backend, send it a recovery conflict signal, and then try again.
> > + */
> > +static void
> > +ReplicationSlotDropConflicting(ReplicationSlot *slot)
>
>
> > +void
> > +ResolveRecoveryConflictWithLogicalSlots(Oid dboid, TransactionId xid,
> > + char *conflict_reason)
> > +{
> > + /*
> > + * Build the conflict_str which will look like :
> > + * "Slot conflicted with xid horizon which was being increased
> > + * to 9012 (slot xmin: 1234, slot catalog_xmin: 5678)."
> > + */
> > + initStringInfo(&conflict_xmins);
> > + if (TransactionIdIsValid(slot_xmin) &&
> > + TransactionIdPrecedesOrEquals(slot_xmin, xid))
> > + {
> > + appendStringInfo(&conflict_xmins, "slot xmin: %d", slot_xmin);
> > + }
> > + if (TransactionIdIsValid(slot_catalog_xmin) &&
> > + TransactionIdPrecedesOrEquals(slot_catalog_xmin, xid))
> > + appendStringInfo(&conflict_xmins, "%sslot catalog_xmin: %d",
> > + conflict_xmins.len > 0 ? ", " : "",
> > + slot_catalog_xmin);
> > +
> > + if (conflict_xmins.len > 0)
> > + {
> > + initStringInfo(&conflict_str);
> > + appendStringInfo(&conflict_str, "%s %d (%s).",
> > + conflict_sentence, xid, conflict_xmins.data);
> > + found_conflict = true;
> > + conflict_reason = conflict_str.data;
> > + }
> > + }
>
>
> I think this is going to be a nightmare for translators, no?

For translators, I think the .po files will have the required text,
because I have used gettext_noop() for both conflict_sentence and the
passed-in conflict_reason parameter. And "dropped conflicting slot." is
passed to ereport() as usual. The remaining portion of the errdetail is
not language-specific; e.g. "slot" remains "slot".

> I'm not clear as to why any of this is needed?

The conflict can happen for either xmin or catalog_xmin, or both,
right? The purpose of the above is to show only the conflicting ones of
the two.

>
>
>
> > + /* ReplicationSlotDropPtr() would acquire the lock below */
> > + LWLockRelease(ReplicationSlotControlLock);
>
> "would acquire"? I think it *does* acquire, right?

Yes, changed to "will".

>
>
>
> > @@ -2879,6 +2882,25 @@ RecoveryConflictInterrupt(ProcSignalReason reason)
> > case PROCSIG_RECOVERY_CONFLICT_LOCK:
> > case PROCSIG_RECOVERY_CONFLICT_TABLESPACE:
> > case PROCSIG_RECOVERY_CONFLICT_SNAPSHOT:
> > + case PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT:
> > + /*
> > + * For conflicts that require a logical slot to be dropped, the
> > + * requirement is for the signal receiver to release the slot,
> > + * so that it could be dropped by the signal sender. So for
> > + * normal backends, the transaction should be aborted, just
> > + * like for other recovery conflicts. But if it's walsender on
> > + * standby, then it has to be killed so as to release an
> > + * acquired logical slot.
> > + */
> > + if (am_cascading_walsender &&
> > + reason == PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT &&
> > + MyReplicationSlot && SlotIsLogical(MyReplicationSlot))
> > + {
> > + RecoveryConflictPending = true;
> > + QueryCancelPending = true;
> > + InterruptPending = true;
> > + break;
> > + }
>
> Huh, I'm not following as to why that's needed for walsenders?

For normal backends, we ignore this signal if we aren't in a
transaction (block). But a walsender has no transaction, and we still
cannot ignore the signal, because the walsender can keep a logical slot
acquired when it was spawned by "pg_recvlogical --start". So the only
way to make it release the acquired slot is to kill it.

>
>
> > @@ -1499,6 +1499,7 @@ pg_stat_get_db_conflict_all(PG_FUNCTION_ARGS)
> > dbentry->n_conflict_tablespace +
> > dbentry->n_conflict_lock +
> > dbentry->n_conflict_snapshot +
> > + dbentry->n_conflict_logicalslot +
> > dbentry->n_conflict_bufferpin +
> > dbentry->n_conflict_startup_deadlock);
>
> I think this probably needs adjustments in a few more places,
> e.g. monitoring.sgml...

Oops, yeah. To search for similar additions, I had looked for
"conflict_snapshot" using cscope; I should have done the same using
"git grep".
Done now.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

Attachment Content-Type Size
logical-decoding-on-standby_v12.patch application/octet-stream 69.1 KB
