Re: Minimal logical decoding on standbys

From: "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: <fabriziomello(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>, "[pgdg] Robert Haas" <robertmhaas(at)gmail(dot)com>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, Rahila Syed <rahila(dot)syed(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minimal logical decoding on standbys
Date: 2021-04-08 10:19:00
Message-ID: 41c3d8dd-8bc7-13d9-e14c-2c29ba5f2083@amazon.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Thanks for the feedback!

On 4/6/21 8:02 PM, Andres Freund wrote:
> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
>
>
>
> Hi,
>
> On 2021-04-06 14:30:29 +0200, Drouvot, Bertrand wrote:
>> From 827295f74aff9c627ee722f541a6c7cc6d4133cf Mon Sep 17 00:00:00 2001
>> From: bdrouvotAWS <bdrouvot(at)amazon(dot)com>
>> Date: Tue, 6 Apr 2021 11:59:23 +0000
>> Subject: [PATCH v15 1/5] Allow logical decoding on standby.
>>
>> Allow a logical slot to be created on standby. Restrict its usage
>> or its creation if wal_level on primary is less than logical.
>> During slot creation, it's restart_lsn is set to the last replayed
>> LSN. Effectively, a logical slot creation on standby waits for an
>> xl_running_xact record to arrive from primary. Conflicting slots
>> would be handled in next commits.
>>
>> Andres Freund and Amit Khandekar.
> I think more people have worked on this by now...
>
> Does this strike you as an accurate description?
>
> Author: Andres Freund (in an older version), Amit Khandekar, Bertrand Drouvot
> Reviewed-By: Bertrand Drouvot, Andres Freund, Robert Haas

Yes it looks like, adding Fabrizio as reviewer as well.

>> --- a/src/backend/replication/logical/logical.c
>> +++ b/src/backend/replication/logical/logical.c
>> @@ -119,23 +119,22 @@ CheckLogicalDecodingRequirements(void)
>> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>> errmsg("logical decoding requires a database connection")));
>>
>> - /* ----
>> - * TODO: We got to change that someday soon...
>> - *
>> - * There's basically three things missing to allow this:
>> - * 1) We need to be able to correctly and quickly identify the timeline a
>> - * LSN belongs to
>> - * 2) We need to force hot_standby_feedback to be enabled at all times so
>> - * the primary cannot remove rows we need.
>> - * 3) support dropping replication slots referring to a database, in
>> - * dbase_redo. There can't be any active ones due to HS recovery
>> - * conflicts, so that should be relatively easy.
>> - * ----
>> - */
>> if (RecoveryInProgress())
>> - ereport(ERROR,
>> - (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>> - errmsg("logical decoding cannot be used while in recovery")));
> Maybe I am just missing something right now, and maybe I'm being a bit
> overly pedantic, but I don't immediately see how 0001 is correct without
> 0002 and 0003? I think it'd be better to first introduce the conflict
> information, then check for conflicts, and only after that allow
> decoding on standbys?

RIght, changing the order in v17 attached.

>
>> diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
>> index 6f8810e149..6a21cba362 100644
>> --- a/src/backend/access/transam/xlog.c
>> +++ b/src/backend/access/transam/xlog.c
>> @@ -5080,6 +5080,17 @@ LocalProcessControlFile(bool reset)
>> ReadControlFile();
>> }
>>
>> +/*
>> + * Get the wal_level from the control file. For a standby, this value should be
>> + * considered as its active wal_level, because it may be different from what
>> + * was originally configured on standby.
>> + */
>> +WalLevel
>> +GetActiveWalLevel(void)
>> +{
>> + return ControlFile->wal_level;
>> +}
>> +
> This strikes me as error-prone - there's nothing in the function name
> that this should mainly (only?) be used during recovery...
>
renamed to GetActiveWalLevelOnStandby().

>> + if (SlotIsPhysical(slot))
>> + restart_lsn = GetRedoRecPtr();
>> + else if (RecoveryInProgress())
>> + {
>> + restart_lsn = GetXLogReplayRecPtr(NULL);
>> + /*
>> + * Replay pointer may point one past the end of the record. If that
>> + * is a XLOG page boundary, it will not be a valid LSN for the
>> + * start of a record, so bump it up past the page header.
>> + */
>> + if (!XRecOffIsValid(restart_lsn))
>> + {
>> + if (restart_lsn % XLOG_BLCKSZ != 0)
>> + elog(ERROR, "invalid replay pointer");
>> +
>> + /* For the first page of a segment file, it's a long header */
>> + if (XLogSegmentOffset(restart_lsn, wal_segment_size) == 0)
>> + restart_lsn += SizeOfXLogLongPHD;
>> + else
>> + restart_lsn += SizeOfXLogShortPHD;
>> + }
>> + }
> This seems like a layering violation to me. I don't think stuff like
> this should be outside of xlog[reader].c, and definitely not in
> ReplicationSlotReserveWal().

Moved the bump to GetXLogReplayRecPtr(), does that make more sense or
did you have something else in mind?

> Relevant discussion (which totally escaped my mind):
> https://postgr.es/m/CAJ3gD9csOr0LoYoMK9NnfBk0RZmvHXcJAFWFd2EuL%3DNOfz7PVA%40mail.gmail.com
>
>
>> + else
>> + restart_lsn = GetXLogInsertRecPtr();
>> +
>> + SpinLockAcquire(&slot->mutex);
>> + slot->data.restart_lsn = restart_lsn;
>> + SpinLockRelease(&slot->mutex);
>> +
>> if (!RecoveryInProgress() && SlotIsLogical(slot))
>> {
>> XLogRecPtr flushptr;
>>
>> - /* start at current insert position */
>> - restart_lsn = GetXLogInsertRecPtr();
>> - SpinLockAcquire(&slot->mutex);
>> - slot->data.restart_lsn = restart_lsn;
>> - SpinLockRelease(&slot->mutex);
>> -
>> /* make sure we have enough information to start */
>> flushptr = LogStandbySnapshot();
>>
>> /* and make sure it's fsynced to disk */
>> XLogFlush(flushptr);
>> }
>> - else
>> - {
>> - restart_lsn = GetRedoRecPtr();
>> - SpinLockAcquire(&slot->mutex);
>> - slot->data.restart_lsn = restart_lsn;
>> - SpinLockRelease(&slot->mutex);
>> - }
>>
>> /* prevent WAL removal as fast as possible */
>> ReplicationSlotsComputeRequiredLSN();
> I think I'd move the LogStandbySnapshot() piece out of the entire
> loop. There's no reason for logging multiple ones if we then just end up
> failing because of the XLogGetLastRemovedSegno() check.

Right, moved it outside of the loop.

>
>> diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
>> index 178d49710a..6c4c26c2fe 100644
>> --- a/src/include/access/heapam_xlog.h
>> +++ b/src/include/access/heapam_xlog.h
>> @@ -239,6 +239,7 @@ typedef struct xl_heap_update
>> */
>> typedef struct xl_heap_clean
>> {
>> + bool onCatalogTable;
>> TransactionId latestRemovedXid;
>> uint16 nredirected;
>> uint16 ndead;
>> @@ -254,6 +255,7 @@ typedef struct xl_heap_clean
>> */
>> typedef struct xl_heap_cleanup_info
>> {
>> + bool onCatalogTable;
>> RelFileNode node;
>> TransactionId latestRemovedXid;
>> } xl_heap_cleanup_info;
>> @@ -334,6 +336,7 @@ typedef struct xl_heap_freeze_tuple
>> */
>> typedef struct xl_heap_freeze_page
>> {
>> + bool onCatalogTable;
>> TransactionId cutoff_xid;
>> uint16 ntuples;
>> } xl_heap_freeze_page;
>> @@ -348,6 +351,7 @@ typedef struct xl_heap_freeze_page
>> */
>> typedef struct xl_heap_visible
>> {
>> + bool onCatalogTable;
>> TransactionId cutoff_xid;
>> uint8 flags;
>> } xl_heap_visible;
> Reminder to self: This needs a WAL version bump.
>
>> diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
>> index 9a3a03e520..3405070d63 100644
>> --- a/src/include/utils/rel.h
>> +++ b/src/include/utils/rel.h
>> @@ -16,6 +16,7 @@
>>
>> #include "access/tupdesc.h"
>> #include "access/xlog.h"
>> +#include "catalog/catalog.h"
>> #include "catalog/pg_class.h"
>> #include "catalog/pg_index.h"
>> #include "catalog/pg_publication.h"
> Not clear why this is in this patch?

It's needed for IsCatalogRelation() call in
RelationIsAccessibleInLogicalDecoding() and RelationIsLogicallyLogged().

So instead, in v17 attached i removed the new includes of catalog.h as
it makes more sense to me to keep this new one in rel.h.

>> diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
>> index 5ba776e789..03c5dbea48 100644
>> --- a/src/backend/postmaster/pgstat.c
>> +++ b/src/backend/postmaster/pgstat.c
>> @@ -2928,6 +2928,24 @@ pgstat_send_archiver(const char *xlog, bool failed)
>> pgstat_send(&msg, sizeof(msg));
>> }
>>
>> +/* ----------
>> + * pgstat_send_droplogicalslot() -
>> + *
>> + * Tell the collector about a logical slot being dropped
>> + * due to conflict.
>> + * ----------
>> + */
>> +void
>> +pgstat_send_droplogicalslot(Oid dbOid)
>> +{
>> + PgStat_MsgRecoveryConflict msg;
>> +
>> + pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_RECOVERYCONFLICT);
>> + msg.m_databaseid = dbOid;
>> + msg.m_reason = PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT;
>> + pgstat_send(&msg, sizeof(msg));
>> +}
> Why do we have this in adition to pgstat_report_replslot_drop()? ISTM
> that we should instead add a reason parameter to
> pgstat_report_replslot_drop()?

Added a reason parameter in pgstat_report_replslot_drop() and dropped
pgstat_send_droplogicalslot().

>
>> +/*
>> + * Resolve recovery conflicts with logical slots.
>> + *
>> + * When xid is valid, it means that rows older than xid might have been
>> + * removed.
> I don't think the past tense is correct - the rows better not be removed
> yet on the standby, otherwise we'd potentially do something random in
> decoding.
>
RIght, wording changed.
>
>> @@ -297,6 +297,24 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
>> may consume changes from a slot at any given time.
>> </para>
>>
>> + <para>
>> + A logical replication slot can also be created on a hot standby. To prevent
>> + <command>VACUUM</command> from removing required rows from the system
>> + catalogs, <varname>hot_standby_feedback</varname> should be set on the
>> + standby. In spite of that, if any required rows get removed, the slot gets
>> + dropped. Existing logical slots on standby also get dropped if wal_level
>> + on primary is reduced to less than 'logical'.
>> + </para>
> I think this should add that it's very advisable to use a physical slot
> between primary and standby. Otherwise hot_standby_feedback will work,
> but only while the connection is alive - as soon as it breaks, a node
> gets restarted, ...

Good point, wording added .

v17 attached does contain those changes.

Remarks related to the TAP tests have not been addressed in v17, will
look at it now.

Bertrand

Attachment Content-Type Size
v17-0005-Doc-changes-describing-details-about-logical-dec.patch text/plain 2.1 KB
v17-0004-New-TAP-test-for-logical-decoding-on-standby.patch text/plain 20.9 KB
v17-0003-Allow-logical-decoding-on-standby.patch text/plain 16.6 KB
v17-0002-Handle-logical-slot-conflicts-on-standby.patch text/plain 28.7 KB
v17-0001-Add-info-in-WAL-records-in-preparation-for-logic.patch text/plain 18.0 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2021-04-08 10:22:06 Re: Order dependency in function test
Previous Message Tomas Vondra 2021-04-08 10:18:36 Re: maximum columns for brin bloom indexes