Re: Minimal logical decoding on standbys

From: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, fabriziomello(at)gmail(dot)com, tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>, Rahila Syed <rahila(dot)syed(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minimal logical decoding on standbys
Date: 2023-01-24 14:59:06
Message-ID: 655d1dde-4d45-69a4-af7d-c4752ab34c11@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 1/24/23 12:21 AM, Melanie Plageman wrote:
> I'm new to this thread and subject, but I had a few basic thoughts about
> the first patch in the set.
>

Thanks for looking at it!

> On Mon, Jan 23, 2023 at 12:03:35PM +0100, Drouvot, Bertrand wrote:
>>
>> Please find V42 attached.
>>
>> From 3c206bd77831d507f4f95e1942eb26855524571a Mon Sep 17 00:00:00 2001
>> From: bdrouvotAWS <bdrouvot(at)amazon(dot)com>
>> Date: Mon, 23 Jan 2023 10:07:51 +0000
>> Subject: [PATCH v42 1/6] Add info in WAL records in preparation for logical
>> slot conflict handling.
>> MIME-Version: 1.0
>> Content-Type: text/plain; charset=UTF-8
>> Content-Transfer-Encoding: 8bit
>>
>> Overall design:
>>
>> 1. We want to enable logical decoding on standbys, but replay of WAL
>> from the primary might remove data that is needed by logical decoding,
>> causing replication conflicts much as hot standby does.
>
> It is a little confusing to mention replication conflicts in point 1. It
> makes it sound like it already logs a recovery conflict. Without the
> recovery conflict handling in this patchset, logical decoding of
> statements using data that has been removed will fail with some error
> like :
> ERROR: could not map filenumber "xxx" to relation OID
> Part of what this patchset does is introduce the concept of a new kind
> of recovery conflict and a resolution process.
>

I think I understand what you mean, what about the following?

1. We want to enable logical decoding on standbys, but replay of WAL
from the primary might remove data that is needed by logical decoding,
causing error(s) on the standby.

To prevent those errors a new replication conflict scenario
needs to be addressed (as much as hot standby does.)

>> 2. Our chosen strategy for dealing with this type of replication slot
>> is to invalidate logical slots for which needed data has been removed.
>>
>> 3. To do this we need the latestRemovedXid for each change, just as we
>> do for physical replication conflicts, but we also need to know
>> whether any particular change was to data that logical replication
>> might access.

> It isn't clear from the above sentence why you would need both. I think
> it has something to do with what is below (hot_standby_feedback being
> off), but I'm not sure, so the order is confusing.
>

Right, it has to deal with the xid horizons too. So the idea is to check if
1) there is a risk of conflict and 2) if there is a risk then check
is there a conflict? (based on the xid). I'll reword this part.

>> 4. We can't rely on the standby's relcache entries for this purpose in
>> any way, because the startup process can't access catalog contents.
>>
>> 5. Therefore every WAL record that potentially removes data from the
>> index or heap must carry a flag indicating whether or not it is one
>> that might be accessed during logical decoding.
>>
>> Why do we need this for logical decoding on standby?
>>
>> First, let's forget about logical decoding on standby and recall that
>> on a primary database, any catalog rows that may be needed by a logical
>> decoding replication slot are not removed.
>>
>> This is done thanks to the catalog_xmin associated with the logical
>> replication slot.
>>
>> But, with logical decoding on standby, in the following cases:
>>
>> - hot_standby_feedback is off
>> - hot_standby_feedback is on but there is no a physical slot between
>> the primary and the standby. Then, hot_standby_feedback will work,
>> but only while the connection is alive (for example a node restart
>> would break it)
>>
>> Then, the primary may delete system catalog rows that could be needed
>> by the logical decoding on the standby (as it does not know about the
>> catalog_xmin on the standby).
>>
>> So, it’s mandatory to identify those rows and invalidate the slots
>> that may need them if any. Identifying those rows is the purpose of
>> this commit.
>
> I would like a few more specifics about this commit (patch 1 in the set)
> itself in the commit message.
>
> I think it would be good to have the commit message mention what kinds
> of operations require WAL to contain information about whether or not it
> is operating on a catalog table and why this is.
>
> For example, I think the explanation of the feature makes it clear why
> vacuum and pruning operations would require isCatalogRel, however it
> isn't immediately obvious why page reuse would.
>

What do you think about putting those extra explanations in the code instead?

> Also, because the diff has so many function signatures changed, it is
> hard to tell which xlog record types are actually being changed to
> include isCatalogRel. It might be too detailed/repetitive for the final
> commit message to have a list of the xlog types requiring this info
> (gistxlogPageReuse, spgxlogVacuumRedirect, xl_hash_vacuum_one_page,
> xl_btree_reuse_page, xl_btree_delete, xl_heap_visible, xl_heap_prune,
> xl_heap_freeze_page) but perhaps you could enumerate all the general
> operations (freeze page, vacuum, prune, etc).
>

Right, at the end there is only a few making "real" use of it: they can be
enumerated in the commit message. Will do.

>> Implementation:
>>
>> When a WAL replay on standby indicates that a catalog table tuple is
>> to be deleted by an xid that is greater than a logical slot's
>> catalog_xmin, then that means the slot's catalog_xmin conflicts with
>> the xid, and we need to handle the conflict. While subsequent commits
>> will do the actual conflict handling, this commit adds a new field
>> isCatalogRel in such WAL records (and a new bit set in the
>> xl_heap_visible flags field), that is true for catalog tables, so as to
>> arrange for conflict handling.
>
> You do mention it a bit here, but I think it could be more clear and
> specific.

Ok, will try to be more clear.

>
>> diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
>> index ba394f08f6..235f1a1843 100644
>> --- a/src/backend/access/gist/gist.c
>> +++ b/src/backend/access/gist/gist.c
>> @@ -348,7 +348,7 @@ gistplacetopage(Relation rel, Size freespace, GISTSTATE *giststate,
>> for (; ptr; ptr = ptr->next)
>> {
>> /* Allocate new page */
>> - ptr->buffer = gistNewBuffer(rel);
>> + ptr->buffer = gistNewBuffer(heapRel, rel);
>> GISTInitBuffer(ptr->buffer, (is_leaf) ? F_LEAF : 0);
>> ptr->page = BufferGetPage(ptr->buffer);
>> ptr->block.blkno = BufferGetBlockNumber(ptr->buffer);
>> diff --git a/src/backend/access/gist/gistbuild.c b/src/backend/access/gist/gistbuild.c
>> index d21a308d41..a87890b965 100644
>> --- a/src/backend/access/gist/gistbuild.c
>> +++ b/src/backend/access/gist/gistbuild.c
>> @@ -298,7 +298,7 @@ gistbuild(Relation heap, Relation index, IndexInfo *indexInfo)
>> Page page;
>>
>> /* initialize the root page */
>> - buffer = gistNewBuffer(index);
>> + buffer = gistNewBuffer(heap, index);
>> Assert(BufferGetBlockNumber(buffer) == GIST_ROOT_BLKNO);
>> page = BufferGetPage(buffer);
>>
>> diff --git a/src/backend/access/gist/gistutil.c b/src/backend/access/gist/gistutil.c
>> index 56451fede1..119e34ce0f 100644
>> --- a/src/backend/access/gist/gistutil.c
>> +++ b/src/backend/access/gist/gistutil.c
>> @@ -821,7 +821,7 @@ gistcheckpage(Relation rel, Buffer buf)
>> * Caller is responsible for initializing the page by calling GISTInitBuffer
>> */
>> Buffer
>> -gistNewBuffer(Relation r)
>> +gistNewBuffer(Relation heaprel, Relation r)
>> {
>
> It is not very important, but I noticed you made "heaprel" the last
> parameter to all of the btree-related functions but the first parameter
> to the gist functions. I thought it might be nice to make the order
> consistent.

Agree, will do.

> I also was wondering why you made it the last argument to
> all the btree functions to begin with (i.e. instead of directly after
> the first rel argument).
>

No real reasons, will put all of them after the first rel argument (that seems a better place).

>> diff --git a/src/include/access/gist_private.h b/src/include/access/gist_private.h
>> index 8af33d7b40..9bdac12baf 100644
>> --- a/src/include/access/gist_private.h
>> +++ b/src/include/access/gist_private.h
>> @@ -440,7 +440,7 @@ extern XLogRecPtr gistXLogPageDelete(Buffer buffer,
>> FullTransactionId xid, Buffer parentBuffer,
>> OffsetNumber downlinkOffset);
>>
>> -extern void gistXLogPageReuse(Relation rel, BlockNumber blkno,
>> +extern void gistXLogPageReuse(Relation heaprel, Relation rel, BlockNumber blkno,
>> FullTransactionId deleteXid);
>>
>> extern XLogRecPtr gistXLogUpdate(Buffer buffer,
>> @@ -485,7 +485,7 @@ extern bool gistproperty(Oid index_oid, int attno,
>> extern bool gistfitpage(IndexTuple *itvec, int len);
>> extern bool gistnospace(Page page, IndexTuple *itvec, int len, OffsetNumber todelete, Size freespace);
>> extern void gistcheckpage(Relation rel, Buffer buf);
>> -extern Buffer gistNewBuffer(Relation r);
>> +extern Buffer gistNewBuffer(Relation heaprel, Relation r);
>> extern bool gistPageRecyclable(Page page);
>> extern void gistfillbuffer(Page page, IndexTuple *itup, int len,
>> OffsetNumber off);
>> diff --git a/src/include/access/gistxlog.h b/src/include/access/gistxlog.h
>> index 09f9b0f8c6..191f0e5808 100644
>> --- a/src/include/access/gistxlog.h
>> +++ b/src/include/access/gistxlog.h
>> @@ -51,13 +51,13 @@ typedef struct gistxlogDelete
>> {
>> TransactionId snapshotConflictHorizon;
>> uint16 ntodelete; /* number of deleted offsets */
>> + bool isCatalogRel;
>
> In some of these struct definitions, I think it would help comprehension
> to have a comment explaining the purpose of this member.
>

Yeah, agree but it could be done in another patch (outside of this feature), agree?

> Thanks for all your hard work on this feature!

Thanks for the review and the feedback!

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Vladimir Sitnikov 2023-01-24 15:03:51 Re: run pgindent on a regular basis / scripted manner
Previous Message Takamichi Osumi (Fujitsu) 2023-01-24 14:57:22 RE: Time delayed LR (WAS Re: logical replication restrictions)