Re: Minimal logical decoding on standbys

From: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, fabriziomello(at)gmail(dot)com, tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>, Rahila Syed <rahila(dot)syed(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minimal logical decoding on standbys
Date: 2022-12-13 10:48:17
Message-ID: 9f0a021e-40e1-0ae6-8086-a4457c99749e@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 12/12/22 6:41 PM, Robert Haas wrote:
> On Sat, Dec 10, 2022 at 3:09 AM Drouvot, Bertrand
> <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
>> Attaching V30, mandatory rebase due to 66dcb09246.
>
> It's a shame that this hasn't gotten more attention, because the topic
> is important, but I'm as guilty of being too busy to spend a lot of
> time on it as everyone else.

Thanks for looking at it! Yeah, I think this is an important feature too.

>
> Anyway, while I'm not an expert on this topic,

Then, we are two ;-)
I "just" resurrected this very old thread and do the best that I can to have it moving forward.

> I did spend a little
> time looking at it today, especially 0001. Here are a few comments:
>
> I think that it's not good for IndexIsAccessibleInLogicalDecoding and
> RelationIsAccessibleInLogicalDecoding to both exist. Indexes and
> tables are types of relations, so this invites confusion: when the
> object in question is an index, it would seem that either one can be
> applied, based on the names.

Agree.

> I think the real problem here is that
> RelationIsAccessibleInLogicalDecoding is returning *the wrong answer*
> when the relation is a user-catalog table. It does so because it
> relies on RelationIsUsedAsCatalogTable, and that macro relies on
> checking whether the reloptions include user_catalog_table.
>

I think the Macro is returning the right answer when the relation is a user-catalog table.
I think the purpose is to identify relations that are permitted in READ only access during logical decoding.
Those are the ones that have been created by initdb in the pg_catalog schema, or have been marked as user provided catalog tables (that's what is documented in [1]).

Or did you mean when the relation is "NOT" a user-catalog table?

> But here we can see where past thinking of this topic has been,
> perhaps, a bit fuzzy. If that option were called user_catalog_relation
> and had to be set on both tables and indexes as appropriate, then
> RelationIsAccessibleInLogicalDecoding would already be doing the right
> thing, and consequently there would be no need to add
> IndexIsAccessibleInLogicalDecoding.

Yeah, agree.

> I think we should explore the idea
> of making the existing macro return the correct answer rather than
> adding a new one. It's probably too late to redefine the semantics of
> user_catalog_table, although if anyone wants to argue that we could
> require logical decoding plugins to set this for both indexes and
> tables, and/or rename to say relation instead of table, and/or add a
> parallel reloption called user_catalog_index, then let's talk about
> that.
>
> Otherwise, I think we can consider adjusting the definition of
> RelationIsUsedAsCatalogTable. The simplest way to do that would be to
> make it check indisusercatalog for indexes and do what it does already
> for tables. Then IndexIsUserCatalog and
> IndexIsAccessibleInLogicalDecoding go away and
> RelationIsAccessibleInLogicalDecoding returns the right answer in all
> cases.
>

That does sound a valid option to me too, I'll look at it.

> But I also wonder if a new pg_index column is really the right
> approach here. One fairly obvious alternative is to try to use the
> user_catalog_table reloption in both places. We could try to propagate
> that reloption from the table to its indexes; whenever it's set or
> unset on the table, push that down to each index. We'd have to take
> care not to let the property be changed independently on indexes,
> though. This feels a little grotty to me, but it does have the
> advantage of symmetry.

I thought about this approach too when working on it. But I thought it would be "weird" to start to propagate option(s) from table(s) to indexe(s). I mean, if that's an "option" why should it be propagated?
Furthermore, it seems to me that this option does behave more like a property (that affects logical decoding), more like logged/unlogged (being reflected in pg_class.relpersistence not in reloptions).

> Another way to get symmetry is to go the other
> way and add a new column pg_class.relusercatalog which gets used
> instead of putting user_catalog_table in the reloptions, and
> propagated down to indexes.

Yeah, agree (see my previous point above).

> But I have a feeling that the reloptions
> code is not very well-structured to allow reloptions to be stored any
> place but in pg_class.reloptions, so this may be difficult to
> implement.

Why don't remove this "property" from reloptions? (would probably need much more changes that the current approach and probably take care of upgrade scenario too).
I did not look in details but logged/unlogged is also propagated to the indexes, so maybe we could use the same approach here. But is it worth the probably added complexity (as compare to the current approach)?

> Yet a third way is to have the index fetch the flag from
> the associated table, perhaps when the relcache entry is built. But I
> see no existing precedent for that in RelationInitIndexAccessInfo,
> which I think is where it would be if we had it -- and that makes me
> suspect that there might be good reasons why this isn't actually safe.
> So while I do not really like the approach of storing the same
> property in different ways for tables and for indexes, it's also not
> really obvious to me how to do better.
>

I share the same thought and that's why I ended up doing it that way.

> Regarding the new flags that have been added to various WAL records, I
> am a bit curious as to whether there's some way that we can avoid the
> need to carry this information through the WAL, but I don't understand
> why we don't need that now and do need that with this patch so it's
> hard for me to think about that question in an intelligent way. If we
> do need it, I think there might be cases where we should do something
> smarter than just adding bool onCatalogAccessibleInLogicalDecoding to
> the beginning of a whole bunch of WAL structs. In most cases we try to
> avoid having padding bytes in the WAL struct. If we can, we try to lay
> out the struct to avoid padding bytes. If we can't, we put the fields
> requiring less alignment at the end of the struct and then have a
> SizeOf<whatever> macro that is defined to not include the length of
> any trailing padding which the compiler would insert. See, for
> example, SizeOfHeapDelete. This patch doesn't do any of that, and it
> should. It should also consider whether there's a way to avoid adding
> any new bytes at all, e.g. it adds
> onCatalogAccessibleInLogicalDecoding to xl_heap_visible, but that
> struct has unused bits in 'flags'.

Thanks for the hints! I'll look at it.

>
> It would be very helpful if there were some place to refer to that
> explained the design decisions here, like why the feature we're trying
> to get requires this infrastructure around indexes to be added. It
> could be in the commit messages, an email message, a README, or
> whatever, but right now I don't see it anywhere in here.
>

Like adding something around those lines in the commit message?

"
On a primary database, any catalog rows that may be needed by a logical decoding replication slot are not removed.
This is done thanks to the catalog_xmin associated with the logical replication slot.

With logical decoding on standby, in the following cases:

- hot_standby_feedback is off
- hot_standby_feedback is on but there is no a physical slot between the primary and the standby. Then, hot_standby_feedback will work, but only while the connection is alive (for example a node restart would break it)

Then the primary may delete system catalog rows that could be needed by the logical decoding on the standby. Then, it’s mandatory to identify those rows and invalidate the slots that may need them if any.
"

[1]: https://www.postgresql.org/docs/current/logicaldecoding-output-plugin.html (Section 49.6.2. Capabilities)

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message PG Bug reporting form 2022-12-13 10:57:39 BUG #17717: Regression in vacuumdb (15 is slower than 10/11 and possible memory issue)
Previous Message Nitin Jadhav 2022-12-13 10:44:12 Re: Introduce a new view for checkpointer related stats