Re: logical decoding and replication of sequences

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>
Subject: Re: logical decoding and replication of sequences
Date: 2022-03-14 00:46:03
Message-ID: 6b21233e-c880-e4ad-5df2-7832f769296c@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 3/13/22 07:45, Peter Eisentraut wrote:
> Further review (based on 20220310 patch):
>
>  doc/src/sgml/ref/create_publication.sgml       |   3 +
>
> For the clauses added to the synopsis, descriptions should be added
> below.  See attached patch for a start.

Thanks. I'm not sure what other improvements do you think this .sgml
file needs?

>
>  src/backend/commands/sequence.c                |  79 ++
>
> There is quite a bit of overlap between ResetSequence() and
> ResetSequence2(), but I couldn't see a good way to combine them that
> genuinely saves code and complexity.  So maybe it's ok.
>
> Actually, ResetSequence2() is not really "reset", it's just "set".
> Maybe pick a different function name.

Yeah, good point. I think the functions are sufficiently different, and
attempting to remove the publications is unlikely to be an improvement.
But you're right "ResetSequence2" is not a great name, so I've changed
it to "SetSequence".

>
>  src/backend/commands/subscriptioncmds.c        | 272 +++++++
>
> The code added in AlterSubscription_refresh() seems to be entirely
> copy-and-paste from the tables case.  I think this could be combined
> by concatenating the lists from fetch_table_list() and
> fetch_sequence_list() and looping over it once.  The same also applies
> to CreateSubscription(), although the code duplication is smaller
> there.
>
> This in turn means that fetch_table_list() and fetch_sequence_list()
> can be combined, so that you don't actually need any extensive new
> code in CreateSubscription() and AlterSubscription_refresh() for
> sequences.  This could go on, you can combine more of the underlying
> code, like pg_publication_tables and pg_publication_sequences and so
> on.
>

I've removed the duplicated code in both places, so that it processes
only a single list, which is a combination of tables and sequences
(using list_concat). For CreateSubscription it was trivial, because the
code was simple and perfect copy. For _refresh there was a minor
difference, but I think it was actually entirely unnecessary when
processing the combined list. But this will need more testing.

I'm not sure about the last bit, though. How would you combine code for
pg_publication_tables and pg_publication_sequences, etc?

>  src/backend/replication/logical/proto.c        |  52 ++
>
> The documentation of the added protocol message needs to be added to
> the documentation.  See attached patch for a start.
>

OK. I've resolved the FIXME for LSN. Not sure what else is needed?

> The sequence message does not contain the sequence Oid, unlike the
> relation message.  Would that be good to add?

I don't think we need to do that. For relations we do that because it
serves as an identifier in RelationSyncCache, and it links the various
messages to it. For sequences we don't need that - the schema is fixed.

Or do you see a practical reason to add the OID?

>
>  src/backend/replication/logical/worker.c       |  56 ++
>
> Maybe the Asserts in apply_handle_sequence() should be elogs.  These
> are checking what is sent over the network, so we don't want a
> bad/evil peer able to trigger asserts.  And in non-assert builds these
> conditions would be unchecked.
>

I'll think about it, but AFAIK we don't really assume evil peers.

>  src/backend/replication/pgoutput/pgoutput.c    |  82 +-
>
> I find the the in get_rel_sync_entry() confusing.  You add a section for
>
> if (!publish && is_sequence)
>
> but then shouldn't the code below that be something like
>
> if (!publish && !is_sequence)
>

Hmm, maybe. But I think there's actually a bigger issue - this does not
seem to be dealing with pg_publication_namespace.pnsequences correctly.
That is, we we don't differentiate which schemas include tables and
which schemas include sequences. Interestingly, no tests fail. I'll take
a closer look tomorrow.

>  src/bin/pg_dump/t/002_pg_dump.pl               |  38 +-
>
> This adds a new publication "pub4", but the tests already contain a
> "pub4".  I'm not sure why this even works, but perhaps the new one
> shold be "pub5", unless there is a deeper meaning.
>

I agree, pub5 it is. But it's interesting it does not fail even with the
duplicate name. Strange.

>  src/include/catalog/pg_publication_namespace.h |   3 +-
>
> I don't like how the distinction between table and sequence is done
> using a bool field.  That affects also the APIs in pg_publication.c
> and publicationcmds.c especially.  There is a lot of unadorned "true"
> and "false" being passed around that isn't very clear, and it all
> appears to originate at this catalog.  I think we could use a char
> field here that uses the relkind constants.  That would also make the
> code in pg_publication.c etc. slightly clearer.
>

I thought about using relkind, but it does not work all that nicely
because we have multiple relkinds for a table (because of partitioned
tables). So I found that confusing.

Maybe we should just use 'r' for any table, in this catalog?

>
> See attached patch for more small tweaks.
>
> Your patch still contains a number of XXX and FIXME comments, which in
> my assessment are all more or less correct, so I didn't comment on those
> separately.
>

Yeah, I plan to look at those next.

> Other than that, this seems pretty good.
>
> Earlier in the thread I commented on some aspects of the new grammar
> (e.g., do we need FOR ALL SEQUENCES?).  I think this would be useful to
> review again after all the new logical replication patches are in.  I
> don't want to hold up this patch for that at this point.

I'm not particularly attached to the grammar, but I don't see any reason
not to have mostly the same grammar/options as for tables.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
0001-Add-support-for-decoding-sequences-to-built-20220313.patch text/x-patch 220.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2022-03-14 01:44:32 Re: Add index scan progress to pg_stat_progress_vacuum
Previous Message Kyotaro Horiguchi 2022-03-14 00:44:22 Re: BufferAlloc: don't take two simultaneous locks