Re: logical decoding and replication of sequences

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: logical decoding and replication of sequences
Date: 2022-03-25 11:21:12
Message-ID: CAA4eK1KG2hj-=5y6Ufc67q8O2P=OgRA584DahK7Sz6UuQvf0qA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 25, 2022 at 3:56 PM Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
>
> On 3/25/22 05:01, Amit Kapila wrote:
> > On Fri, Mar 25, 2022 at 3:29 AM Tomas Vondra
> > <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> >>
> >> Pushed.
> >>
> >
> > Some of the comments given by me [1] don't seem to be addressed or
> > responded to. Let me try to say again for the ease of discussion:
> >
>
> D'oh! I got distracted by Petr's response to that message, and missed
> this part ...
>
> > * Don't we need some syncing mechanism between apply worker and
> > sequence sync worker so that apply worker skips the sequence changes
> > till the sync worker is finished, otherwise, there is a risk of one
> > overriding the values of the other? See how we take care of this for a
> > table in should_apply_changes_for_rel() and its callers. If we don't
> > do this for sequences for some reason then probably a comment
> > somewhere is required.
> >
>
> How would that happen? If we're effectively setting the sequence as a
> side effect of inserting the data, then why should we even replicate the
> sequence?
>

I was talking just about sequence values here, considering that some
sequence is just replicating based on nextval. I think the problem is
that apply worker might override what copy has done if an apply worker
is behind the sequence sync worker as both can run in parallel. Let me
try to take some theoretical example to explain this:

Assume, at LSN 10000, the value of sequence s1 is 10. Then by LSN
12000, the value of s1 becomes 20. Now, say copy decides to copy the
sequence value till LSN 12000 which means it will make the value as 20
on the subscriber, now, in parallel, apply worker can process LSN
10000 and make it again 10. Apply worker might end up redoing all
sequence operations along with some transactional ones where we
recreate the file. I am not sure what exact problem it can lead to but
I think we don't need to redo the work.

We'll have the problem later too, no?
>
> > * Don't we need explicit privilege checking before applying sequence
> > data as we do in commit a2ab9c06ea15fbcb2bfde570986a06b37f52bcca for
> > tables?
> >
>
> So essentially something like TargetPrivilegesCheck in the worker?
>

Right.

Few more comments:
==================
1.
@@ -636,7 +704,7 @@ CreatePublication(ParseState *pstate,
CreatePublicationStmt *stmt)
get_database_name(MyDatabaseId));

/* FOR ALL TABLES requires superuser */
- if (stmt->for_all_tables && !superuser())
+ if (for_all_tables && !superuser())
ereport(ERROR,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
errmsg("must be superuser to create FOR ALL TABLES
publication")));

Don't we need a similar check for 'for_all_schema' publications?

2.
+<varlistentry>
+<term>
+ Int8
+</term>
+<listitem>
+<para>
+ 1 if the sequence update is transactions, 0 otherwise.

Shall we say transactional instead of transactions?

3.
+/*
+ * Determine object type given the object type set for a schema.
+ */
+char
+pub_get_object_type_for_relkind(char relkind)

Shouldn't it be 'relation' instead of 'schema' at the end of the sentence?

4.
@@ -1739,13 +1804,13 @@ get_rel_sync_entry(PGOutputData *data,
Relation relation)
{
Oid schemaId = get_rel_namespace(relid);
List *pubids = GetRelationPublications(relid);
-
+ char objectType =
pub_get_object_type_for_relkind(get_rel_relkind(relid));

A few lines after this we are again getting relkind which is not a big
deal but OTOH there doesn't seem to be a need to fetch the same thing
twice from the cache.

5.
+
+ /* Check that user is allowed to manipulate the publication tables. */
+ if (sequences && pubform->puballsequences)

/tables/sequences

6.
+apply_handle_sequence(StringInfo s)
{
...
+
+ relid = RangeVarGetRelid(makeRangeVar(seq.nspname,
+ seq.seqname, -1),
+ RowExclusiveLock, false);
...
}

As here, we are using missing_ok, if the sequence doesn't exist, it
will give a message like: "ERROR: relation "public.s1" does not
exist" whereas for tables we give a slightly more clear message like:
"ERROR: logical replication target relation "public.t1" does not
exist". This is handled via logicalrep_rel_open().

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2022-03-25 11:59:42 Re: logical decoding and replication of sequences
Previous Message Michael Paquier 2022-03-25 11:18:31 Re: Allow file inclusion in pg_hba and pg_ident files