Re: Single transaction in the tablesync worker?

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Single transaction in the tablesync worker?
Date: 2021-02-08 10:59:25
Message-ID: CAA4eK1+MfqeYrj4N7=KMrmPE9fQNrk_FDba3H4vsHmpF-qyd5A@mail.gmail.com
Lists: pgsql-hackers

On Fri, Feb 5, 2021 at 8:40 PM Petr Jelinek
<petr(dot)jelinek(at)enterprisedb(dot)com> wrote:
>
> Hi,
>
> We had a bit of a high-level discussion about these patches with Amit
> off-list, so I decided to also take a look at the actual code.
>

Thanks for the discussion and a follow-up review.

> My main concern originally was the potential for left-over slots on the
> publisher, but I think the state now is relatively okay, with a couple
> of corner cases that are documented and don't seem much worse than the
> main slot.
>
> I wonder if we should mention the max_slot_wal_keep_size GUC in the
> table sync docs though.
>

I have added a reference to this in Alter Subscription, where we
mention the risk of leftover slots. Let me know if you have something
else in mind.
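For context, max_slot_wal_keep_size is set on the publisher and caps how
much WAL replication slots may retain, so a leftover sync slot invalidates
itself rather than filling the disk. The value below is purely
illustrative, not something from the patch:

```
# postgresql.conf on the publisher (illustrative value only)
max_slot_wal_keep_size = 10GB    # default is -1, meaning no limit
```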

> Another thing that might need documentation is that the visibility
> of changes done by table sync is no longer isolated, in that table
> contents will show intermediate progress to other backends, rather than
> switching from nothing to a state consistent with the rest of replication.
>

Agreed and updated the docs accordingly.

>
> Some minor comments about code:
>
> > + else if (res->status == WALRCV_ERROR && missing_ok)
> > + {
> > + /* WARNING. Error, but missing_ok = true. */
> > + ereport(WARNING,
>
> I wonder if we need to add error code to the WalRcvExecResult and check
> for the appropriate ones here. Because this can for example return error
> because of timeout, not because slot is missing.
>

I think there are both pros and cons of distinguishing the error ("slot
does not exist") from others. The benefit is that if there is a network
glitch then the user can retry the Alter/Drop commands and they will
succeed next time. OTOH, say the network is broken for a long time and
the user wants to proceed; there won't be any way to proceed for Alter
Subscription ... Refresh or the Drop command. So by giving a WARNING, at
least we provide a way to proceed, and the user can drop such slots
later. We have mentioned this in the docs as well. I think we can go
either way here; let me know what you think is the better way?

> Not sure if it matters
> for current callers though (but then maybe don't call the param
> missing_ok?).
>

Sure, if we decide not to change the behavior as suggested by you then
this makes sense.

>
> > +ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char syncslotname[NAMEDATALEN])
> > +{
> > + if (syncslotname)
> > + sprintf(syncslotname, "pg_%u_sync_%u", suboid, relid);
> > + else
> > + syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
> > +
> > + return syncslotname;
> > +}
>
> Given that we are now explicitly dropping slots, what happens here if we
> have 2 different downstreams that happen to get same suboid and reloid,
> will one of the drop the slot of the other one? Previously with the
> cleanup being left to temp slot we'd at maximum got error when creating
> it but with the new logic in LogicalRepSyncTableStart it feels like we
> could get into situation where 2 downstreams are fighting over slot no?
>

As discussed, I have added the system_identifier to the slot name to
distinguish subscriptions coming from different clusters.

Apart from fixing the above comment, I have integrated it with the new
replorigin_drop_by_name() API being discussed in the thread [1] and have
posted that patch here as well, just for ease of review. I have also
integrated Osumi-San's test case patch with minor modifications.

[1] - https://www.postgresql.org/message-id/CAA4eK1L7mLhY%3DwyCB0qsEGUpfzWfncDSS9_0a4Co%2BN0GUyNGNQ%40mail.gmail.com

--
With Regards,
Amit Kapila.

Attachment Content-Type Size
v29-0001-Make-pg_replication_origin_drop-safe-against-con.patch application/octet-stream 5.1 KB
v29-0002-Allow-multiple-xacts-during-table-sync-in-logica.patch application/octet-stream 61.7 KB
