Re: Single transaction in the tablesync worker?

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Ajin Cherian <itsajin(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Single transaction in the tablesync worker?
Date: 2021-02-05 05:22:54
Message-ID: CAA4eK1LQ4vQx44FE=mhmqhQa4Kz_CH8TnoTh8Q0q71kY-qhX=g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Feb 5, 2021 at 7:09 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> On Thu, Feb 4, 2021 at 8:33 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> ...
>
> > Thanks. I have fixed one of the issues reported by me earlier [1]
> > wherein the tablesync worker can repeatedly fail if after dropping the
> > slot there is an error while updating the SYNCDONE state in the
> > database. I have moved the drop of the slot just before commit of the
> > transaction where we are marking the state as SYNCDONE. Additionally,
> > I have removed unnecessary includes in tablesync.c, updated the docs
> > for Alter Subscription, and updated the comments at various places in
> > the patch. I have also updated the commit message this time.
> >
>
> Below are my feedback comments for V17 (nothing functional)
>
> ~~
>
> 1.
> V27 Commit message:
> For the initial table data synchronization in logical replication, we use
> a single transaction to copy the entire table and then synchronizes the
> position in the stream with the main apply worker.
>
> Typo:
> "synchronizes" -> "synchronize"
>

Fixed and added a note about Alter Sub .. Refresh .. command can't be
executed in the transaction block.

> ~~
>
> 2.
> @@ -48,6 +48,23 @@ ALTER SUBSCRIPTION <replaceable
> class="parameter">name</replaceable> RENAME TO <
> (Currently, all subscription owners must be superusers, so the owner checks
> will be bypassed in practice. But this might change in the future.)
> </para>
> +
> + <para>
> + When refreshing a publication we remove the relations that are no longer
> + part of the publication and we also remove the tablesync slots if there are
> + any. It is necessary to remove tablesync slots so that the resources
> + allocated for the subscription on the remote host are released. If due to
> + network breakdown or some other error, we are not able to remove the slots,
> + we give WARNING and the user needs to manually remove such slots later as
> + otherwise, they will continue to reserve WAL and might eventually cause
> + the disk to fill up. See also <xref
> linkend="logical-replication-subscription-slot"/>.
> + </para>
>
> I think the content is good, but the 1st-person wording seemed strange.
> e.g.
> "we are not able to remove the slots, we give WARNING and the user needs..."
> Maybe it should be like:
> "... PostgreSQL is unable to remove the slots, so a WARNING is
> reported. The user needs... "
>

Changed as per suggestion with a minor tweak.

> ~~
>
> 3.
> @@ -566,107 +569,197 @@ AlterSubscription_refresh(Subscription *sub,
> bool copy_data)
> ...
> + * XXX If there is a network break down while dropping the
>
> "network break down" -> "network breakdown"
>
> ~~
>
> 4.
> @@ -872,7 +970,48 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
> (errmsg("could not connect to the publisher: %s", err)));
> ...
> + * XXX We could also instead try to drop the slot, last time we failed
> + * but for that, we might need to clean up the copy state as it might
> + * be in the middle of fetching the rows. Also, if there is a network
> + * break down then it wouldn't have succeeded so trying it next time
> + * seems like a better bet.
>
> "network break down" -> "network breakdown"
>

Changed as per suggestion.

> ~~
>
> 5.
> @@ -269,26 +313,47 @@ invalidate_syncing_table_states(Datum arg, int
> cacheid, uint32 hashvalue)
> ...
> +
> + /*
> + * Cleanup the tablesync slot.
> + *
> + * This has to be done after updating the state because otherwise if
> + * there is an error while doing the database operations we won't be
> + * able to rollback dropped slot.
> + */
> + ReplicationSlotNameForTablesync(MyLogicalRepWorker->subid,
> + MyLogicalRepWorker->relid,
> + syncslotname);
> +
> + ReplicationSlotDropAtPubNode(wrconn, syncslotname, false /* missing_ok */);
> +
>
> Should this comment also describe why the missing_ok is false for this case?
>

Yeah that makes sense, so added a comment.

Additionally, I have changed the errorcode in RemoveSubscriptionRel,
moved the setup of origin before copy_table in
LogicalRepSyncTableStart to avoid doing copy again due to an error in
setting up origin. I have made a few comment changes as well.

--
With Regards,
Amit Kapila.

Attachment Content-Type Size
v28-0001-Allow-multiple-xacts-during-table-sync-in-logica.patch application/octet-stream 54.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hou, Zhijie 2021-02-05 05:23:49 RE: Parallel INSERT (INTO ... SELECT ...)
Previous Message Amit Langote 2021-02-05 05:14:59 Re: Parallel INSERT (INTO ... SELECT ...)