Re: [PATCH] Add `truncate` option to subscription commands

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: David Christensen <david(at)endpoint(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Euler Taveira <euler(dot)taveira(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Add `truncate` option to subscription commands
Date: 2021-05-25 09:46:44
Message-ID: CAA4eK1JfGACfcE8qC0FjFeQbTBKzHm4VpJ=rqTcOJN7REmBvTA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, May 24, 2021 at 2:10 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Mon, May 24, 2021 at 11:01 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > 1) Whether a table the sync worker is trying to truncate is having any
> > > referencing (foreign key) tables on the subscriber? If yes, whether
> > > all the referencing tables are present in the list of subscription
> > > tables (output of fetch_table_list)? In this case, the sync worker is
> > > truncating the primary key/referenced table.
> > >
> > > One way to solve the above problem is by storing the table oids of the
> > > subscription tables (output of fetch_table_list) in a new column in
> > > the pg_subscription catalog (like subpublications text[] column). In
> > > the sync worker, before truncation of a table, use
> > > heap_truncate_find_FKs to get all the referencing tables of the given
> > > table and get all the subscription tables from the new pg_subscription
> > > column. If all the referencing tables exist in the subscription
> > > tables, then truncate the table, otherwise don't, just skip it.
> > >
> >
> > Here, silently skipping doesn't seem like a good idea when the user
> > has asked to truncate the table. Shouldn't we allow it if the user has
> > provided say cascade with a truncate option?
>
> We could do that. In that case, the truncate option just can't be a
> boolean, but it has to be an enum accepting "restrict", "cascade",
> maybe "restart identity" or "continue identity" too. I have a concern
> here - what if the ExecuteTruncateGuts fails with the cascade option
> for whatever reason? Should the table sync worker be trapped in that
> error?
>

How is it any different from any other error we got during table sync
(say PK violation, out of memory, or any other such error)?

>
> > > There
> > > can be a problem here if there are many subscription tables, the size
> > > of the new column in pg_susbcription can be huge. However, we can
> > > choose to store the table ids in this new column only when the
> > > truncate option is specified.
> > >
> > > Another way is to let each table sync worker scan the
> > > pg_subscription_rel to get all the relations that belong to a
> > > subscription. But I felt this was costly.
> > >
> >
> > I feel it is better to use pg_subscription_rel especially because we
> > will do so when the user has given the truncate option and note that
> > we are already accessing it in sync worker for both reading and
> > writing. See LogicalRepSyncTableStart.
>
> Note that in pg_subscription_rel, there can exist multiple rows for
> each table for a given subscription. Say, t1 is a table that the sync
> worker is trying to truncate and copy. Say, t1_dep1, t1_dep2, t1_dep3
> .... are the dependent tables (we can find these using
> heap_truncate_find_FKs). Now, we need to see if all the t1_dep1,
> t1_dep2, t1_dep3 .... tables are in the pg_suscription_rel with the
> same subscription id, then only we can delete all of them with
> EexecuteTruncateGuts() using cascade option. If any of the t1_depX is
> either not in the pg_subscription_rel or it is subscribed in another
> subscription, then is it okay if we scan pg_suscription_rel in a loop
> with t1_depX relid's?
>

Why do you need to search in a loop? There is an index for relid, subid.

> Isn't it costiler? Or since it is a cache
> lookup, maybe that's okay?
>
> /* Try finding the mapping. */
> tup = SearchSysCache2(SUBSCRIPTIONRELMAP,
> ObjectIdGetDatum(relid),
> ObjectIdGetDatum(subid));
>
> > One other problem discussed in this thread was what to do when the
> > same table is part of multiple subscriptions and the user has provided
> > a truncate option while operating on such a subscription.
>
> I think we can just skip truncating a table when it is part of
> multiple subscriptions. We can tell this clearly in the documentation.
>

Okay, I don't have any better ideas at this stage.

--
With Regards,
Amit Kapila.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2021-05-25 09:47:44 Re: Logical Replication - behavior of TRUNCATE ... CASCADE
Previous Message Amit Kapila 2021-05-25 09:30:55 Re: Replication slot stats misgivings