Re: [PATCH] Add `truncate` option to subscription commands

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: David Christensen <david(at)endpoint(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Euler Taveira <euler(dot)taveira(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Add `truncate` option to subscription commands
Date: 2021-05-24 05:31:04
Message-ID: CAA4eK1JKdGFXWVfoGUEgrUGY0rL8mgnPDKZd0pA6Cdx9oy_mmA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, May 22, 2021 at 9:58 AM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Thu, Nov 26, 2020 at 12:16 AM David Christensen <david(at)endpoint(dot)com> wrote:
> >
> > Hi,
> >
> > At this time I do not have time to make the necessary changes for this
> > commitfest so I am voluntarily withdrawing this patch, but will
> > revisit at a future time.
>
> Hi,
>
> This feature looks useful in the sense that it avoids users having to
> manually lookup all the tables on all the subscribers for truncation
> (in case they want the subscriber tables to exactly sync with the
> publisher tables).
>
> I have gone through the prior discussions on this thread. IMO, we can
> always go ahead with TRUNCATE ... RESTRICT behavior to avoid some
> unnecessary truncation of subscriber local tables (if at all users
> have such tables) that can arise due to CASCADE option. It looks like
> there are some problems with the FK - PK dependencies. Below are my
> thoughts:
>
> 1) Whether a table the sync worker is trying to truncate is having any
> referencing (foreign key) tables on the subscriber? If yes, whether
> all the referencing tables are present in the list of subscription
> tables (output of fetch_table_list)? In this case, the sync worker is
> truncating the primary key/referenced table.
>
> One way to solve the above problem is by storing the table oids of the
> subscription tables (output of fetch_table_list) in a new column in
> the pg_subscription catalog (like subpublications text[] column). In
> the sync worker, before truncation of a table, use
> heap_truncate_find_FKs to get all the referencing tables of the given
> table and get all the subscription tables from the new pg_subscription
> column. If all the referencing tables exist in the subscription
> tables, then truncate the table, otherwise don't, just skip it.
>

Here, silently skipping doesn't seem like a good idea when the user
has asked to truncate the table. Shouldn't we allow it if the user has
provided say cascade with a truncate option?

> There
> can be a problem here if there are many subscription tables, the size
> of the new column in pg_susbcription can be huge. However, we can
> choose to store the table ids in this new column only when the
> truncate option is specified.
>
> Another way is to let each table sync worker scan the
> pg_subscription_rel to get all the relations that belong to a
> subscription. But I felt this was costly.
>

I feel it is better to use pg_subscription_rel especially because we
will do so when the user has given the truncate option and note that
we are already accessing it in sync worker for both reading and
writing. See LogicalRepSyncTableStart.

> 2) Whether a table the sync worker is trying to truncate is a
> referencing table for any of the subscriber tables that is not part of
> the subscription list of tables? In this case, the table the sync
> worker is truncating is the foreign key/referencing table.
>
> This isn't a problem actually, the sync worker can safely truncate the
> table. This is also inline with the current TRUNCATE command
> behaviour.
>
> 3) I think we should allow the truncate option with CREATE
> SUBSCRIPTION, ALTER SUBSCRIPTION ... REFRESH/SET/ADD PUBLICATION,
> basically wherever copy_data and refresh options can be specified. And
> there's no need to store the truncate option in the pg_subscription
> catalogue because we allow it to be specified with only DDLs.
>

makes sense.

One other problem discussed in this thread was what to do when the
same table is part of multiple subscriptions and the user has provided
a truncate option while operating on such a subscription.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2021-05-24 05:51:50 Re: Logical Replication - behavior of TRUNCATE ... CASCADE
Previous Message Greg Nancarrow 2021-05-24 05:21:44 Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump