Re: [PATCH] Add `truncate` option to subscription commands

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: David Christensen <david(at)endpoint(dot)com>
Cc: Rahila Syed <rahilasyed90(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Euler Taveira <euler(dot)taveira(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Add `truncate` option to subscription commands
Date: 2021-05-22 04:28:20
Message-ID: CALj2ACUOkYk9WBHLmqEp9JDM333jZXCvqNQnS=0ukRdnTOcftg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Nov 26, 2020 at 12:16 AM David Christensen <david(at)endpoint(dot)com> wrote:
>
> Hi,
>
> At this time I do not have time to make the necessary changes for this
> commitfest so I am voluntarily withdrawing this patch, but will
> revisit at a future time.

Hi,

This feature looks useful in the sense that it avoids users having to
manually lookup all the tables on all the subscribers for truncation
(in case they want the subscriber tables to exactly sync with the
publisher tables).

I have gone through the prior discussions on this thread. IMO, we can
always go ahead with TRUNCATE ... RESTRICT behavior to avoid some
unnecessary truncation of subscriber local tables (if at all users
have such tables) that can arise due to CASCADE option. It looks like
there are some problems with the FK - PK dependencies. Below are my
thoughts:

1) Whether a table the sync worker is trying to truncate is having any
referencing (foreign key) tables on the subscriber? If yes, whether
all the referencing tables are present in the list of subscription
tables (output of fetch_table_list)? In this case, the sync worker is
truncating the primary key/referenced table.

One way to solve the above problem is by storing the table oids of the
subscription tables (output of fetch_table_list) in a new column in
the pg_subscription catalog (like subpublications text[] column). In
the sync worker, before truncation of a table, use
heap_truncate_find_FKs to get all the referencing tables of the given
table and get all the subscription tables from the new pg_subscription
column. If all the referencing tables exist in the subscription
tables, then truncate the table, otherwise don't, just skip it. There
can be a problem here if there are many subscription tables, the size
of the new column in pg_susbcription can be huge. However, we can
choose to store the table ids in this new column only when the
truncate option is specified.

Another way is to let each table sync worker scan the
pg_subscription_rel to get all the relations that belong to a
subscription. But I felt this was costly.

2) Whether a table the sync worker is trying to truncate is a
referencing table for any of the subscriber tables that is not part of
the subscription list of tables? In this case, the table the sync
worker is truncating is the foreign key/referencing table.

This isn't a problem actually, the sync worker can safely truncate the
table. This is also inline with the current TRUNCATE command
behaviour.

3) I think we should allow the truncate option with CREATE
SUBSCRIPTION, ALTER SUBSCRIPTION ... REFRESH/SET/ADD PUBLICATION,
basically wherever copy_data and refresh options can be specified. And
there's no need to store the truncate option in the pg_subscription
catalogue because we allow it to be specified with only DDLs.

4) If there are a huge number of tables with lots of data, then all
the sync workers will have to spend an extra amount of time in
truncating the tables. At times the publications can use "FOR ALL
TABLES" i.e. all the tables within a database, so truncating all of
them on the subscriber would be a time consuming task. I'm not sure if
this is okay.

5) We can choose to skip the errors that arise out of
ExecuteTruncateGuts in a sync worker using PG_TRY/PG_CATCH or changing
ExecuteTruncateGuts API to return false on error instead of emitting
an error.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2021-05-22 04:31:25 Re: Logical Replication - behavior of TRUNCATE ... CASCADE
Previous Message Amit Kapila 2021-05-22 03:58:07 Re: "Multiple table synchronizations are processed serially" still happens