Re: [PATCH] Add `truncate` option to subscription commands

From: Rahila Syed <rahilasyed90(at)gmail(dot)com>
To: David Christensen <david(at)endpoint(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Euler Taveira <euler(dot)taveira(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Add `truncate` option to subscription commands
Date: 2020-10-28 18:06:19
Message-ID: CAH2L28vrYCKuYO==BYxVYJwOcV=43Q=869xZbtbbT89o_o+cFg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi David,

The feature seems useful to me. The code will need to be refactored due to
changes in commit : b05fe7b442

Please see the following comments.
1. Is there a specific reason behind having new relstate for truncate?
The current state flow is
INIT->DATATSYNC->SYNCWAIT->CATCHUP->SYNCDONE->READY.
Can we accommodate the truncate in either INIT or DATASYNC?

2. + StartTransactionCommand();
+ rel =
table_open(MyLogicalRepWorker->relid, RowExclusiveLock);
+
+ rels = lappend(rels, rel);
+ relids = lappend_oid(relids,
MyLogicalRepWorker->relid);
+
+ ExecuteTruncateGuts(rels, relids,
NIL, DROP_RESTRICT, false);
+ CommitTransactionCommand();

Truncate is being performed in a separate transaction as data copy, I think
that leaves a window
open for concurrent transactions to modify the data after truncate and
before copy.

3. Regarding the truncate of the referenced table, I think one approach can
be to perform the following:
i. lock the referencing and referenced tables against writes
ii. drop the foriegn key constraints,
iii.truncate
iv. sync
v. recreate the constraints
vi. release lock.
However, I am not sure of the implications of locking these tables on the
main apply process.

Thank you,

On Mon, Oct 12, 2020 at 11:31 PM David Christensen <david(at)endpoint(dot)com>
wrote:

> > On Oct 11, 2020, at 10:00 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> >
> > On Mon, Oct 12, 2020 at 3:44 AM David Christensen <david(at)endpoint(dot)com>
> wrote:
> >>
> >>
> >> On Oct 11, 2020, at 1:14 PM, Euler Taveira <
> euler(dot)taveira(at)2ndquadrant(dot)com> wrote:
> >>
> >> 
> >> On Fri, 9 Oct 2020 at 15:54, David Christensen <david(at)endpoint(dot)com>
> wrote:
> >>>
> >>>
> >>> Enclosed find a patch to add a “truncate” option to subscription
> commands.
> >>>
> >>> When adding new tables to a subscription (either via `CREATE
> SUBSCRIPTION` or `REFRESH PUBLICATION`), tables on the target which are
> being newly subscribed will be truncated before the data copy step. This
> saves explicit coordination of a manual `TRUNCATE` on the target tables and
> allows the results of the initial data sync to be the same as on the
> publisher at the time of sync.
> >>>
> >>> To preserve compatibility with existing behavior, the default value
> for this parameter is `false`.
> >>>
> >>
> >> Truncate will fail for tables whose foreign keys refer to it. If such a
> feature cannot handle foreign keys, the usefulness will be restricted.
> >>
> >>
> >> This is true for existing “truncate” with FKs, so doesn’t seem to be
> any different to me.
> >>
> >
> > What would happen if there are multiple tables and truncate on only
> > one of them failed due to FK check? Does it give an error in such a
> > case, if so will the other tables be truncated?
>
> Currently each SyncRep relation is sync’d separately in its own worker
> process; we are doing the truncate at the initialization step of this, so
> it’s inherently in its own transaction. I think if we are going to do any
> sort of validation on this, it would have to be at the point of the CREATE
> SUBSCRIPTION/REFRESH PUBLICATION where we have the relation list and can do
> sanity-checking there.
>
> Obviously if someone changes the schema at some point between when it does
> this and when relation syncs start there is a race condition, but the same
> issue would affect other data sync things, so I don’t care to solve that as
> part of this patch.
>
> >> Hypothetically if you checked all new tables and could verify if there
> were FK cycles only already in the new tables being added then “truncate
> cascade” would be fine. Arguably if they had existing tables that were part
> of an FK that wasn’t fully replicated they were already operating brokenly.
> >>
> >
> > I think if both PK_table and FK_table are part of the same
> > subscription then there should be a problem as both them first get
> > truncated? If they are part of a different subscription (or if they
> > are not subscribed due to whatever reason) then probably we need to
> > deal such cases carefully.
>
> You mean “should not be a problem” here? If so, I agree with that.
> Obviously if we determine this features is only useful with this support
> we’d have to chase the entire dependency graph and make sure that is all
> contained in the set of newly-subscribed tables (or at least FK referents).
>
> I have not considered tables that are part of more than one subscription
> (is that possible?); we presumably should error out if any table exists
> already in a separate subscription, as we’d want to avoid truncating tables
> already part of an existing subscription.
>
> While I’m happy to take a stab at fixing some of the FK/PK issues, it
> seems easy to go down a rabbit hole. I’m not convinced that we couldn’t
> just detect FK issues and choose to not handle this case without decreasing
> the utility for at least some cases. (Perhaps we could give a hint as to
> the issues detected to point someone in the right direction.) Anyway, glad
> to keep discussing potential implications, etc.
>
> Best,
>
> David
> --
> David Christensen
> Senior Software and Database Engineer
> End Point Corporation
> david(at)endpoint(dot)com
> 785-727-1171
>
>
>
>
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message John Naylor 2020-10-28 18:08:28 Re: duplicate function oid symbols
Previous Message Andres Freund 2020-10-28 17:57:44 Re: cutting down the TODO list thread