Re: Single transaction in the tablesync worker?

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Craig Ringer <craig(dot)ringer(at)enterprisedb(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>
Subject: Re: Single transaction in the tablesync worker?
Date: 2020-12-08 05:27:39
Message-ID: CAA4eK1+QC74wRQmbYT+MmOs=YbdUjuq0_A9CBbVoQMB1Ryi-OA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Dec 7, 2020 at 2:21 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, Dec 7, 2020 at 9:21 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Mon, Dec 7, 2020 at 6:20 AM Craig Ringer
> > <craig(dot)ringer(at)enterprisedb(dot)com> wrote:
> > >
> >
> > >>
> > >> I am not sure why, but it seems acceptable to the original authors
> > >> that the data of transactions is partially visible during the
> > >> initial synchronization phase for a subscription.
> > >
> > >
> > > I don't think there's much alternative there.
> > >
> >
> > I am not sure about this. I think it is primarily to allow some more
> > parallelism among apply and sync workers. One primitive way to achieve
> > parallelism without this problem is to make the apply worker wait
> > until all the tablesync workers are in the DONE state.
> >
>
> As the slot of the apply worker is created before all the tablesync
> workers, it should never miss any LSN which the tablesync workers
> would have processed. Also, the tablesync workers should not process
> any xact if the apply worker has not processed anything. I think
> tablesync currently always processes one transaction (because we call
> process_sync_tables at commit of a txn) even if that is not required
> to be in sync with the apply worker.
>

One more thing to consider here is that currently, in the tablesync
worker, we create a slot with the CRS_USE_SNAPSHOT option, which creates
a transaction snapshot on the publisher, and then we use that same
snapshot for the COPY from the publisher. After this, when we receive
data from the publisher using the same slot, it will be in sync with
the COPY. I think that to keep the same consistency between the COPY
and the data we receive from the publisher in this approach, we need
to export the snapshot while creating the slot in the apply worker by
using CRS_EXPORT_SNAPSHOT, and then have all the tablesync workers use
that snapshot for their copy. In the tablesync workers, we can use the
SET TRANSACTION SNAPSHOT command after "BEGIN READ ONLY ISOLATION
LEVEL REPEATABLE READ" to achieve it. That way the COPY will use the
same snapshot as is used for receiving the changes in the apply worker,
and the data will be in sync.
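
To make the idea concrete, here is a rough sketch of the command
sequence I have in mind. It is only an illustration, not a patch: the
slot name, table name, and snapshot identifier are hypothetical
placeholders, and I am assuming the existing walsender command syntax
for exporting a snapshot.

```sql
-- Apply worker, on its replication connection to the publisher.
-- "sub_slot" is a hypothetical slot name. The command's result row
-- includes a snapshot_name column identifying the exported snapshot.
CREATE_REPLICATION_SLOT "sub_slot" LOGICAL pgoutput EXPORT_SNAPSHOT;

-- Each tablesync worker, on its own connection to the publisher.
-- SET TRANSACTION SNAPSHOT has to be the first thing run in the
-- transaction, before any query takes its own snapshot.
BEGIN READ ONLY ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION SNAPSHOT '<snapshot_name from the apply worker>';
COPY public.some_table TO STDOUT;   -- hypothetical table name
COMMIT;
```

One thing to keep in mind with this sketch is that, as far as I know,
an exported snapshot stays valid only until the next command is run on
the exporting replication connection (or that connection is closed), so
the tablesync workers would have to import it before the apply worker
starts streaming.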

--
With Regards,
Amit Kapila.
