From: | Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication |
Date: | 2022-07-06 10:40:10 |
Message-ID: | CAFiTN-tN3ya3PEnqZVLDWN=v68bRriPks_6zkVZrC-vw8QjAcg@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Jul 6, 2022 at 2:48 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Jul 6, 2022 at 1:47 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > On Wed, Jul 6, 2022 at 9:06 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > How would you choose the slot name for the table sync, right now it
> > > contains the relid of the table for which it needs to perform sync?
> > > Say, if we ignore to include the appropriate identifier in the slot
> > > name, we won't be able to reuse/drop the slot after restart of table
> > > sync worker due to an error.
> >
> > I had a quick look into the patch and it seems it is using the worker
> > array index instead of relid while forming the slot name, and I think
> > that makes sense, because now whichever worker is using that worker
> > index can reuse the slot created w.r.t that index.
> >
>
> I think that won't work because each time on restart the slot won't be
> fixed. Now, it is possible that we may drop the wrong slot if that
> state of copying rel is SUBREL_STATE_DATASYNC.

Right. The worker would drop whatever slot was previously used at its
array index, and some relation may have reached
SUBREL_STATE_FINISHEDCOPY on that slot. Since relids and replication
slots are no longer associated 1-1, it would be wrong to drop a slot
based on the relstate of the relation this worker happens to pick.
In short, what you have pointed out makes sense.
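
To make the hazard concrete, here is a toy Python model. It is only a sketch, not code from the patch or from PostgreSQL: the naming scheme in `slot_name_by_index`, the subscription OID, the relids, and the `cleanup_on_restart` helper are all made up for illustration. It assumes slot names are derived from the worker array index alone, and that a restarted worker drops "its" slot when the relation it picked is in SUBREL_STATE_DATASYNC.

```python
from typing import Dict, Optional

SUBID = 16394  # made-up subscription OID, for illustration only


def slot_name_by_index(subid: int, worker_index: int) -> str:
    # Hypothetical naming scheme: keyed by worker array index, not relid.
    return f"pg_{subid}_sync_{worker_index}"


# State left behind by an earlier run:
#  - relation 100 was copied by the worker at index 0 and is at FINISHEDCOPY,
#    so the index-0 slot is still needed for its catch-up;
#  - relation 200 failed part-way through and is at DATASYNC.
rel_state: Dict[int, str] = {100: "FINISHEDCOPY", 200: "DATASYNC"}

# Slots on the publisher: slot name -> relid the slot actually served.
slots: Dict[str, int] = {slot_name_by_index(SUBID, 0): 100}


def cleanup_on_restart(worker_index: int, relid: int) -> Optional[int]:
    """Drop the index-based slot if the picked relation is in DATASYNC.

    Returns the relid the dropped slot really belonged to, or None if
    nothing was dropped.
    """
    if rel_state[relid] != "DATASYNC":
        return None
    return slots.pop(slot_name_by_index(SUBID, worker_index), None)


# After the restart, the worker at index 0 happens to pick relation 200.
dropped_owner = cleanup_on_restart(worker_index=0, relid=200)
print(f"dropped slot that belonged to relid {dropped_owner}")
```

In this model the decision to drop is made from relation 200's state, but the slot that disappears is the one relation 100 still needs.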

> Also, it is possible
> that while creating a slot, we fail because the same name slot already
> exists due to some other worker which has created that slot has been
> restarted. Also, what about origin_name, won't that have similar
> problems? Also, if the state is already SUBREL_STATE_FINISHEDCOPY, if
> the slot is not the same as we have used in the previous run of a
> particular worker, it may start WAL streaming from a different point
> based on the slot's confirmed_flush_location.

Yeah, this is also true. When a tablesync worker has to catch up
after completing the copy, it might start streaming from the wrong LSN.
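
A similarly rough Python sketch of the catch-up problem follows. The slot names, relids, and LSN values are invented, and `catchup_start_lsn` is a hypothetical helper; the only thing taken from the discussion is that streaming resumes from the slot's confirmed flush position.

```python
# Toy model with made-up LSNs; not PostgreSQL code.

# Confirmed flush position of each (hypothetical) tablesync slot.
confirmed_flush = {
    "sync_slot_0": 0x1000,  # slot used while copying relation 100
    "sync_slot_1": 0x5000,  # slot used by a different worker
}

# LSN up to which relation 100's initial copy is consistent.
copy_done_lsn = {100: 0x1000}


def catchup_start_lsn(slot_name: str) -> int:
    # Streaming resumes from the slot's confirmed flush position.
    return confirmed_flush[slot_name]


# After a restart, relation 100 (already at FINISHEDCOPY) is picked up
# by the worker at index 1, which uses a different slot.
start = catchup_start_lsn("sync_slot_1")
gap = start - copy_done_lsn[100]
print(f"catch-up starts at {start:#x}, gap of {gap:#x} after the copy")
```

With a 1-1 relid-to-slot mapping the gap would be zero, because the worker would always resume from the slot its own copy was taken on.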
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com