From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
Date: | 2025-09-12 17:47:41 |
Message-ID: | CAD21AoDsOkZQPOp=fm4hYAeHffCcKwm8m2RnvA5h_q39A-GENA@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Sep 11, 2025 at 9:08 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, Sep 11, 2025 at 11:16 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Wed, Sep 10, 2025 at 11:32 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Sat, Sep 6, 2025 at 3:46 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > I've attached the updated patch that incorporated all comments I got so far.
> > > >
> > >
> > > *
> > > + /*
> > > + * While all processes are using the new status, there could be some
> > > + * transactions that might have started with the old status. So wait
> > > + * for the running transactions to complete so that logical decoding
> > > + * doesn't include transactions that wrote WAL with insufficient
> > > + * information.
> > > + */
> > > + running = GetRunningTransactionData();
> > > + LWLockRelease(ProcArrayLock);
> > > + LWLockRelease(XidGenLock);
> > > +
> > > + elog(DEBUG1, "waiting for %d transactions to complete", running->xcnt);
> > > +
> > > + for (int i = 0; i < running->xcnt; i++)
> > > + {
> > > + TransactionId xid = running->xids[i];
> > > +
> > > + if (TransactionIdIsCurrentTransactionId(xid))
> > > + continue;
> > > +
> > > + XactLockTableWait(xid, NULL, NULL, XLTW_None);
> > > + }
> > >
> > > When building a snapshot during the start of logical decoding, we
> > > anyway wait for running transactions to finish via the snapbuild
> > > machinery. So, why do we need it here? And if it is needed, can we
> > > update the comments to explain why it is required in spite of
> > > snapbuild machinery doing similar thing?
> >
> > Fair point. I don't see any reason we need to wait here. Will remove this step.
> >
>
> We can add a comment there explaining why we don't wait for
> in-progress transactions. This will also be important if we miss
> anything and later need to handle it similarly.
Yes, I'll add comments.
>
> One thing related to this which needs a discussion is after this
> change, it is possible that part of the transaction contains
> additional logical_wal_info. I couldn't think of a problem due to this
> but users using pg_waldump or other WAL reading utilities could
> question this. One possibility is that we always start including
> logical_wal_info for the next new transaction but not sure if that is
> required. It would be good if other people involved in the discussion
> or otherwise could share their opinion on this point.
I believe it's safe to write logical information to WAL records even
when not strictly required, and it won't be a problem in practice.
FYI a similar thing is true for full page writes; full page writes
could be included in WAL records even when not strictly required (see
UpdateFullPageWrites() for details).
>
> > > * Is it a good idea to enable/disable decoding for temporary logical
> > > slots? The temporary slots are released during ERROR or at session
> > > end, is that a good time to do the disable processing that even
> > > requires WAL writing.
> >
> > I think the same is true for slots with RS_EPEMERAL state. Since it
> > could confuse users if automatic effective_wal_level change is
> > supported only for non-temporary slots, I personally would like not to
> > push aside temporary slots. I agree that it might not be a good time
> > to disable processing during process shutdown time; in addition to
> > requiring WAL record, it also requires waits for concurrent state
> > change processings while it holds all interrupts, which could easily
> > involve dead-locks.
> >
>
> Yes, all such processing during ERROR and shutdown sounds scary and a
> source for problems.
>
> > It might be worth considering doing the disable
> > process in a lazy way. For example, other processes (like
> > checkpointer) periodically checks the logical decoding status and
> > disables it if necessary.
> >
>
> Yeah, doing lazily sounds reasonable to me. We need to do lazily only
> for ERROR cases, otherwise, during a normal drop_slot, it may be okay.
> But OTOH, while dropping the slot as a part of subscription drop, it
> could be risky because if due to any reason, the disabling took more
> time, the subscription drop operation would look like hang or in worse
> case, the connection can time out.
True. I thought that disabling logical decoding in a synchronous way
is more preferable for users since it's guaranteed effective_wal_level
gets decreased to 'replica' when drop-slot completes. However, one
hypothesis is that users would not be interested in whether
effective_wal_level is 'replica' or 'logical' but in being able to
create logical slots even when wal_level is set to 'logical'. That is,
if we use the lazy disabling approach for all cases, users would have
to wait for effective_wal_level to be decreased to 'replica' if they
want to check. But if users don't check that often in practice, the
lazy approach would be a better way.
> For the shutdown sequence, can't we think of resetting effective_wal
> after a restart?
Does it mean that effective_wal_level keeps 'logical' until the next
server starts?
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2025-09-12 17:52:01 | Re: plan shape work |
Previous Message | Álvaro Herrera | 2025-09-12 17:46:16 | Re: ABI Compliance Checker GSoC Project |