From: Kevin K Biju <kevinkbiju(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add CHECK_FOR_INTERRUPTS in polling loop code path in XactLockTableWait
Date: 2025-05-26 19:43:59
Message-ID: CAM45KeF6+3GtR7dDTAM-kwpGWpnUmAeg=jh0N3cd=NGBgvVkkA@mail.gmail.com
Lists: pgsql-hackers

Hi Fujii,
Thanks for the review.
> Just an idea: how about calling pgstat_report_wait_start() and _end()
around pg_usleep(), similar to what WaitExceedsMaxStandbyDelay() does?
I was thinking about this as well, but the question is what we should
report the wait event as here. The one that makes sense to me is
PG_WAIT_LOCK, but that wouldn't align with what pg_locks displays, since
no lock is actually taken on the standby.
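For concreteness, a rough sketch of the mechanics, assuming we invent a
new wait event (WAIT_EVENT_STANDBY_XACT_WAIT is a made-up name here, not
an existing event; a real patch would need to register it properly):

    if (!first)
    {
        /*
         * Report a wait event around the sleep so the backend shows up
         * as waiting in pg_stat_activity instead of appearing active.
         */
        pgstat_report_wait_start(WAIT_EVENT_STANDBY_XACT_WAIT);
        pg_usleep(1000L);
        pgstat_report_wait_end();
    }
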
> Also, would waiting only 1ms per loop cycle be too aggressive?
I agree that it's aggressive for this case. But
XactLockTableWait/ConditionalXactLockTableWait seem to be used in other
places, including in heapam, so I'm hesitant to change this behaviour for
all of them. Should we have different wait logic for this case?
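If we did go that route, here's a minimal, untested sketch of what I'm
imagining, borrowing the progressive backoff from
WaitExceedsMaxStandbyDelay():

    long        wait_us = 1000L;        /* start at 1ms, as today */

    for (;;)
    {
        /* ... existing lock-table probe, unchanged ... */

        if (!first)
        {
            CHECK_FOR_INTERRUPTS();
            pg_usleep(wait_us);
            /* Back off gradually: double the sleep each cycle, capped at 1s. */
            wait_us = Min(wait_us * 2, 1000000L);
        }
        first = false;
    }
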
Kevin
On Mon, May 26, 2025 at 9:02 AM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
wrote:
>
>
> On 2025/05/24 5:41, Kevin K Biju wrote:
> > Hi,
> >
> > While creating a logical replication slot, we wait for older
> > transactions to complete to reach a "consistent point", which can take a
> > while on busy databases. If we're creating a slot on a primary instance,
> > it's pretty clear that we're waiting on a transaction:
> >
> > postgres=# SELECT pid,wait_event_type,wait_event,state,query FROM pg_stat_activity WHERE pid=804;
> >  pid | wait_event_type |  wait_event   | state  |                             query
> > -----+-----------------+---------------+--------+----------------------------------------------------------------
> >  804 | Lock            | transactionid | active | SELECT pg_create_logical_replication_slot('wow1', 'pgoutput');
> > (1 row)
> >
> > postgres=# SELECT locktype,transactionid,pid,mode FROM pg_locks WHERE pid=804 AND granted='f';
> >    locktype    | transactionid | pid |   mode
> > ---------------+---------------+-----+-----------
> >  transactionid |          6077 | 804 | ShareLock
> >
> > However, creating a slot on a hot standby behaves very differently when
> > blocked on a transaction. Querying pg_stat_activity doesn't give us any
> > indication of what the issue is:
> >
> > postgres=# SELECT pg_is_in_recovery();
> >  pg_is_in_recovery
> > -------------------
> >  t
> > (1 row)
> >
> > postgres=# SELECT pid,wait_event_type,wait_event,state,query FROM pg_stat_activity WHERE pid=5074;
> >  pid  | wait_event_type | wait_event | state  |                             query
> > ------+-----------------+------------+--------+----------------------------------------------------------------
> >  5074 |                 |            | active | SELECT pg_create_logical_replication_slot('wow1', 'pgoutput');
> >
> > And more importantly, a backend in this state cannot be terminated via
> > the usual methods and sometimes requires a server restart.
> >
> > postgres=# SELECT pg_cancel_backend(5074), pg_terminate_backend(5074);
> >  pg_cancel_backend | pg_terminate_backend
> > -------------------+----------------------
> >  t                 | t
> > (1 row)
> >
> > postgres=# SELECT pid,wait_event_type,wait_event,state,query FROM pg_stat_activity WHERE pid=5074;
> >  pid  | wait_event_type | wait_event | state  |                             query
> > ------+-----------------+------------+--------+----------------------------------------------------------------
> >  5074 |                 |            | active | SELECT pg_create_logical_replication_slot('wow1', 'pgoutput');
> >
> > I've taken a look at the code that "awaits" on transactions, and the
> > function of interest is lmgr.c/XactLockTableWait. On a primary, it is able
> > to acquire a ShareLock on the xid, and the lock manager does a good job of
> > making this wait efficient as well as visible externally. On a standby,
> > however, it cannot wait on the xid since it is not running the transaction.
> > It does know the transaction is still running from KnownAssignedXids,
> > and then ends up on this codepath:
> >
> >          *
> >          * Some uses of this function don't involve tuple visibility -- such
> >          * as when building snapshots for logical decoding. It is possible to
> >          * see a transaction in ProcArray before it registers itself in the
> >          * locktable. The topmost transaction in that case is the same xid,
> >          * so we try again after a short sleep. (Don't sleep the first time
> >          * through, to avoid slowing down the normal case.)
> >          */
> >         if (!first)
> >                 pg_usleep(1000L);
> >
> > The quoted comment suggests that this sleep was only meant to be hit a
> > few times while we wait for the lock to appear so we can wait on it.
> > However, in the hot standby case, this ends up becoming a polling loop,
> > since the lock will never appear. There is no interrupt processing in this
> > loop either, so the only way out is to wait for the transaction on the
> > primary to complete.
>
> I agree with your analysis. It makes sense to me.
>
>
> > Attached is a patch that adds CHECK_FOR_INTERRUPTS before the sleep in
> > XactLockTableWait and ConditionalXactLockTableWait. This allows backends
> > waiting on a transaction during slot creation on a standby to be
> > interrupted.
>
> +1
>
> > It would also be nice if there were a way for users to tell what we're
> > waiting on (maybe a different wait event?), but I'd like input on that.
>
> Just an idea: how about calling pgstat_report_wait_start() and _end()
> around pg_usleep(), similar to what WaitExceedsMaxStandbyDelay() does?
> Since this is more of an improvement than a bug fix, I think
> it would be better to make it a separate patch from the CFI addition.
>
>
> Also, would waiting only 1ms per loop cycle be too aggressive? If so, maybe
> we could increase the sleep time gradually during recovery (but not beyond
> 1 second), again similar to WaitExceedsMaxStandbyDelay().
>
> Regards,
>
> --
> Fujii Masao
> NTT DATA Japan Corporation
>
>