Re: Can a child process detect postmaster death when in pg_usleep?

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Can a child process detect postmaster death when in pg_usleep?
Date: 2021-04-15 06:18:50
Message-ID: CALj2ACX=pWymzbHOV2RHLWAzZ_a8Pv894Nj-miM4GNwv25WHvg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Apr 15, 2021 at 5:28 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:

Thanks a lot for the detailed explanation.

> On Thu, Apr 15, 2021 at 2:06 AM Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> > 1) Is it really harmful to use pg_usleep in a postmaster child process
> > as it doesn't let the child process detect postmaster death?
>
> Yeah, that's a bad idea. Any long-term waiting (including short waits
> in a loop) should ideally be done with the latch infrastructure.

Agree. Along with short waits in a loop, I think we also should
replace pg_usleep with WaitLatch that has a user configurable
parameter like below:

pg_usleep(VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL * 1000L);
pg_usleep(PostAuthDelay * 1000000L);
pg_usleep(CommitDelay);

> 4) Are there any places where we need to replace pg_usleep with
> > WaitLatch/equivalent of pg_sleep to detect the postmaster death
> > properly?
>
> We definitely have replaced a lot of sleeps with latch.c primitives
> over the past few years, since we got WL_EXIT_ON_PM_DEATH and
> condition variables. There may be many more to improve... You
> mentioned autovacuum... yeah, Stephen fixed one of these with commit
> 4753ef37, but yeah it's not great to have those others in there...

I have not looked at the commit 4753ef37 previously, but it
essentially addresses the problem with pg_usleep for vacuum delay. I'm
thinking we can also replace pg_usleep in below places based on the
fact that pg_usleep should be avoided in 1) short waits in a loop 2)
when wait time is dependent on user configurable parameters. And using
WaitLatch may require us to add wait event types to WaitEventTimeout
enum, but that's okay.

1) pg_usleep(VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL * 1000L); in lazy_truncate_heap
2) pg_usleep(CommitDelay); in XLogFlush
3) pg_usleep(10000L); in CreateCheckPoint
4) pg_usleep(1000000L); in do_pg_stop_backup
5) pg_usleep(1000L); in read_local_xlog_page
6) pg_usleep(PostAuthDelay * 1000000L); in AutoVacLauncherMain,
AutoVacWorkerMain, StartBackgroundWorker, InitPostgres
7) pg_usleep(100000L); in RequestCheckpoint
8) pg_usleep(1000000L); in pgarch_ArchiverCopyLoop
9) pg_usleep(PGSTAT_RETRY_DELAY * 1000L); in backend_read_statsfile
10) pg_usleep(PreAuthDelay * 1000000L); in BackendInitialize
11) pg_usleep(10000L); in WalSndWaitStopping
12) pg_usleep(standbyWait_us); in WaitExceedsMaxStandbyDelay
13) pg_usleep(10000L); in RegisterSyncRequest

I'm sure we won't be changing in all of the above places. It will be
good to review and correct the above list.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2021-04-15 06:22:38 Re: Replication slot stats misgivings
Previous Message Michael Paquier 2021-04-15 04:57:43 Re: File truncation within PostgresNode::issues_sql_like() wrong on Windows