Re: Trap errors from streaming child in pg_basebackup to exit early

From: Daniel Gustafsson <daniel(at)yesql(dot)se>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>
Subject: Re: Trap errors from streaming child in pg_basebackup to exit early
Date: 2021-09-01 08:26:22
Message-ID: 1CCAA16A-4A58-4023-928C-598AB865FC56@yesql.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On 30 Aug 2021, at 12:31, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:

> Here are some comments on the patch:
> 1) Do we need volatile keyword here to read the value of the variables
> always from the memory?
> +static volatile sig_atomic_t bgchild_exited = false;

Yes, fixed.

> 2) Do we need #ifndef WIN32 ... #endif around sigchld_handler function
> definition?

Ah yes, good point. Fixed.

> 3) I'm not sure if the new value of bgchild_exited being set in the
> child thread will reflect in the main process on Windows? But
> theoretically, I can understand that the memory will be shared between
> the main process thread and child thread.

The child does not have it’s own copy of bgchild_exited.

> 4) How about "set the same flag that we use on Unix to signal
> SIGCHLD." instead of "* set the flag used on Unix to signal
> SIGCHLD."?

Fixed.

> 5) How about "background WAL receiver terminated unexpectedly" instead
> of "log streamer child terminated unexpectedly"? This will be in sync
> with the existing message "starting background WAL receiver". "log
> streamer" is the word used internally in the code, user doesn't know
> it with that name.

Good point, that’s better.

> 6) How about giving the exit code (like postmaster's reaper function
> does) instead of just a message saying unexpected termination? It will
> be useful to know for what reason the process exited. For Windows, we
> can use GetExitCodeThread (I'm referring to the code around waitpid in
> pg_basebackup) and for Unix we can use waitpid.

The rest of the program is doing exit(1) regardless of the failure of the
child/thread, so it seems more consistent to keep doing that for this class of
error as well.

A v2 with the above fixes is attached.

--
Daniel Gustafsson https://vmware.com/

Attachment Content-Type Size
v2-0001-Quick-exit-on-log-stream-child-exit-in-pg_basebac.patch application/octet-stream 3.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2021-09-01 08:31:22 Re: Returning to Postgres community work
Previous Message Amit Kapila 2021-09-01 08:24:00 Re: Failure of subscription tests with topminnow