Re: Trap errors from streaming child in pg_basebackup to exit early

From: Daniel Gustafsson <daniel(at)yesql(dot)se>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Trap errors from streaming child in pg_basebackup to exit early
Date: 2022-02-21 14:11:30
Message-ID: 2289827C-7462-4B47-AD18-0601FAD36143@yesql.se
Lists: pgsql-hackers

> On 21 Feb 2022, at 03:03, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Fri, Feb 18, 2022 at 10:00:43PM +0100, Daniel Gustafsson wrote:
>> This is good idea, I was going in a different direction earlier with a test but
>> this is cleaner. The attached 0001 refactors pump_until; 0002 fixes a trivial
>> spelling error found while hacking; and 0003 is the previous patch complete
>> with a test that passes on Cirrus CI.
>
> This looks rather sane to me, and I can confirm that this passes
> the CI and a manual run of MSVC tests with my own box.

Great, thanks!

> +is($node->poll_query_until('postgres',
> + "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE " .
> + "application_name = '010_pg_basebackup.pl' AND wait_event = 'WalSenderMain' " .
> + "AND backend_type = 'walsender'"), "1", "Walsender killed");
> If you do that, don't you have a risk to kill the WAL sender doing the
> BASE_BACKUP? That could falsify the test. It seems to me that it
> would be safer to add a check on query ~ 'START_REPLICATION' or
> something like that.

I don't think there's a risk, but I've added the check on query as well since
it also makes it more readable.
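
For illustration, the tightened predicate could look roughly like the query
below. This is a sketch built from the quoted hunk plus the suggested check on
query, written out as plain SQL rather than as the concatenated Perl string; it
is not copied from the attached patch, and the exact wording there may differ.

```sql
-- Sketch only: terminate the walsender that streams WAL, not the one
-- serving BASE_BACKUP. The last condition is the check discussed above.
SELECT pg_terminate_backend(pid)
  FROM pg_stat_activity
 WHERE application_name = '010_pg_basebackup.pl'
   AND wait_event = 'WalSenderMain'
   AND backend_type = 'walsender'
   AND query ~ 'START_REPLICATION';
```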

> - diag("aborting wait: program timed out");
> - diag("stream contents: >>", $$stream, "<<");
> - diag("pattern searched for: ", $untl);
> Keeping some of this information around would be useful for
> debugging in the refactored routine.

Maybe, but we don't really have diag output anywhere in the modules or the
tests so I didn't see much of a precedent for keeping it. Inspecting the repo, I
think we can remove two more in pg_rewind, which I just started a thread for.

> +my $sigchld_bb = IPC::Run::start(
> + [
> + @pg_basebackup_defs, '-X', 'stream', '-D', "$tempdir/sigchld",
> + '-r', '32', '-d', $node->connstr('postgres')
> + ],
> I would recommend the use of long options here as a matter to
> self-document what this does, and add a comment explaining why
> --max-rate is preferable, mainly for fast machines.

Fair enough, done.
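
As a rough sketch of what the long-option form of the quoted invocation can
look like (the attached patch may spell only some of the options out, and
@sigchld_cmd is a name used here purely for illustration):

```perl
# Sketch only; the exact form in the attached v6-0003 may differ.
# @pg_basebackup_defs, $tempdir and $node come from the surrounding test
# file; @sigchld_cmd is a hypothetical name for the argument list.
my @sigchld_cmd = (
	@pg_basebackup_defs,
	'--wal-method', 'stream',
	'--pgdata',     "$tempdir/sigchld",
	# Assumed rationale: throttling to the minimum rate (32 kB/s) keeps the
	# base backup running long enough, even on fast machines, for the
	# walsender to be terminated while the backup is still in progress.
	'--max-rate',   '32',
	'--dbname',     $node->connstr('postgres'));
```

The list would then be passed to IPC::Run::start in place of the short-option
array in the quoted hunk.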

--
Daniel Gustafsson https://vmware.com/

Attachment                                                       Content-Type              Size
v6-0001-Add-function-to-pump-IPC-process-until-string-mat.patch  application/octet-stream  9.0 KB
v6-0002-Remove-duplicated-word-in-comment.patch                  application/octet-stream  921 bytes
v6-0003-Quick-exit-on-log-stream-child-exit-in-pg_basebac.patch  application/octet-stream  5.6 KB
