From: | Alexander Lakhin <exclusion(at)gmail(dot)com> |
---|---|
To: | shveta malik <shveta(dot)malik(at)gmail(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: How abnormal server shutdown could be detected by tests? |
Date: | 2023-12-12 15:00:00 |
Message-ID: | 5921355f-4cfb-c91a-24b8-6bbde53c990c@gmail.com |
Lists: | pgsql-hackers |
Hello Shveta,
12.12.2023 11:44, shveta malik wrote:
>
>> The postmaster process exits with exit code 1, but pg_ctl can't get the
>> code and just reports that stop was completed successfully.
>>
> For what it's worth, there is another thread which stated the similar problem:
> https://www.postgresql.org/message-id/flat/2366244.1651681550%40sss.pgh.pa.us
>
Thank you for the reference!
So I have revived the first part of the question Tom Lane raised before...
I've made a quick experiment with leaving postmaster.pid intact in case of
an abnormal shutdown:
@@ -1113,6 +1113,7 @@ UnlinkLockFiles(int status, Datum arg)
     {
         char       *curfile = (char *) lfirst(l);
+        if (strcmp(curfile, DIRECTORY_LOCK_FILE) != 0 || status == 0)
         unlink(curfile);
         /* Should we complain if the unlink fails? */
     }
and `make check-world` passed for me with no failures.
(At the same time, the forced assertion failure described above is detected.)
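To spell out the rule the experiment implements, here is a minimal standalone sketch in C. It is not the actual backend code: should_unlink_lock_file is a hypothetical helper introduced only for illustration, and the socket lock file path in the usage note is just an example. DIRECTORY_LOCK_FILE is defined locally with the value it has in PostgreSQL.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Defined locally for the sketch; PostgreSQL uses the same value. */
#define DIRECTORY_LOCK_FILE "postmaster.pid"

/*
 * Hypothetical helper (not part of PostgreSQL): decide whether an
 * exit-time callback should remove the given lock file.
 *
 * Lock files other than the data directory lock file are always removed.
 * The data directory lock file is removed only when the exit status is 0,
 * so a leftover postmaster.pid marks an abnormal shutdown.
 */
static bool
should_unlink_lock_file(const char *curfile, int status)
{
    assert(curfile != NULL);
    return strcmp(curfile, DIRECTORY_LOCK_FILE) != 0 || status == 0;
}
```

With this rule, should_unlink_lock_file("postmaster.pid", 1) is false, while a socket lock file such as "/tmp/.s.PGSQL.5432.lock" is removed regardless of the exit status.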
Though there is a minor issue with a couple of tests. Namely,
003_recovery_targets.pl does the following:
# wait for postgres to terminate
foreach my $i (0 .. 10 * $PostgreSQL::Test::Utils::timeout_default)
{
    last if !-f $node_standby->data_dir . '/postmaster.pid';
    usleep(100_000);
}
$logfile = slurp_file($node_standby->logfile());
ok( $logfile =~
      qr/FATAL: .* recovery ended before configured recovery target was reached/,
    'recovery end before target reached is a fatal error');
With postmaster.pid left behind after an unclean shutdown, the test waits for
300 seconds by default and then completes successfully.
If that loop is rewritten as follows:
# wait for the error message in the standby log
foreach my $i (0 .. 10 * $PostgreSQL::Test::Utils::timeout_default)
{
    $logfile = slurp_file($node_standby->logfile());
    $res = ($logfile =~
        qr/FATAL: .* recovery ended before configured recovery target was reached/);
    if ($res) {
        last;
    }
    usleep(100_000);
}
ok($res,
    'recovery end before target reached is a fatal error');
the test completes as quickly as before.
(standby.log is only 2 kB, so rereading it isn't a big deal, IMO.)
So maybe that's the way to go?
Another way I can think of is sending some signal to pg_ctl in case the
postmaster terminates with status 0. Though I think it would complicate
things a little, as it allows for three different states:
postmaster.pid preserved (if the postmaster was killed with -9), and
postmaster.pid removed with the signal either received or not received.
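A rough sketch of how those three states could be told apart, under my reading that the signal is sent only on a status 0 exit. ShutdownKind and classify_shutdown are hypothetical names used only for illustration; nothing like this exists in pg_ctl today.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of how the postmaster went away. */
typedef enum
{
    SHUTDOWN_CLEAN,       /* postmaster.pid removed, signal received */
    SHUTDOWN_ERROR_EXIT,  /* postmaster.pid removed, no signal received */
    SHUTDOWN_KILLED       /* postmaster.pid preserved, e.g. after kill -9 */
} ShutdownKind;

/*
 * Hypothetical helper: map the two observations available to the stopping
 * process onto the three states.  Assumes the postmaster sends the signal
 * only when it terminates with status 0.
 */
static ShutdownKind
classify_shutdown(bool pid_file_exists, bool signal_received)
{
    if (pid_file_exists)
        return SHUTDOWN_KILLED;     /* exit-time cleanup never ran */
    return signal_received ? SHUTDOWN_CLEAN : SHUTDOWN_ERROR_EXIT;
}
```

Compared with simply keeping postmaster.pid on a non-zero exit, this needs two observations instead of one, which is the extra complication I mean.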
Best regards,
Alexander