Re: BUG #16154: pg_ctl restart with a logfile fails sometimes (on Windows)

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16154: pg_ctl restart with a logfile fails sometimes (on Windows)
Date: 2019-12-06 08:00:01
Message-ID: e5179494-715e-f8a3-266b-0cf52adac8f4@gmail.com
Lists: pgsql-bugs

06.12.2019 10:00, PG Bug reporting form wrote:
> When performing regression tests on Windows intermittent failures are
> observed, e.g. in src/bin/pg_basebackup test:
> vcregress taptest src/bin/pg_basebackup
> ...
> t/010_pg_basebackup.pl ... 10/106 Bailout called. Further testing stopped:
> system pg_ctl failed
> FAILED--Further testing stopped: system pg_ctl failed
>
>
> waiting for server to start....The process cannot access the file because it
> is being used by another process.
> stopped waiting
> pg_ctl: could not start server
>
> The issue is caused by sporadic "pg_ctl ... restart -l logfile" failures.

To reproduce this issue reliably, I propose a simple demo patch
(delay_after_unlink_pid).
With the delay added, the "pg_ctl ... restart -l logfile" command (and
"vcregress taptest src/bin/pg_basebackup") always fails.
The error message is not very informative, but debugging shows that the
file in question is the log file specified when running the command:
"C:\Windows\system32\cmd.exe" /C
""C:/src/postgresql/tmp_install/bin/postgres.exe" -D
"C:/src/postgresql/src/bin/pg_basebackup/tmp_check/t_010_pg_basebackup_main_data/pgdata"
--cluster-name=main < "nul" >>
"C:/src/postgresql/src/bin/pg_basebackup/tmp_check/log/010_pg_basebackup_main.log"
2>&1"

If this file is still held open by the previous server's shell (this can
happen when the previous server instance has unlinked its PID file but its
CMD shell is still running), the next CMD start fails with the
aforementioned error message.

To fix this issue, I propose the attached patch
(fix_logfile_sharing_violation).
With the patch, pg_ctl waits up to 30 seconds for the log file to become
available. If the file still cannot be opened (this can be reproduced
with a larger delay in the demo patch), you get a more meaningful
message:
pg_ctl: could not access log file
"C:/src/postgresql/src/bin/pg_basebackup/tmp_check/log/010_pg_basebackup_main.log":
Permission denied
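
For readers without the attachment at hand, the waiting approach can be sketched roughly as below. This is only an illustration of the idea, not the code from fix_logfile_sharing_violation.patch: the helper names wait_for_log_file and sleep_ms, the 100 ms polling interval, and the use of fopen() in append mode are all assumptions made for this sketch. It also assumes that a Windows sharing violation surfaces as EACCES from the C runtime, which matches the "Permission denied" text above.

```c
#include <errno.h>
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
/* Sleep() takes milliseconds on Windows. */
static void sleep_ms(int ms) { Sleep((DWORD) ms); }
#else
#include <time.h>
/* POSIX fallback so the sketch also builds outside Windows. */
static void sleep_ms(int ms)
{
    struct timespec ts;

    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (long) (ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);
}
#endif

/*
 * Hypothetical helper (not taken from the patch): try to open "path" for
 * appending, retrying while the open fails with EACCES, for at most
 * timeout_secs seconds.
 *
 * Returns 0 once the file can be opened, or -1 with errno set by the last
 * failed fopen() call.
 */
static int
wait_for_log_file(const char *path, int timeout_secs)
{
    const int interval_ms = 100;    /* assumed polling interval */
    long remaining_ms = (long) timeout_secs * 1000;

    for (;;)
    {
        FILE *f = fopen(path, "a");

        if (f != NULL)
        {
            fclose(f);
            return 0;
        }

        /* Retry only access errors; give up on anything else or on timeout. */
        if (errno != EACCES || remaining_ms <= 0)
            return -1;

        sleep_ms(interval_ms);
        remaining_ms -= interval_ms;
    }
}
```

A caller would invoke wait_for_log_file(log_file, 30) before launching the server's CMD shell and, on a -1 result, report "could not access log file" together with strerror(errno). Retrying only on EACCES keeps unrelated failures, such as a missing directory, from being delayed by the full timeout.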

Best regards,
Alexander

Attachment Content-Type Size
delay_after_unlink_pid.patch text/x-patch 393 bytes
fix_logfile_sharing_violation.patch text/x-patch 923 bytes
