Re: Still another race condition in recovery TAP tests

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Still another race condition in recovery TAP tests
Date: 2017-09-11 01:11:27
Message-ID: CAB7nPqSnPRk-L_2bXdm0tOFiphZf1a=N+-+j=X2A9Bva=AZ_eA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Sep 10, 2017 at 12:38 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Michael Paquier <michael(dot)paquier(at)gmail(dot)com> writes:
>> On Sat, Sep 9, 2017 at 1:43 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Yeah, even if we fixed this particular call site, I'm sure the issue
>>> would come up again. Certainly we expect hot backups to work with
>>> a changing source directory.
>
>> In short, I'd still like to keep RecursiveCopy for now, but change its
>> code so as a copy() is not a hard failure. What do you think?
>
> The specific case we need to allow is "ENOENT on a file/dir that was
> there a moment ago". I think it still behooves us to complain about
> anything else. If you think it's a simple fix, have at it. But
> I see at least three ways for _copypath_recurse to fail depending on
> exactly when the file disappears.

I have finally been able to take a couple of hours and looked at
problems in this module.

Errno is supported down to 5.8.8 per the docs, and those are set also
on Windows, copy() and opendir() complain with ENOENT for missing
entries:
http://perldoc.perl.org/5.8.8/Errno.html
readdir() does not though. If for example a directory gets removed in
the middle of a scan, then an empty list is returned. mkdir() would
not fail on ENOENT either, the parent directory would have already
been created if a child repository is being scanned.

With the check for -d and -f, this brings indeed the count to three
code paths. With the patch attached, I have added some manual sleep
calls in RecursiveCopy.pm and triggered manual deletions of the source
repository. The copy is able to complete in any case, warning about
missing directories or files. I have added warn messages in the patch
when ENOENT is triggered, though I think that those should be removed
in the final patch.

By the way, 010_logical_decoding_timelines.pl does not need to include
RecursiveCopy.
--
Michael

Attachment Content-Type Size
tap-copypath-enoent.patch application/octet-stream 3.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2017-09-11 01:24:19 Re: Setting pd_lower in GIN metapage
Previous Message Tomas Vondra 2017-09-11 00:59:07 Re: The case for removing replacement selection sort