Re: backup manifests and contemporaneous buildfarm failures

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, Noah Misch <noah(at)leadboat(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Suraj Kharage <suraj(dot)kharage(at)enterprisedb(dot)com>, tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Tels <nospam-pg-abuse(at)bloodgate(dot)com>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: Re: backup manifests and contemporaneous buildfarm failures
Date: 2020-04-04 00:12:15
Message-ID: 31057.1585959135@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Fri, Apr 3, 2020 at 6:48 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I'm guessing that we're looking at a platform-specific difference in
>> whether "rm -rf" fails outright on an unreadable subdirectory, or
>> just tries to carry on by unlinking it anyway.

> My intention was that it would be cleaned by the TAP framework itself,
> since the temporary directories it creates are marked for cleanup. But
> it may be that there's a platform dependency in the behavior of Perl's
> File::Path::rmtree, too.

Yeah, so it would seem. The buildfarm script uses rmtree to clean out
the old build tree. The man page for File::Path suggests (but can't
quite bring itself to say in so many words) that by default, rmtree
will adjust the permissions on target directories to allow the deletion
to succeed. But that's very clearly not happening on some platforms.
(Maybe that represents a local patch on the part of some packagers
who thought it was too unsafe?)
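A minimal Perl sketch of one way to make cleanup independent of that platform difference. It assumes the current user owns the whole tree. It first restores owner permissions on every directory, then calls File::Path's remove_tree. force_remove_tree and the directory names are hypothetical. This is not what the buildfarm client actually does.

```perl
use strict;
use warnings;
use File::Find ();
use File::Path qw(make_path remove_tree);
use File::Temp qw(tempdir);

# Hypothetical helper: give the owner rwx on every directory under $root,
# then delete it, so success does not hinge on whether this platform's
# File::Path is willing to fix permissions on its own.
# Assumes the current user owns everything in the tree.
sub force_remove_tree {
    my ($root) = @_;
    return 1 unless -e $root;
    chmod 0700, $root;    # make the top level enterable first
    File::Find::find(
        sub {
            # find() has chdir'd to the parent, so $_ is the bare entry name.
            # Fixing the mode here happens before find() descends into it.
            chmod 0700, $_ if -d $_ && !-l $_;
        },
        $root
    );
    remove_tree($root, { error => \my $err });
    return !@$err && !-e $root;
}

# Demo: a tree shaped like a leftover build directory (names hypothetical)
# containing a mode-0000 subdirectory that has a file in it.
my $base   = tempdir(CLEANUP => 1);
my $tree   = "$base/pgsql.build";
my $locked = "$tree/tmp_check/unreadable";
make_path($locked);
open(my $fh, '>', "$locked/file") or die "open $locked/file: $!";
close($fh);
chmod 0000, $locked;

print force_remove_tree($tree) ? "removed\n" : "still present\n";
```

Fixing permissions up front means the outcome no longer depends on File::Path's default "try to chmod" behavior, which may not hold on every platform.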

Anyway, the end state presumably is that the pgsql.build directory
is still there at the end of the buildfarm run, and the next run's
attempt to also rmtree it fares no better. Then look what it does
to set up the new build:

system("cp -R -p $target $build_path 2>&1");

Of course, if $build_path already exists, then cp copies into a subdirectory
of the destination, not the destination itself. So that explains the symptom
"./configure does not exist" --- it exists all right, but in a
subdirectory below the one where the buildfarm expects it to be.
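The nesting follows from cp's rule for an existing destination directory. Below is a small Perl sketch that models that rule. It also adds a guard a setup script could use to fail loudly instead of building in the wrong place. cp_r_destination, copy_checkout, and the temporary paths are hypothetical and are not taken from the buildfarm client.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Spec;
use File::Temp qw(tempdir);

# Where "cp -R $src $dst" puts the copy: POSIX cp copies into
# $dst/basename($src) when $dst is an existing directory, and creates
# $dst itself otherwise.
sub cp_r_destination {
    my ($src, $dst) = @_;
    return -d $dst ? File::Spec->catdir($dst, basename($src)) : $dst;
}

# Hypothetical guard: refuse to copy over a build tree that an earlier
# cleanup failed to remove, instead of silently nesting the new copy.
sub copy_checkout {
    my ($src, $dst) = @_;
    die "$dst still exists; previous cleanup failed\n" if -e $dst;
    # List form of system() bypasses the shell.
    system('cp', '-R', '-p', $src, $dst) == 0
        or die "cp -R -p $src $dst failed: $?\n";
    return $dst;
}

my $base   = tempdir(CLEANUP => 1);
my $target = "$base/pgsql";          # stands in for the source checkout
my $build  = "$base/pgsql.build";    # stands in for $build_path
mkdir $target or die "mkdir $target: $!";

my $fresh = cp_r_destination($target, $build);    # no leftover tree
mkdir $build or die "mkdir $build: $!";           # simulate a leftover tree
my $stale = cp_r_destination($target, $build);    # lands one level down
print "$fresh\n$stale\n";

my $refused     = !eval { copy_checkout($target, $build); 1 };
my $guard_error = $refused ? $@ : '';
print $refused ? "refused: $guard_error" : "copied\n";
```

Dying on a leftover tree does not fix the broken animal. It does turn a confusing "./configure does not exist" failure into an error that names the real cause.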

It looks to me like the same problem would occur with VPATH or no.
The lack of failures among the VPATH-using critters probably has
more to do with whether their rmtree is willing to deal with this
case than with VPATH.

Anyway, it's evident that the buildfarm critters that are busted
will need manual cleanup, because the script is not going to be
able to get out of this by itself. I remain of the opinion that
the hazard of that happening again in the future (eg, if a buildfarm
animal loses power during the test) is sufficient reason to remove
this test case.

regards, tom lane
