Re: Bogus cleanup code in PostgresNode.pm

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Bogus cleanup code in PostgresNode.pm
Date:	2016-04-26 04:21:03
Message-ID:	CAB7nPqSo-GpjEHqKi94U-sbsnPrJw4NkMfkS9NphXC+0JCapHQ@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Apr 25, 2016 at 11:51 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I noticed that even when they are successful, buildfarm members bowerbird
> and jacana tend to spew a lot of messages like this in their bin-check
> steps:
>
> Can't remove directory /home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/scripts/tmp_check/data_main_DdUf/pgdata/global: Directory not empty at /usr/lib/perl5/5.8/File/Temp.pm line 898
> Can't remove directory /home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/scripts/tmp_check/data_main_DdUf/pgdata/pg_xlog: Directory not empty at /usr/lib/perl5/5.8/File/Temp.pm line 898
> Can't remove directory /home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/scripts/tmp_check/data_main_DdUf/pgdata: Permission denied at /usr/lib/perl5/5.8/File/Temp.pm line 898
> Can't remove directory /home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/scripts/tmp_check/data_main_DdUf: Directory not empty at /usr/lib/perl5/5.8/File/Temp.pm line 898
> ### Signalling QUIT to 9156 for node "main"
> # Running: pg_ctl kill QUIT 9156
>
> What is happening here is that the test script is not bothering to do an
> explicit $node->stop operation, and if it doesn't, the automatic cleanup
> steps happen in the wrong order: the File::Temp destructor for the temp
> data directory runs before PostgresNode.pm's DESTROY function, which is
> what's issuing the "pg_ctl kill" command. On Unix that's just messy,
> but on Windows it fails because you can't delete a process's working
> directory. I am not sure whether this is guaranteed wrong or just
> sometimes wrong; the Perl docs I can find say that destructors are run in
> unspecified order once interpreter shutdown begins. But by adding some
> debug printout I was able to verify on my own machine that the data
> directory was already gone when DESTROY runs.

The File::Temp docs say that the temporary directory is removed once the
object goes out of scope in the parent:
http://search.cpan.org/~dagolden/File-Temp-0.2304/lib/File/Temp.pm
So does that mean that by the time we enter PostgresNode's DESTROY, the
temporary folder has already "gone out of scope" and been removed?
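
A minimal sketch of the scope-based removal those docs describe, using
File::Temp's object interface (File::Temp->newdir). Whether PostgresNode.pm
obtains its data directory this way is not established in this thread; the
variable names are illustrative only.

```perl
use strict;
use warnings;
use File::Temp ();

my $path;
{
    # newdir() returns an object; its directory is removed when the
    # object is destroyed (CLEANUP defaults to true for newdir).
    my $tmp = File::Temp->newdir();
    $path = $tmp->dirname;
    print "inside scope: ", (-d $path ? "exists" : "missing"), "\n";
}
# $tmp has gone out of scope, so its destructor has already run.
print "after scope:  ", (-d $path ? "exists" : "missing"), "\n";
```

Note that the function interface, File::Temp::tempdir(CLEANUP => 1), is
documented differently: it arranges for removal at program exit rather than
when a variable goes out of scope.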

DESTROY is run once per object, while END is a global destructor that
runs only at the very end of execution. And actually one reason why a
DESTROY block was used instead of END is given by Alvaro here:
http://www.postgresql.org/message-id/20151201231121.GI2763@alvherre.pgsql
"
- I changed start/stop/restart so that they keep track of the postmaster
PID; also added a DESTROY sub to PostgresNode that sends SIGQUIT.
This means that when the test finishes, the server gets an immediate
stop signal. We were getting a lot of errors in the server log about
failing to write to the stats file otherwise, until the node noticed
that the datadir was gone.
"
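
To make the DESTROY/END distinction concrete, here is a small standalone
sketch. Tracker is a hypothetical class, not part of PostgresNode.pm. An
object that goes out of scope during the run is destroyed immediately, while
one still referenced at exit is destroyed only during interpreter shutdown,
which should come after the END blocks have run.

```perl
use strict;
use warnings;

$| = 1;    # unbuffered output, so lines appear in the order they are printed

package Tracker;    # hypothetical class, for illustration only

sub new {
    my ($class, $name) = @_;
    return bless { name => $name }, $class;
}

# Called once per object, when its last reference goes away.
sub DESTROY {
    my ($self) = @_;
    print "DESTROY for $self->{name}\n";
}

package main;

{
    my $scoped = Tracker->new('scoped');
}    # DESTROY for "scoped" runs here, at scope exit

# Still referenced when the script finishes, so it is only destroyed
# during interpreter shutdown.
our $global = Tracker->new('global');

# Runs once, when the program is exiting.
END { print "END block\n" }

print "end of script body\n";
```

This is the ordering a test script without an explicit $node->stop relies
on: any cleanup registered to run at exit gets its turn before DESTROY does
for objects that are still alive.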

> I believe we can fix this by forcing postmaster shutdown in an END
> routine instead of a DESTROY routine, and hence propose the attached
> patch, which does things in the right order for me. I'm a pretty
> poor Perl programmer, so I'd appreciate somebody vetting this.

Another, perhaps more solid, approach would be to put the DESTROY method
in charge of removing PGDATA, and to extend TestLib::tempdir with an
argument that lets the caller switch to CLEANUP => 0 at will. We would
then use this argument for PGDATA and remove it after sending SIGQUIT.
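
A rough sketch of that idea, under stated assumptions: SketchNode is a
hypothetical stand-in for PostgresNode, the directory is created directly
with File::Temp::tempdir rather than through an extended TestLib::tempdir,
and the server-stop step is only a placeholder since no server is started.

```perl
use strict;
use warnings;

package SketchNode;    # hypothetical stand-in for PostgresNode

use File::Temp ();
use File::Path qw(rmtree);

sub new {
    my ($class) = @_;
    # CLEANUP => 0: File::Temp will not remove this directory itself,
    # so the removal order is under our control.
    my $dir = File::Temp::tempdir('sketch_XXXX', TMPDIR => 1, CLEANUP => 0);
    return bless { basedir => $dir, pid => undef }, $class;
}

sub DESTROY {
    my ($self) = @_;
    local $?;    # do not clobber the script's exit status

    # Step 1: stop the server. Real code would issue something like
    # "pg_ctl kill QUIT <pid>" here; this sketch has no server running,
    # so it only clears the recorded PID.
    $self->{pid} = undef;

    # Step 2: only after that, remove the data directory.
    rmtree($self->{basedir}) if -d $self->{basedir};
    return;
}

package main;

my $dir;
{
    my $node = SketchNode->new;
    $dir = $node->{basedir};
    print "during test:   ", (-d $dir ? "exists" : "missing"), "\n";
}
print "after DESTROY: ", (-d $dir ? "exists" : "missing"), "\n";
```

Since both steps live in one destructor, their relative order no longer
depends on which of two independent cleanup mechanisms happens to run first.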
--
Michael

In response to

Bogus cleanup code in PostgresNode.pm at 2016-04-25 14:51:04 from Tom Lane

Responses

Re: Bogus cleanup code in PostgresNode.pm at 2016-04-26 05:24:31 from Tom Lane