Re: Bogus cleanup code in PostgresNode.pm

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Bogus cleanup code in PostgresNode.pm
Date: 2016-04-26 04:21:03
Message-ID: CAB7nPqSo-GpjEHqKi94U-sbsnPrJw4NkMfkS9NphXC+0JCapHQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Apr 25, 2016 at 11:51 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I noticed that even when they are successful, buildfarm members bowerbird
> and jacana tend to spew a lot of messages like this in their bin-check
> steps:
>
> Can't remove directory /home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/scripts/tmp_check/data_main_DdUf/pgdata/global: Directory not empty at /usr/lib/perl5/5.8/File/Temp.pm line 898
> Can't remove directory /home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/scripts/tmp_check/data_main_DdUf/pgdata/pg_xlog: Directory not empty at /usr/lib/perl5/5.8/File/Temp.pm line 898
> Can't remove directory /home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/scripts/tmp_check/data_main_DdUf/pgdata: Permission denied at /usr/lib/perl5/5.8/File/Temp.pm line 898
> Can't remove directory /home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/scripts/tmp_check/data_main_DdUf: Directory not empty at /usr/lib/perl5/5.8/File/Temp.pm line 898
> ### Signalling QUIT to 9156 for node "main"
> # Running: pg_ctl kill QUIT 9156
>
> What is happening here is that the test script is not bothering to do an
> explicit $node->stop operation, and if it doesn't, the automatic cleanup
> steps happen in the wrong order: the File::Temp destructor for the temp
> data directory runs before PostgresNode.pm's DESTROY function, which is
> what's issuing the "pg_ctl kill" command. On Unix that's just messy,
> but on Windows it fails because you can't delete a process's working
> directory. I am not sure whether this is guaranteed wrong or just
> sometimes wrong; the Perl docs I can find say that destructors are run in
> unspecified order once interpreter shutdown begins. But by adding some
> debug printout I was able to verify on my own machine that the data
> directory was already gone when DESTROY runs.

The docs for File::Temp say that the temporary directory is removed once
the object goes out of scope in the parent:
http://search.cpan.org/~dagolden/File-Temp-0.2304/lib/File/Temp.pm
So basically it means that by the time we enter PostgresNode's DESTROY,
the temporary folder has already "gone out of scope" and been removed?
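
For illustration, here is a minimal sketch of that scope-based removal,
using File::Temp's object interface (File::Temp->newdir). This is only an
assumption about the mechanism for the sake of the example; the variable
names are made up, and it does not claim to reproduce what TestLib does:

```perl
use strict;
use warnings;
use File::Temp;

my $path;
{
    # Object interface: the directory lives as long as $dir does.
    my $dir = File::Temp->newdir();
    $path = $dir->dirname;
    print "inside scope: ", (-d $path ? "exists" : "missing"), "\n";
}
# $dir has gone out of scope, so its destructor has removed the directory.
print "after scope:  ", (-d $path ? "exists" : "missing"), "\n";
```

Any other object that still needs the directory at that point is too late.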

DESTROY is run once per object, while END is a global destructor that is
called at the very end of execution. And actually, one reason why a
DESTROY block was used instead of END is given by Alvaro here:
http://www.postgresql.org/message-id/20151201231121.GI2763@alvherre.pgsql
"
- I changed start/stop/restart so that they keep track of the postmaster
PID; also added a DESTROY sub to PostgresNode that sends SIGQUIT.
This means that when the test finishes, the server gets an immediate
stop signal. We were getting a lot of errors in the server log about
failing to write to the stats file otherwise, until the node noticed
that the datadir was gone.
"
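
To make the DESTROY/END distinction concrete, a small sketch follows. The
DemoNode package is purely illustrative and is not PostgresNode. As far as
I understand the interpreter phases, END blocks run before the objects that
are still alive at exit get destroyed, which is the ordering the proposed
patch relies on:

```perl
use strict;
use warnings;

package DemoNode;    # illustrative only, not PostgresNode

sub new {
    my ($class, $name) = @_;
    return bless { name => $name }, $class;
}

# Called once per object, when its last reference goes away
# (or during global destruction if it is still alive at exit).
sub DESTROY {
    my $self = shift;
    print "DESTROY for node $self->{name}\n";
}

package main;

# Called once per program, before objects still alive at exit are destroyed.
END { print "END block\n"; }

{
    my $scoped = DemoNode->new('scoped');
}    # DESTROY for "scoped" runs here, at scope exit

my $global = DemoNode->new('global');    # still alive when the script ends
print "end of script body\n";
```

The object created in the inner block is destroyed as soon as the block
ends, while the one held until exit is only destroyed after the END block
has run.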

> I believe we can fix this by forcing postmaster shutdown in an END
> routine instead of a DESTROY routine, and hence propose the attached
> patch, which does things in the right order for me. I'm a pretty
> poor Perl programmer, so I'd appreciate somebody vetting this.

Another, perhaps more solid, approach would be to put the DESTROY method
in charge of removing PGDATA, and to extend TestLib::tempdir with an
argument that allows switching to CLEANUP => 0 at will. We would then use
this argument for PGDATA, and remove the directory after sending SIGQUIT.
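
A rough sketch of that idea, under stated assumptions: SketchNode, its
basedir and pid fields, and the 'data_XXXX' template are all hypothetical,
and the directory is created directly with File::Temp::tempdir rather than
through an extended TestLib::tempdir, which does not exist yet:

```perl
use strict;
use warnings;
use File::Temp ();
use File::Path ();

package SketchNode;    # hypothetical stand-in, not the real PostgresNode

sub new {
    my ($class) = @_;
    # CLEANUP => 0: File::Temp will not remove this directory on its own.
    my $dir = File::Temp::tempdir('data_XXXX', TMPDIR => 1, CLEANUP => 0);
    return bless { basedir => $dir, pid => undef }, $class;
}

sub DESTROY {
    my $self = shift;
    local ($?, $@);    # do not clobber the exit status or pending errors

    # Step 1: signal the server first. pid is never set in this sketch;
    # real code would also have to wait for the process to exit.
    kill 'QUIT', $self->{pid} if defined $self->{pid};

    # Step 2: only then remove the data directory ourselves.
    File::Path::remove_tree($self->{basedir}) if -d $self->{basedir};
}

package main;

my $path;
{
    my $node = SketchNode->new;
    $path = $node->{basedir};
    print "node alive: ", (-d $path ? "exists" : "missing"), "\n";
}
print "node gone:  ", (-d $path ? "exists" : "missing"), "\n";
```

The point is only the ordering inside DESTROY: a single method owns both
the shutdown and the removal, so the two cannot run in the wrong order.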
--
Michael
