Re: Race in "tablespace" test on Windows

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Race in "tablespace" test on Windows
Date: 2014-11-11 04:51:26
Message-ID: CAA4eK1KgRBN6oEfVQXchQ+8j45Qq801HnNsw6TApeRn8aiXbxQ@mail.gmail.com
Lists: pgsql-hackers

On Sat, Nov 8, 2014 at 10:34 AM, Noah Misch <noah(at)leadboat(dot)com> wrote:
>
> In my Windows development environment, the tablespace regression test fails
> approximately half the time. Buildfarm member frogmouth failed in the same
> manner at least once:
>
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=frogmouth&dt=2014-05-21%2014%3A30%3A01
>
> Here is a briefer command sequence exhibiting the same problem:
>
> CREATE TABLESPACE testspace LOCATION '...somewhere...';
> CREATE TABLE atable (c int) tablespace testspace;
> SELECT COUNT(*) FROM atable; -- open heap
> \c -
> ALTER TABLE atable SET TABLESPACE pg_default;
> DROP TABLESPACE testspace; -- bug: fails sometimes
> DROP TABLESPACE testspace; -- second one ~always works
> DROP TABLE atable;
>

For me, it does not succeed even the second time. I keep getting
the same error until I execute some command in the first session,
that is, until the first session has processed the invalidation
messages.

postgres=# Drop tablespace tbs;
ERROR: tablespace "tbs" is not empty
postgres=# Drop tablespace tbs;
ERROR: tablespace "tbs" is not empty

I have tested this on Windows 7.

> When we unlink an open file, Windows retains it in the directory structure
> until all processes close it. ALTER TABLE SET TABLESPACE sends invalidation
> messages prompting backends to do so. The backend running the ALTER TABLE
> always processes invalidations before processing another command. The other
> backend, the one serving commands before "\c -", may have neither exited nor
> processed the invalidation. When it yet holds a file descriptor for "atable",
> the DROP TABLESPACE fails. I suspect it's possible, though more difficult, to
> see like trouble in dbcommands.c users of RequestCheckpoint(CHECKPOINT_WAIT).
>
> To make this work as well on Windows as it does elsewhere, DROP TABLESPACE
> would need to wait for other backends to close relevant unlinked files.
> Perhaps implement "wait_unlinked_files(const char *dirname)" to poll unlinked,
> open files until they disappear. (An attempt to open an unlinked file reports
> ERROR_ACCESS_DENIED. It might be tricky to reliably distinguish this cause
> from other causes of that error, but it should be possible.)

I think the proposed mechanism can work, but the wait can be very long
(until the backend holding the descriptor executes another command).
Can we think of some other solution? For example, in DROP TABLESPACE,
instead of checking whether the directory is empty, check that there is
no object belonging to the database/cluster, and if so allow the
directory to be forcibly deleted in some way.
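
To make the concern about the wait concrete, here is a rough sketch of a
bounded variant of the polling idea. It is plain C, not PostgreSQL code.
The extra parameters (probe, probe_arg, max_polls, nap, delay_usec) and the
countdown_probe helper are assumptions added for illustration; only the name
wait_unlinked_files and its dirname argument come from the proposal quoted
above. A real probe on Windows would have to detect unlinked-but-open files,
which this sketch does not attempt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical probe: returns true while "dirname" still appears to hold
 * unlinked files that some other process has open. */
typedef bool (*unlinked_probe_fn) (const char *dirname, void *arg);

/* Hypothetical sleep hook, so the caller decides how to wait between polls. */
typedef void (*sleep_fn) (long usec);

/*
 * Poll until the probe reports that no unlinked, open files remain, or
 * until max_polls attempts have been made.  Returns true if the directory
 * became clean, false if we gave up.  Capping the number of polls is an
 * assumption of this sketch; it bounds the wait instead of blocking until
 * the other backend runs its next command.
 */
bool
wait_unlinked_files(const char *dirname,
                    unlinked_probe_fn probe, void *probe_arg,
                    int max_polls, sleep_fn nap, long delay_usec)
{
    assert(probe != NULL);

    for (int i = 0; i < max_polls; i++)
    {
        if (!probe(dirname, probe_arg))
            return true;        /* nothing left open; safe to proceed */
        if (nap != NULL)
            nap(delay_usec);    /* back off before the next poll */
    }
    return false;               /* still busy; caller reports an error */
}

/*
 * Demo probe: pretends the other backend closes its descriptor after a
 * fixed number of polls.  "arg" points to an int holding that count.
 */
bool
countdown_probe(const char *dirname, void *arg)
{
    int        *remaining = arg;

    (void) dirname;             /* unused in the demo */
    if (*remaining > 0)
    {
        (*remaining)--;
        return true;            /* file still held open */
    }
    return false;               /* file has disappeared */
}
```

With a cap like this, DROP TABLESPACE would still fail when the other
backend stays idle past the limit, so it only narrows the race rather than
closing it.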

> I propose to add
> this as a TODO, then bandage the test case with s/^\\c -$/RESET ROLE;/.

Yeah, this makes sense.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
