Race in "tablespace" test on Windows

From: Noah Misch <noah(at)leadboat(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Race in "tablespace" test on Windows
Date: 2014-11-08 05:04:23
Message-ID: 20141108050423.GA642055@tornado.leadboat.com
Lists: pgsql-hackers

In my Windows development environment, the tablespace regression test fails
approximately half the time. Buildfarm member frogmouth failed in the same
manner at least once:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=frogmouth&dt=2014-05-21%2014%3A30%3A01

Here is a briefer command sequence exhibiting the same problem:

CREATE TABLESPACE testspace LOCATION '...somewhere...';
CREATE TABLE atable (c int) tablespace testspace;
SELECT COUNT(*) FROM atable; -- open heap
\c -
ALTER TABLE atable SET TABLESPACE pg_default;
DROP TABLESPACE testspace; -- bug: fails sometimes
DROP TABLESPACE testspace; -- second one ~always works
DROP TABLE atable;

When we unlink an open file, Windows retains it in the directory structure
until all processes close it. ALTER TABLE SET TABLESPACE sends invalidation
messages prompting backends to close the file. The backend running the ALTER TABLE
always processes invalidations before processing another command. The other
backend, the one that served commands before "\c -", may have neither exited nor
processed the invalidation. While it still holds a file descriptor for "atable",
the DROP TABLESPACE fails. I suspect it's possible, though more difficult, to
see similar trouble in dbcommands.c users of RequestCheckpoint(CHECKPOINT_WAIT).

To make this work as well on Windows as it does elsewhere, DROP TABLESPACE
would need to wait for other backends to close relevant unlinked files.
Perhaps implement "wait_unlinked_files(const char *dirname)" to poll unlinked,
open files until they disappear. (An attempt to open an unlinked file reports
ERROR_ACCESS_DENIED. It might be tricky to reliably distinguish this cause
from other causes of that error, but it should be possible.) I propose to add
this as a TODO, then bandage the test case with s/^\\c -$/RESET ROLE;/. That
reduces the number of relevant backends to one, making the race irrelevant.
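
For illustration only, here is a minimal, portable sketch of the polling idea.
It is not PostgreSQL code and not the proposed implementation. It assumes that
"still pending" can be approximated by "the directory still has entries", and
it does not attempt the ERROR_ACCESS_DENIED check described above. The
max_tries and sleep_ms parameters, the return codes, and the count_entries()
helper are hypothetical additions; only the function name and the dirname
argument come from the proposal.

```c
#define _POSIX_C_SOURCE 200809L	/* for nanosleep() */

#include <assert.h>
#include <dirent.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: count entries in dirname other than "." and "..".
 * Returns -1 if the directory cannot be opened.
 */
static int
count_entries(const char *dirname)
{
	DIR		   *dir = opendir(dirname);
	struct dirent *de;
	int			n = 0;

	if (dir == NULL)
		return -1;
	while ((de = readdir(dir)) != NULL)
	{
		if (strcmp(de->d_name, ".") != 0 && strcmp(de->d_name, "..") != 0)
			n++;
	}
	closedir(dir);
	return n;
}

/*
 * Sketch under stated assumptions: poll dirname until it has no entries.
 * Returns 0 once the directory is empty, 1 if entries remain after
 * max_tries polls, and -1 if the directory cannot be read.
 */
int
wait_unlinked_files(const char *dirname, int max_tries, long sleep_ms)
{
	struct timespec delay;
	int			i;

	assert(dirname != NULL && max_tries > 0 && sleep_ms >= 0);
	delay.tv_sec = sleep_ms / 1000;
	delay.tv_nsec = (sleep_ms % 1000) * 1000000L;

	for (i = 0; i < max_tries; i++)
	{
		int			n = count_entries(dirname);

		if (n <= 0)
			return n;			/* 0: empty; -1: could not read directory */
		if (i + 1 < max_tries)
			nanosleep(&delay, NULL);	/* give other processes time to close */
	}
	return 1;
}
```

A caller in this sketch would retry the directory removal only after a return
value of 0, and would report the failure otherwise. A real version would need
the Windows-specific probe for unlinked, open files rather than a plain
directory scan.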

Thanks,
nm
