Re: BUG #15460: Error while creating index or constraint

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: paul(dot)vanderlinden(at)mapcreator(dot)eu
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: BUG #15460: Error while creating index or constraint
Date: 2018-11-03 09:13:44
Message-ID: CAEepm=2rH_V5by1kH1Q1HZWPFj=4ykjU4JcyoKMNVT6Jh8Q_Rw@mail.gmail.com
Lists: pgsql-bugs

On Fri, Nov 2, 2018 at 9:02 PM Paul van der Linden
<paul(dot)vanderlinden(at)mapcreator(dot)eu> wrote:
> Well, I can test.
> If you provide me with the call (including flags) that is made at that moment to remove the directory, I can see what is needed to make it fail and possibly how to work around it.

Thanks. Actually I think I now see what's going on with the
"Directory not empty" error. I experimented a bit on AppVeyor's
wonderful free Windows build bot, and finished up with a simple
program[1] that spits out:

Creating directory my_dir...
Creating file my_dir\foo.txt...
Unlinking file my_dir\foo.txt...
Opening directory my_dir...
Removing directory my_dir with rmdir()...
Failed, error = 145, errno = 41 (Directory not empty).

You can unlink files and directories that still have open handles, as
long as every handle was opened with FILE_SHARE_DELETE. What this
shows is that you can't remove a directory if it contain[ed] files
that have since been unlinked and someone still has an open handle to
one of those files. In other words, the file isn't really unlinked at
all, it's just hidden until all referencing handles are closed, or
something like that.

Back to PostgreSQL: in the non-error happy path there is no problem,
because well-behaved clients close their handles before detaching from
the reference-counted SharedFileSet, so when the last detacher unlinks
there can be no handles left. In the error path, however, we rely on
resowner.c to close handles, and it does that *after* releasing the
DSM segment, and releasing the segment is what triggers the unlinking.
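
To make the ordering problem concrete, here is a toy model in C. It is
not PostgreSQL code and it does not call any Windows API: ToyDir and
the toy_* functions are hypothetical names invented for illustration,
and toy_rmdir() simply encodes the behaviour observed above, assuming
that an unlinked file lingers in its directory while any handle to it
is open.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: one directory holding one file. */
typedef struct ToyDir
{
    int  open_handles;   /* handles still open on the file */
    bool file_unlinked;  /* unlink has been requested for the file */
    bool dir_removed;    /* directory removal succeeded */
} ToyDir;

/* Unlinking always "succeeds", but the entry may linger (see toy_rmdir). */
static void
toy_unlink_file(ToyDir *d)
{
    d->file_unlinked = true;
}

static void
toy_close_handle(ToyDir *d)
{
    assert(d->open_handles > 0);
    d->open_handles--;
}

/*
 * Assumed semantics: the directory only counts as empty once the file has
 * been unlinked AND no handle to it remains open.  Otherwise fail, as in
 * the "Directory not empty" case.
 */
static bool
toy_rmdir(ToyDir *d)
{
    if (!d->file_unlinked || d->open_handles > 0)
        return false;
    d->dir_removed = true;
    return true;
}

/* Error-path order described above: unlink first, close handles later. */
static bool
toy_cleanup_unlink_first(ToyDir *d)
{
    bool ok;

    toy_unlink_file(d);
    ok = toy_rmdir(d);          /* handles are still open here */
    while (d->open_handles > 0)
        toy_close_handle(d);
    return ok;
}

/* Happy-path order: every handle is closed before the last unlink. */
static bool
toy_cleanup_close_first(ToyDir *d)
{
    while (d->open_handles > 0)
        toy_close_handle(d);
    toy_unlink_file(d);
    return toy_rmdir(d);
}
```

Starting from a ToyDir with one open handle, toy_cleanup_unlink_first()
reports failure and leaves the directory behind, while
toy_cleanup_close_first() removes it. The only difference between the
two is the order of the same steps.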

We discussed this exact topic while working on this stuff[2] and I
concluded incorrectly that you get sufficiently Unix-like behaviour if
all openers use FILE_SHARE_DELETE. I was apparently right about
unlinking the files themselves, but not for unlinking directories that
hold them. So, it looks like we may need to reorder the cleanup code
in resowner.c, after all. I already had a patch for that, but I'm
certainly not comfortable making such a change for the minor release
that's about to wrap, since it could have unforeseen consequences and
needs more study.
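
For contrast, this is roughly the Unix-like behaviour I had in mind,
as a POSIX-only sketch (it is not the Windows test program from [1],
and the function name and the file name foo.txt are just placeholders).
It keeps a descriptor open on a file, unlinks the file, and then tries
to remove the containing directory while the descriptor is still open.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create "dir", create and unlink a file inside it while keeping the
 * descriptor open, then try to remove "dir".  Returns 0 if rmdir()
 * succeeded with the descriptor still open, -1 on any failure.
 */
static int
rmdir_with_open_unlinked_file(const char *dir)
{
    char path[4096];
    int  fd;
    int  n;
    int  rc;

    assert(dir != NULL);

    if (mkdir(dir, 0700) != 0)
        return -1;

    n = snprintf(path, sizeof(path), "%s/foo.txt", dir);
    if (n < 0 || (size_t) n >= sizeof(path))
    {
        rmdir(dir);
        return -1;
    }

    fd = open(path, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd < 0)
    {
        rmdir(dir);
        return -1;
    }

    /* Remove the name; the open descriptor keeps the file data alive. */
    if (unlink(path) != 0)
    {
        close(fd);
        return -1;
    }

    /* The descriptor is deliberately still open at this point. */
    rc = rmdir(dir);

    close(fd);
    return rc == 0 ? 0 : -1;
}
```

On a typical local Unix filesystem I'd expect this to return 0, because
the directory entry is gone as soon as unlink() returns. The Windows
experiment above is the analogous sequence failing at the last step.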

Of course none of this explains the root problem that led to the error
('could not determine size of temporary file "0"'); it's just a noisy
error path that leaks a temporary directory, which we should fix.

[1] https://github.com/macdice/hello-windows/blob/remove-directory/test.c
[2] https://www.postgresql.org/message-id/CAEepm=1Ugp7mNFX-YpSfWr0F_6QA4jzjtavauBcoAAZGd7_tPA@mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com
