BUG #5166: readdir() failure on Mac OS-X is HFS "feature"

From: "Stephen Tyler" <stephen(at)stephen-tyler(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #5166: readdir() failure on Mac OS-X is HFS "feature"
Date: 2009-11-05 02:44:13
Message-ID: 200911050244.nA52iD8e096216@wwwmaster.postgresql.org
Lists: pgsql-bugs


The following bug has been logged online:

Bug reference: 5166
Logged by: Stephen Tyler
Email address: stephen(at)stephen-tyler(dot)com
PostgreSQL version: 8.4.1
Operating system: Snow Leopard OS-X 10.6.1 (64 bit)
Description: readdir() failure on Mac OS-X is HFS "feature"
Details:

I'm frequently getting these errors in my console:

4/11/09 2:25:04 PM org.postgresql.postgres[192] ERROR: could not read
directory "pg_xlog": Invalid argument
4/11/09 2:25:56 PM org.postgresql.postgres[192] ERROR: could not read
directory "pg_xlog": Invalid argument
4/11/09 2:36:03 PM org.postgresql.postgres[192] ERROR: could not read
directory "pg_xlog": Invalid argument

and rarely:

3/11/09 10:32:31 PM org.postgresql.postgres[217] ERROR: could not
read directory "pg_clog": Invalid argument

The pg_xlog errors occur periodically, and can also be induced by any large
change (e.g. deleting 1 million rows).

System:
Mac Pro Quad Nehalem 2.93GHz, 16GB RAM running Snow Leopard OS X 10.6.1 in
64-bit mode
Postgres 8.4.1 (Intel 64 bit) from
http://www.kyngchaos.com/software:postgres running in 64 bit mode
Database is on an SSD Raid 0 array

The error appears to be generated in src/port/dirmod.c:

pgfnames(const char *path)
{
    ....
    while ((file = readdir(dir)) != NULL)
    {
        ....
        errno = 0;
    }
    ....
    if (errno)
    {
        ....
        fprintf(stderr, _("could not read directory \"%s\": %s\n"),
                path, strerror(errno));
        ....
    }
    ....
}
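
The excerpt relies on the usual POSIX idiom for telling "end of directory" apart from a readdir() failure: clear errno before the call, then inspect it once readdir() returns NULL. A minimal sketch of that idiom follows. The function count_dir_entries() is a hypothetical helper written for illustration; it is not PostgreSQL code.

```c
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from PostgreSQL): count the entries that
 * readdir() reports for "path", including "." and ".." if present.
 * Returns -1 if opendir() or readdir() fails.
 */
int
count_dir_entries(const char *path)
{
    DIR           *dir = opendir(path);
    struct dirent *file;
    int            count = 0;
    int            saved_errno;

    if (dir == NULL)
        return -1;

    for (;;)
    {
        errno = 0;              /* readdir() never clears errno itself */
        file = readdir(dir);
        if (file == NULL)
            break;              /* end of directory, or an error */
        count++;
    }
    saved_errno = errno;        /* closedir() may overwrite errno */
    closedir(dir);

    if (saved_errno != 0)
    {
        /* Same message shape as the log lines quoted in this report. */
        fprintf(stderr, "could not read directory \"%s\": %s\n",
                path, strerror(saved_errno));
        errno = saved_errno;
        return -1;
    }
    return count;
}
```

With this pattern, an "Invalid argument" message means readdir() itself returned NULL with errno set to EINVAL, rather than a stale errno left over from an earlier call.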

I previously posted this to pgsql-general "Could not read directory
\"pg_xlog\": Invalid argument (on SSD Raid)" on Nov 4 2009.

Tom Lane replied:
This is a known bug in Snow Leopard --- readdir() calls fail after having
deleted a file in the directory. We are hoping that Apple fixes it in
10.6.2, because trying to kluge around it seems like a mess.

I am submitting it as a bug because:

1) It occurs in a different place than the previously reported
readdir()-related errors

2) It appears to be a "feature" of HFS on OS-X, rather than a "bug":

http://support.apple.com/kb/TA21420?viewlocale=en_US

<quote>
Some implementations of the "rm" command line tool (and other tools that
support the recursive removal of files and directories) depend on an
unspecified behavior outlined below.

opendir();
while (readdir()) unlink();
closedir();

The unspecified behavior is what readdir() should return after the directory
has been modified. Many file systems have been implemented such that
subsequent readdir() calls will return the next directory entry. The
implementation of the HFS file system cannot guarantee that all enclosed
files or directories will be removed using the above method.

The Mac OS X "rm" command does not exhibit this behavior because it uses the
fts(3) library functions to traverse the directory hierarchy, which causes
the entire contents of the directory to be read before modifying it.

Solution

The readdir()/unlink() loop needs to call rewinddir() either after each
unlink() call, or whenever readdir returns NULL immediately after files have
been unlinked. For example:

"The following pseudocode describes this solution:"

opendir();
do {
    unlinkedfiles = 0;
    while(readdir()) {
        unlink();
        unlinkedfiles = 1;
    }
    if (unlinkedfiles)
        rewinddir();
} while (unlinkedfiles);
closedir();

</quote>

My reading of the above support article from Apple leads me to believe that
Apple has no plans in the near future to alter/fix the behaviour of
readdir() in the OS.
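
For reference, Apple's pseudocode could be written out in C roughly as follows. This is only a sketch of the rescanning pattern: remove_dir_files() and its fixed-size path buffer are assumptions made for illustration, it handles plain files only, and it has not been checked against the "Invalid argument" failure seen on 10.6.1.

```c
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: unlink every entry in "path" except "." and "..",
 * rescanning with rewinddir() after any pass that deleted something.
 * Returns 0 on success, -1 on error (errno is preserved).
 * Subdirectories are not handled; unlink() fails on them and so do we.
 */
int
remove_dir_files(const char *path)
{
    DIR           *dir = opendir(path);
    struct dirent *de;
    char           full[4096];  /* arbitrary limit chosen for the sketch */
    int            unlinked;
    int            saved_errno;

    if (dir == NULL)
        return -1;

    do
    {
        unlinked = 0;
        for (;;)
        {
            errno = 0;
            de = readdir(dir);
            if (de == NULL)
                break;          /* end of this pass, or an error */
            if (strcmp(de->d_name, ".") == 0 ||
                strcmp(de->d_name, "..") == 0)
                continue;
            if (snprintf(full, sizeof(full), "%s/%s", path, de->d_name)
                >= (int) sizeof(full))
            {
                errno = ENAMETOOLONG;
                goto fail;
            }
            if (unlink(full) == 0)
                unlinked = 1;
            else if (errno != ENOENT)
                goto fail;      /* ENOENT: someone else removed it first */
        }
        if (errno != 0)
            goto fail;          /* readdir() reported an error */
        if (unlinked)
            rewinddir(dir);     /* directory changed: scan again from the top */
    } while (unlinked);

    closedir(dir);
    return 0;

fail:
    saved_errno = errno;
    closedir(dir);
    errno = saved_errno;
    return -1;
}
```

The loop terminates because every repeated pass follows a pass that removed at least one file.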

The previously suggested hack/patch to fix DROP TABLESPACE is insufficient:

pgsql/src/backend/commands:
tablespace.c (r1.61 -> r1.62)

(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/commands/tablespace.c?r1=1.61&r2=1.62)

I have found readdir() calls in:
src/port/dirent.c
src/port/dirmod.c
src/bin/initdb/initdb.c
src/bin/pg_resetxlog/pg_resetxlog.c
src/bin/psql/create_help.pl
src/tools/msvc/Install.pm
src/tools/msvc/Mkvcbuild.pm
src/backend/storage/file/fd.c
contrib/pg_standby/pg_standby.c

I don't know which of the above depend on the "unspecified" readdir
behaviour (beyond dirmod.c), and are thus prone to failure on Mac OS X.
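
One pattern that does not rely on the unspecified behaviour is to read every name into memory first and only modify the directory after closedir(), which is what the Apple article says fts(3) effectively does for "rm". A hedged sketch follows; snapshot_dir_names() and free_dir_names() are hypothetical names, and this is not a proposed patch for any of the files listed above.

```c
#include <dirent.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Free a NULL-terminated name array returned by snapshot_dir_names(). */
void
free_dir_names(char **names)
{
    char **p;

    if (names == NULL)
        return;
    for (p = names; *p != NULL; p++)
        free(*p);
    free(names);
}

/*
 * Hypothetical helper: return a malloc'd, NULL-terminated array holding a
 * copy of every entry name in "path" except "." and "..", or NULL on error.
 * The directory is closed before returning, so the caller can unlink or
 * rename entries without touching an open directory stream.
 */
char **
snapshot_dir_names(const char *path)
{
    DIR           *dir = opendir(path);
    struct dirent *de;
    size_t         used = 0;
    size_t         cap = 16;
    char         **names;
    int            saved_errno;

    if (dir == NULL)
        return NULL;
    names = malloc(cap * sizeof(*names));
    if (names == NULL)
    {
        closedir(dir);
        errno = ENOMEM;
        return NULL;
    }

    for (;;)
    {
        size_t len;
        char  *copy;

        errno = 0;
        de = readdir(dir);
        if (de == NULL)
            break;              /* end of directory, or an error */
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        if (used + 2 > cap)     /* keep room for the NULL terminator */
        {
            char **tmp = realloc(names, cap * 2 * sizeof(*names));

            if (tmp == NULL)
            {
                errno = ENOMEM;
                break;
            }
            names = tmp;
            cap *= 2;
        }
        len = strlen(de->d_name) + 1;
        copy = malloc(len);
        if (copy == NULL)
        {
            errno = ENOMEM;
            break;
        }
        memcpy(copy, de->d_name, len);
        names[used++] = copy;
    }
    saved_errno = errno;
    closedir(dir);
    names[used] = NULL;

    if (saved_errno != 0)
    {
        free_dir_names(names);
        errno = saved_errno;
        return NULL;
    }
    return names;
}
```

A caller would iterate over the returned array, act on each name, and then call free_dir_names(). The cost is memory proportional to the number of entries.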
