Re: fsync-pgdata-on-recovery tries to write to more files than previously

From: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Christoph Berg <myon(at)debian(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: fsync-pgdata-on-recovery tries to write to more files than previously
Date: 2015-05-28 14:26:15
Message-ID: 20150528142614.GA21390@toroid.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi.

Here's an updated patch for the fsync problem(s).

A few points that may be worth considering:

1. I've made the ReadDir → ReadDirExtended change, but in retrospect I
don't think it's really worth it. Using ReadDir and letting it die
is probably fine. (Aside: I left ReadDirExtended static for now.)

2. Robert, are you comfortable with what fsync_pgdata() does in xlog.c?
I wasn't sure if I should move that to fd.c as well. I think it's
borderline OK for now.

3. There's a comment in fsync_fname that we need to open directories
with O_RDONLY and non-directories with O_RDWR. Does this also apply
to pre_sync_fname (i.e. sync_file_range and posix_fadvise) on any
platforms we care about? It doesn't seem so, but I'm not sure.

4. As a partial aside, why does fsync_fname use OpenTransientFile? It
looks like it should use BasicOpenFile as pre_sync_fname does. We
close the fd immediately after calling fsync anyway. Or is there
some reason I'm missing that we should use OpenTransientFile in
pre_sync_fname too?

(Aside²: it's pointless to specify S_IRUSR|S_IWUSR in fsync_fname
since we're not setting O_CREAT. Minor, so will submit separate
patch.)

5. I made walktblspc_entries use stat() instead of lstat(), so that we
can handle symlinks and directories the same way. Note that we won't
continue to follow links inside the sub-directories because we use
walkdir instead of recursing.

But this depends on S_ISDIR() returning true after stat() on Windows
with a junction. Does that work? And are the entries in pb_tblspc on
Windows actually junctions?

I've tested this with unreadable files in PGDATA and junk inside
pb_tblspc. Feedback welcome.

I have a separate patch to initdb with the corresponding changes, which
I will post after dinner and a bit more testing.

-- Abhijit

Attachment Content-Type Size
0001-20150528-Skip-unreadable-files-in-fsync_pgdata-follow-only-ex.patch text/x-diff 11.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2015-05-28 14:39:15 Re: pg_upgrade resets timeline to 1
Previous Message Bruce Momjian 2015-05-28 14:21:17 Re: pg_upgrade resets timeline to 1