Re: pg_rewind failure by file deletion in source server

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_rewind failure by file deletion in source server
Date: 2015-06-23 04:51:20
Message-ID: CAB7nPqT=tgOgr3-9pxRy-ABfYYms0rTan5R6f2nEV4rjOz9jdg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jun 19, 2015 at 9:22 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Jun 19, 2015 at 8:18 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Fri, Jun 19, 2015 at 12:14 AM, Michael Paquier
>> <michael(dot)paquier(at)gmail(dot)com> wrote:
>>>> Listing the directories with pg_ls_dir() has the same problem.
>>>
>>> (After some discussion on IM with Heikki on this one).
>>> This is actually more tricky because pg_ls_dir() does not return '.'
>>> or '..' that we could use to identify that the directory actually
>>> exists or not when it is empty. Hence I think that we should add two
>>> options to pg_ls_dir:
>>> - include_self, default to false. If set to true, '.' is added in the
>>> list of items.
>>> - if_not_exists, to bypass error that a folder does not exist, default
>>> at false. If if_not_exists = true and include_self = true, returning
>>> only '.' would mean that the folder exist but that it is empty. If
>>> if_not_exists = true and include_self = false, no rows are returned.
>>> We could as well ERROR as well if both options are set like that. I am
>>> fine with any of them as long as behavior is properly documented.
>>
>> Including '.' to distinguish between an empty directory and a
>> nonexistent one seems like an unnecessarily complicated and
>> non-obvious API. How about just one additional parameter bool
>> *exists. If NULL and no directory, ERROR, else on return set *exists
>> to true or false.
>
> Err, wait. You're talking about an SQL function, heh heh. So that
> won't work. Maybe what you proposed is the best we can do, then,
> although I would suggest that you rename the include_self parameter to
> something like include_dot_dirs and return both "." and "..".
> Returning only "." seems like it will seem weird to people.

So... Attached are a set of patches dedicated at fixing this issue:
- 0001, add if_not_exists to pg_tablespace_location, returning NULL if
path does not exist
- 0002, same with pg_stat_file, returning NULL if file does not exist
- 0003, same with pg_read_*file. I added them to all the existing
functions for consistency.
- 0004, pg_ls_dir extended with if_not_exists and include_dot_dirs
(thanks Robert for the naming!)
- 0005, as things get complex, a set of regression tests aimed to
covering those things. pg_tablespace_location is platform-dependent,
so there are no tests for it.
- 0006, the fix for pg_rewind, using what has been implemented before.

Attached is an updated test case triggering the issue
(rewind_test.bash), with the small patch attached that adds a pg_sleep
call in pg_rewind.c (20150623_pg_rewind_sleep.patch).

I imagine that this is a bug people are going to meet in the field
easily, particularly with temporary relation files or temporary XLOG
files.
Regards,
--
Michael

Attachment Content-Type Size
0001-Extend-pg_tablespace_location-with-if_not_exists-opt.patch application/x-patch 5.6 KB
0002-Extend-pg_stat_file-with-if_not_exists-option.patch application/x-patch 6.4 KB
0003-Add-IF-NOT-EXISTS-to-pg_read_file-and-pg_read_binary.patch application/x-patch 14.0 KB
0004-Extend-pg_ls_dir-with-include_dot_dirs-and-if_not_ex.patch application/x-patch 6.1 KB
0005-Add-regression-tests-for-pg_ls_dir-and-pg_read_-bina.patch application/x-patch 5.5 KB
0006-Make-pg_rewind-able-to-detect-deleted-files-on-remot.patch application/x-patch 4.1 KB
20150623_pg_rewind_sleep.patch application/x-patch 457 bytes
rewind_test.bash application/octet-stream 1.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2015-06-23 04:59:42 Re: checkpointer continuous flushing
Previous Message Michael Paquier 2015-06-23 04:39:55 A couple of newlines missing in pg_rewind log entries