Re: pg_rewind failure by file deletion in source server

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_rewind failure by file deletion in source server
Date: 2015-06-24 06:43:35
Message-ID: CAB7nPqR-n4REfqHX=BXaBSbLgL+NvTyuOAnpwGtg26oCP75LxQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jun 23, 2015 at 9:19 PM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> On 06/23/2015 07:51 AM, Michael Paquier wrote:
>> - 0001, add if_not_exists to pg_tablespace_location, returning NULL if
>> path does not exist
>> - 0002, same with pg_stat_file, returning NULL if file does not exist
>> - 0003, same with pg_read_*file. I added them to all the existing
>> functions for consistency.
>> - 0004, pg_ls_dir extended with if_not_exists and include_dot_dirs
>> (thanks Robert for the naming!)
>> - 0005, as things get complex, a set of regression tests aimed to
>> covering those things. pg_tablespace_location is platform-dependent,
>> so there are no tests for it.
>> - 0006, the fix for pg_rewind, using what has been implemented before.
>
>
> With these patches, pg_read_file() will return NULL for any failure to open
> the file, which makes pg_rewind to assume that the file doesn't exist in the
> source server, and will remove the file from the destination. That's
> dangerous, those functions should check specifically for ENOENT.

This makes sense. I changed all those functions to do so.

> There's still a small race condition with tablespaces. If you run CREATE
> TABLESPACE in the source server while pg_rewind is running, it's possible
> that the recursive query that pg_rewind uses sees the symlink in pg_tblspc/
> directory, but its snapshot doesn't see the row in pg_tablespace yet. It
> will think that the symlink is a regular file, try to read it, and fail (if
> we checked for ENOENT).
> Actually, I think we need try to deal with symlinks a bit harder. Currently,
> pg_rewind assumes that anything in pg_tblspace that has a matching row in
> pg_tablespace is a symlink, and nothing else is. I think symlinks to
> directories. I just noticed that pg_rewind fails miserable if pg_xlog is a
> symlink, because of that:
>
> ----
> The servers diverged at WAL position 0/3023F08 on timeline 1.
> Rewinding from last common checkpoint at 0/2000060 on timeline 1
>
> "data-master//pg_xlog" is not a directory
> Failure, exiting
> ----

It may be possible that in this case the path is a symlink on the
source but not on the target, and vice-versa, so this looks too
restrictive to me if we begin to use pg_readlink. Think for example of
a symlink of pg_xlog that is not included in a base backup, we ignore
it in copy_fetch.c for pg_xlog and the others, I think that here as
well we should ignore those errors except for tablespaces.

> I think we need to add a column to pg_stat_file output, to indicate symbolic
> links, and add a pg_readlink() function. That still leaves a race condition
> if the type of a file changes, i.e. a file is deleted and a directory with
> the same name is created in its place, but that seems acceptable. I don't
> think PostgreSQL ever does such a thing, so that could only happen if you
> mess with the data directory manually while the server is running.

Hm. pg_stat_file uses now stat(), and not lstat() so it cannot make
the difference between what is a link or not. I have changed
pg_stat_file to use lstat instead to cover this case in my set of
patches, but a new function may be a better answer here. I have added
as well a pg_readlink() function on the stack, and actually the
if_not_exists mode of pg_tablespace_location is not needed anymore if
we rely on pg_readlink in this case. I have let it in the set of
patches though. This still looks useful for other utility tools.

> I just realized another problem: We recently learned the hard way that some
> people have files in the data directory that are not writeable by the
> 'postgres' user
> (http://www.postgresql.org/message-id/20150523172627.GA24277@msg.df7cb.de).
> pg_rewind will try to overwrite all files it doesn't recognize as relation
> files, so it's going to fail on those. A straightforward fix would be to
> first open the destination file in read-only mode, and compare its contents,
> and only open the file in write mode if it has changed. It would still fail
> when the files really differ, but I think that's acceptable.

If I am missing nothing, two code paths need to be patched here:
copy_file_range and receiveFileChunks. copy_file_range is
straight-forward. Now wouldn't it be better to write the contents into
a temporary file, compare their content, and then switch if necessary
for receiveFileChunks?

> I note that pg_rewind doesn't need to distinguish between an empty and a
> non-existent directory, so it's quite silly for it to pass
> include_dot_dirs=true, and then filter out "." and ".." from the result set.
> The documentation should mention the main reason for including "." and "..":
> to distinguish between an empty and non-existent directory.

OK. Switched to that in the first patch for pg_rewind.

Attached is a new set of patches. Except for the last ones that
addresses one issue of pg_rewind (symlink management when streaming
PGDATA), all the others introduce if_not_exists options for the
functions of genfile.c. The pg_rewind stuff could be more polished
though. Feel free to comment.
Regards,
--
Michael

Attachment Content-Type Size
0001-Extend-pg_tablespace_location-with-if_not_exists-opt.patch application/x-patch 5.4 KB
0002-Extend-pg_stat_file-with-if_not_exists-option.patch application/x-patch 6.5 KB
0003-Add-IF-NOT-EXISTS-to-pg_read_file-and-pg_read_binary.patch application/x-patch 14.0 KB
0004-Extend-pg_ls_dir-with-include_dot_dirs-and-if_not_ex.patch application/x-patch 6.5 KB
0005-Add-new-column-islink-in-pg_stat_file.patch application/x-patch 5.0 KB
0006-Add-pg_readlink-to-get-value-of-a-symbolic-link.patch application/x-patch 4.6 KB
0007-Add-regression-tests-for-pg_ls_dir-and-pg_read_-bina.patch application/x-patch 5.5 KB
0008-Fix-symlink-usage-in-pg_rewind.patch application/x-patch 7.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2015-06-24 07:06:19 Re: Hash index creation warning
Previous Message Fabien COELHO 2015-06-24 06:29:04 Re: checkpointer continuous flushing