Re: Two patches to speed up pg_rewind.

From: Paul Guo <paulguo(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Paul Guo <guopa(at)vmware(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Two patches to speed up pg_rewind.
Date: 2021-08-20 03:33:33
Message-ID: CABQrize7pf4hbKpeCMT_gsh0Q9QROqvseoS7GqVfyhNqVis4nQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for reviewing, please see the replies below.

On Tue, Aug 17, 2021 at 3:47 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Thu, Aug 05, 2021 at 06:18:03PM +0800, Paul Guo wrote:
> > I modified the copy_file_range() patch using the below logic:
> >
> > If the first call of copy_file_range() fails with errno EXDEV or
> > ENOTSUP, pg_rewind
> > would not use copy_file_range() in rest code, and if copy_file_range() fails we
> > fallback to use the previous read()+write() code logic for the file.
>
> I have looked at 0001, and I don't really like it. One argument
> against this approach is that if pg_rewind fails in the middle of its
> operation then we would have done a set of fsync() for nothing, with
> the data folder still unusable. I would be curious to see some
> numbers to see how much it matters with many physical files (say cases
> with thousands of small relations?).
> +/* Define PG_FLUSH_DATA_WORKS if we have an implementation for pg_flush_data */
> +#if defined(HAVE_SYNC_FILE_RANGE)
> +#define PG_FLUSH_DATA_WORKS 1
> +#elif !defined(WIN32) && defined(MS_ASYNC)
> +#define PG_FLUSH_DATA_WORKS 1
> +#elif defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_DONTNEED)
> +#define PG_FLUSH_DATA_WORKS 1
>
> This is wrong for the code frontend on platforms that may finish by
> using MS_ASYNC, no? There is no such implementation in file_utils.c
> but there is one in fd.c.

Yes, it seems that we need to add the MS_ASYNC code (refer that in fd.c) in
src/common/file_utils.c:pre_sync_fname().

> + fsync_fname("global/pg_control", false);
> + fsync_fname("backup_label", false);
> + if (access("recovery.conf", F_OK) == 0)
> + fsync_fname("recovery.conf", false);
> + if (access("postgresql.auto.conf", F_OK) == 0)
> + fsync_fname("postgresql.auto.conf", false);
>
> This list is wrong on various aspects, no? This would miss custom
> configuration files, or included files.

I did not understand this. Can you please clarify? Anyway let me
explain, here we fsync
these files additionally because pg_rewind (possibly) modified these
files after rewinding.
These files may not be handled/logged in filemap

pg_control action is FILE_ACTION_NONE
backup_label is excluded
recovery.conf is not logged in filemap
postgresql.auto.conf may be logged but let's fsync this file for safety.

>
> - if (showprogress)
> - pg_log_info("syncing target data directory");
> - sync_target_dir();
> -
> /* Also update the standby configuration, if requested. */
> if (writerecoveryconf && !dry_run)
> WriteRecoveryConfig(conn, datadir_target,
> GenerateRecoveryConfig(conn, NULL));
>
> + if (showprogress)
> + pg_log_info("syncing target data directory");
> + perform_sync(filemap);
>
> Why inverting the order here?

We need to synchronize the recoveryconf change finally in perform_sync().

>
> + * Pre Linux 5.3 does not allow cross-fs copy_file_range() call
> + * (return EXDEV). Some fs do not support copy_file_range() (return
> + * ENOTSUP). Here we explicitly disable copy_file_range() for the
> + * two scenarios. For other failures we still allow subsequent
> + * copy_file_range() try.
> + */
> + if (errno == ENOTSUP || errno == EXDEV)
> + copy_file_range_support = false;
> Are you sure that it is right to always cancel the support of
> copy_file_range() after it does not work once? Couldn't it be
> possible that things work properly depending on the tablespace being
> worked on by pg_rewind?

Ideally we should retry when first running into a symlink (e.g.
tablespace, wal),
but it seems not easy to do gracefully.

> Having the facility for copy_file_range() in pg_rewind is not nice at
> the end, and we are going to need a run-time check to fallback
> dynamically to an equivalent implementation on errno={EXDEV,ENOTSUP}.
> Hmm. What about locating all that in file_utils.c instead, with a
> brand new routine name (pg_copy_file_range would be one choice)? We
> still need the ./configure check, except that the conditions to use
> the fallback implementation is in this routine, aka fallback on EXDEV,
> ENOTSUP or !HAVE_COPY_FILE_RANGE. The backend may need to solve this
> problem at some point, but logging and fd handling will likely just
> locate that in fd.c, so having one routine for the purpose of all
> frontends looks like a step in the right direction.

Yes, seems better to make it generic.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrey Borodin 2021-08-20 05:04:42 Re: Subtransactions + a long-running transaction leading to performance degradation on standbys
Previous Message Tom Lane 2021-08-20 03:25:00 Re: Nitpick/question: Use of aliases for global variables in functions