Re: Concurrency issue in pg_rewind

From: Alexander Kukushkin <cyberdemn(at)gmail(dot)com>
To: Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Concurrency issue in pg_rewind
Date: 2020-09-17 12:27:13
Message-ID: CAFh8B=mmZ5S-Y5Lf9v=oYfWMG+OxJpXHtO8yjXOaAoJUFXmX3g@mail.gmail.com
Lists: pgsql-hackers

On Thu, 17 Sep 2020 at 14:04, Alexey Kondratov
<a(dot)kondratov(at)postgrespro(dot)ru> wrote:
>
> Hm, I cannot understand why wal-g (or any other tool) is trying to run
> pg_rewind while WAL copying (and prefetch) is still in progress. Why
> not just wait until it is finished?

wal-g doesn't try to call pg_rewind.
First, we called wal-g; it fetched the file we requested and exited.
But before exiting, wal-g forks, and the child process prefetches
the next few WALs.
We don't really know when the child process exits and can't wait for it.
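A minimal Python sketch of the fork-then-exit pattern described above may make the timing clearer: the parent returns to its caller immediately, while the forked child keeps creating and removing files in a prefetch directory. This is not wal-g's code (wal-g is written in Go); `restore_segment`, `_prefetch` and `fetch_from_storage` are hypothetical names, and the `running/` subdirectory is only modeled on the `pg_wal/.wal-g/prefetch/running/` path mentioned in this thread. The child's pid is returned solely so a test can wait for it; postgres, which only sees the exit status of restore_command, has no such handle.

```python
import os


def fetch_from_storage(segment: str, dest: str) -> bool:
    """Hypothetical stand-in for downloading `segment` from S3 into `dest`.

    Always reports failure here, mimicking a segment that is not archived yet.
    """
    return False


def _prefetch(next_segments: list[str], prefetch_dir: str) -> None:
    running = os.path.join(prefetch_dir, "running")
    os.makedirs(running, exist_ok=True)
    for seg in next_segments:
        tmp = os.path.join(running, seg)
        open(tmp, "wb").close()  # empty placeholder, visible to other processes
        if not fetch_from_storage(seg, tmp):
            os.remove(tmp)  # prefetch failed: the placeholder vanishes


def restore_segment(segment: str, next_segments: list[str], prefetch_dir: str) -> int:
    """Hypothetical restore_command body: fetch `segment`, then fork a prefetcher.

    The synchronous fetch of `segment` itself is omitted. Returns the child's
    pid only so a test can wait for it; postgres sees just the exit status.
    """
    pid = os.fork()
    if pid != 0:
        return pid  # parent: done, report success right away
    # Child: keeps running after the parent has already returned.
    try:
        _prefetch(next_segments, prefetch_dir)
        os._exit(0)
    except BaseException:
        os._exit(1)  # never fall back into the caller's code in the child
```

The child leaves through `os._exit` so it cannot return into the caller's code path or flush buffers it inherited from the parent. Anything that scans `prefetch_dir` between the parent returning and the child exiting can see the placeholder file, and it can disappear again at any moment.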

>
> It is also not clear to me why it does not put prefetched WAL files
> directly into pg_wal.

Because this is how postgres works. Regardless of whether the
specific WAL segment is already in pg_wal, postgres will call
restore_command anyway.
restore_command also has no way to know whether the file in pg_wal is
OK, so keeping the prefetched file in some other place and moving it
into pg_wal on request seems to be a good approach.
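The staging-and-move approach can be sketched in a few lines of Python. This is a minimal illustration, not wal-g's implementation: `restore_from_prefetch` and the flat staging directory are hypothetical, and `dest_path` plays the role of restore_command's `%p` argument. `os.replace` is an atomic rename on POSIX when source and destination are on the same filesystem, so postgres never sees a partially written segment under its final name.

```python
import os


def restore_from_prefetch(segment: str, prefetch_dir: str, dest_path: str) -> bool:
    """Move a fully prefetched `segment` to `dest_path` if one is available.

    Returns False when nothing usable was prefetched; the caller would then
    fall back to a normal download (not shown).
    """
    staged = os.path.join(prefetch_dir, segment)
    try:
        # Atomic within one filesystem: dest_path is either untouched or
        # holds the complete file, never a partial one.
        os.replace(staged, dest_path)
    except FileNotFoundError:
        return False
    return True
```

Because the file is moved rather than copied, a segment that was handed to postgres no longer exists in the staging area, so it cannot be served twice.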

> With --restore-target-wal pg_rewind is trying to call restore_command on
> its own and it can happen at two stages:
>
> 1) When pg_rewind is trying to find the last checkpoint preceding a
> divergence point. At that point the file map is not even initialized yet.
> Thus, all WAL segments fetched at this stage will be present in the file
> map created later.

Nope, it will fetch the files you requested, and in addition it will
leave a child process running in the background that does the prefetch
(manipulating files in pg_wal/.wal-g/...).

>
> 2) When it creates a data pages map. It should traverse WAL from the
> last common checkpoint till the final shutdown point in order to find
> all modified pages on the target. At this stage pg_rewind only updates
> info about data segments in the file map. I see a minor problem here:
> WAL segments fetched at this stage would not be deleted, since they are
> absent from the file map.
>
> Anyway, pg_rewind deletes neither WAL segments nor any other files in
> the middle of file map creation, so I cannot imagine how it could get
> into the same trouble on its own.

When pg_rewind was creating the map, some temporary files were there,
because the forked wal-g child process was still running.
When the wal-g child process exits, it removes some of these files, even
though pg_rewind may already have put them into the map.
Specifically, it was trying to prefetch 0000008400000A7600000024 into
pg_wal/.wal-g/prefetch/running/0000008400000A7600000024, but
apparently the file wasn't available on S3, so the prefetch failed and
the empty file was removed.
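The race can be reproduced in miniature: a map is built from a directory listing, a concurrent process (standing in for the wal-g child) removes one of the listed files, and a naive pass over the map then fails. This Python sketch assumes hypothetical `build_file_map` and `execute_naively` helpers that only list and open relative paths; pg_rewind's real file map is written in C and records more per entry, such as the action to take.

```python
import os


def build_file_map(root: str) -> list[str]:
    """Hypothetical: list every regular file under root, relative to root."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            entries.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(entries)


def execute_naively(root: str, entries: list[str]) -> None:
    """Open every mapped file; raises FileNotFoundError if one has vanished."""
    for rel in entries:
        with open(os.path.join(root, rel), "rb"):
            pass  # a real tool would copy or truncate the file here
```

Nothing in the map is wrong at the moment it is built; the failure comes purely from the gap between listing a file and using it.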

> Although keeping arbitrary files inside PGDATA does not look like a good
> idea to me, I do not see anything wrong with pg_rewind skipping a
> non-existent file when executing the file map.

Good, I will prepare a patch then.
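Skipping a file that vanished between building the map and executing it could look roughly like this. It is a minimal Python sketch with a hypothetical `execute_tolerantly` helper, not the patch itself, which would be C code inside pg_rewind. Only FileNotFoundError (ENOENT) is tolerated; any other error, such as a permission problem, still propagates, since silently ignoring those could leave the target data directory inconsistent.

```python
import os


def execute_tolerantly(root: str, entries: list[str]) -> list[str]:
    """Open every mapped file, skipping ones that disappeared concurrently.

    Returns the relative paths that were skipped so the caller can log them.
    """
    skipped = []
    for rel in entries:
        try:
            with open(os.path.join(root, rel), "rb"):
                pass  # a real tool would copy or truncate the file here
        except FileNotFoundError:
            skipped.append(rel)  # removed after the map was built: not an error
    return skipped
```

Returning the skipped paths instead of dropping them silently keeps the behaviour visible in logs, which matters if a file goes missing for a reason other than a harmless temporary file.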

Regards,
--
Alexander Kukushkin
