Re: Concurrency issue in pg_rewind

From: Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>
To: Alexander Kukushkin <cyberdemn(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Concurrency issue in pg_rewind
Date: 2020-09-17 13:05:28
Message-ID: 30ec75b9bd9bfab1e83e7168dc6d6ddc@postgrespro.ru
Lists: pgsql-hackers

On 2020-09-17 15:27, Alexander Kukushkin wrote:
> On Thu, 17 Sep 2020 at 14:04, Alexey Kondratov
> <a(dot)kondratov(at)postgrespro(dot)ru> wrote:
>
>> With --restore-target-wal, pg_rewind tries to call restore_command on
>> its own, and this can happen at two stages:
>>
>> 1) When pg_rewind is trying to find the last checkpoint preceding the
>> divergence point. At that point the file map is not yet initialized,
>> so all WAL segments fetched at this stage will be present in the file
>> map created later.
>
> Nope, it will fetch the files you requested, and in addition it will
> leave a child process running in the background that does the
> prefetch (manipulating pg_wal/.wal-g/...)
>
>>
>> 2) When it creates the data pages map. It has to traverse WAL from the
>> last common checkpoint to the final shutdown point in order to find
>> all modified pages on the target. At this stage pg_rewind only updates
>> information about data segments in the file map. So I see a minor
>> problem: WAL segments fetched at this stage would not be deleted,
>> since they are absent from the file map.
>>
>> Anyway, pg_rewind deletes neither WAL segments nor any other files in
>> the middle of file map creation, so I cannot imagine how it could get
>> into the same trouble on its own.
>
> When pg_rewind was creating the map, some temporary files were there,
> because the forked wal-g child process was still running.
> When the wal-g child process exits, it removes some of these files.
> Specifically, it was trying to prefetch 0000008400000A7600000024 into
> pg_wal/.wal-g/prefetch/running/0000008400000A7600000024, but
> apparently the file wasn't available on S3 and the prefetch failed,
> so the empty file was removed.
>
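To make the race described above concrete, here is a minimal Python sketch (not pg_rewind's actual code, which is written in C) of a traversal that first lists files and then stats each one. The function names, the `between_phases` hook and the segment name 000000010000000000000001 are hypothetical; the hook stands in for the wal-g child removing its empty prefetch file after the listing but before the stat calls. With `tolerate_vanished=False` the traversal fails with FileNotFoundError; with `True` the vanished file is simply left out of the map.

```python
import os
import tempfile
from typing import Callable, Dict, Optional


def build_file_map(
    root: str,
    tolerate_vanished: bool,
    between_phases: Optional[Callable[[], None]] = None,
) -> Dict[str, int]:
    """Map each regular file under root (relative path) to its size."""
    # Phase 1: take a snapshot of file names, like a readdir() pass.
    names = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            names.append(os.path.join(dirpath, name))

    # Hypothetical hook: simulates another process acting after the
    # listing but before the per-file stat calls.
    if between_phases is not None:
        between_phases()

    # Phase 2: lstat() every listed file.
    file_map = {}
    for path in names:
        try:
            st = os.lstat(path)
        except FileNotFoundError:
            if not tolerate_vanished:
                raise
            # The file was listed but removed since; leave it out.
            continue
        file_map[os.path.relpath(path, root)] = st.st_size
    return file_map


def simulate_prefetch_race(tolerate_vanished: bool) -> Dict[str, int]:
    """Recreate the scenario: an empty prefetch file vanishes mid-scan."""
    with tempfile.TemporaryDirectory() as root:
        running = os.path.join(root, "pg_wal", ".wal-g", "prefetch", "running")
        os.makedirs(running)
        segment = os.path.join(root, "pg_wal", "000000010000000000000001")
        partial = os.path.join(running, "0000008400000A7600000024")
        with open(segment, "wb") as f:
            f.write(b"x" * 16)
        open(partial, "wb").close()  # empty placeholder left by the prefetcher

        # The "prefetcher" gives up and removes its empty file.
        return build_file_map(
            root, tolerate_vanished, between_phases=lambda: os.remove(partial)
        )
```

Skipping vanished files is only one possible mitigation; whether that is safe depends on which directory is being scanned, so treat it as an illustration of the failure mode rather than a proposed fix.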

I do understand how you got into this problem with wal-g. This part of
my answer was about bare Postgres and pg_rewind. My point was that,
from my perspective, pg_rewind with --restore-target-wal cannot get into
the same trouble on its own, without the 'help' of external tools like
wal-g.

Regards
--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company
