Re: Duplicate history file?

From: Tatsuro Yamada <tatsuro(dot)yamada(dot)tf(at)nttcom(dot)co(dot)jp>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Duplicate history file?
Date: 2021-06-08 09:19:04
Message-ID: 4d9aa52e-00dd-a68d-da45-50ab863af6b6@nttcom.co.jp_1
Lists: pgsql-hackers

Hi Horiguchi-san,

> This thread should have been started here:
>
> https://www.postgresql.org/message-id/20210531.165825.921389284096975508.horikyota.ntt%40gmail.com
>>
>> (To recap: in a replication setup that uses an archive, the startup
>> process tries to restore WAL files from the archive before checking
>> the pg_wal directory for the desired file. This behavior is
>> intentional and reasonable. However, the restore code notifies the
>> archiver of a restored file regardless of whether it has already been
>> archived. If archive_command is written to return an error rather than
>> overwrite an existing file, as the documentation suggests, this
>> behavior causes archiving to fail.)
>>
>> After experimenting with this, I can reproduce the problem just by
>> restarting a standby, even in a simple archive-based replication setup
>> with no special prerequisites. So I think this is worth fixing.
>>
>> With this patch, KeepFileRestoredFromArchive compares the contents of
>> the just-restored file with the existing file for the same segment
>> only when:
>>
>> - archive_mode = always
>> and - the file to restore already exists in pg_wal
>> and - it has a .done and/or .ready status file.
>>
>> which doesn't usually happen. The function then skips the archive
>> notification if the contents are identical. The included TAP test
>> passes on both Linux and Windows.
>
>
> Thank you for the analysis and the patch.
> I'll try the patch tomorrow.
>
> I just noticed that this thread is still attached to another thread
> (it's not an independent thread). To fix that, it may be better to
> start a new thread.
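As a rough illustration of the conditions in the quoted patch summary, here is a Python sketch of the decision. It is not the patch itself (that is C code in KeepFileRestoredFromArchive); the function names `should_skip_notification` and `same_contents` are hypothetical, and the sketch assumes the usual layout where status files live in pg_wal/archive_status as `<segment>.ready` or `<segment>.done`.

```python
import os


def same_contents(path_a: str, path_b: str, chunk_size: int = 1 << 16) -> bool:
    """Byte-by-byte comparison of two files (hypothetical helper)."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            block_a = fa.read(chunk_size)
            block_b = fb.read(chunk_size)
            if block_a != block_b:
                return False
            if not block_a:  # both files exhausted at the same point
                return True


def should_skip_notification(archive_mode: str, restored_path: str,
                             existing_path: str, archive_status_dir: str,
                             segment: str) -> bool:
    """Return True when a restored segment needs no new archive notification.

    Hypothetical sketch of the quoted conditions, not the patch's C code.
    """
    # Condition 1: only relevant when the standby archives as well.
    if archive_mode != "always":
        return False
    # Condition 2: a file for the same segment already exists in pg_wal.
    if not os.path.isfile(existing_path):
        return False
    # Condition 3: that segment already has a .done and/or .ready status file.
    if not any(os.path.exists(os.path.join(archive_status_dir, segment + suffix))
               for suffix in (".done", ".ready")):
        return False
    # Only now pay for the content comparison: identical contents mean the
    # segment is already archived or queued, so another .ready is unnecessary.
    return same_contents(restored_path, existing_path)
```

Checking the cheap conditions first means the comparison, which reads both files in full, only runs in the rare case the summary describes.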

I've tried your patch. Unfortunately, it didn't seem to fix the problem
with the reproduction script I sent.

I understand that, as Stefan says, the test and cp commands have
problems and should not be used in archive_command. Maybe this is not
a big problem for the community.
Nevertheless, even if we do not improve the feature, I think it would
be a good idea, for the benefit of new users, to state explicitly in the
documentation that archiving may fail under certain conditions.
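To make the failure mode concrete, here is a Python sketch that mimics the documentation's example command, `test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f`, by returning a shell-style exit status. The function names are hypothetical. The second function is only an assumed variant that borrows the patch's idea of comparing contents; it is not something this thread settled on.

```python
import os
import shutil


def archive_like_test_and_cp(src: str, archive_dir: str) -> int:
    """Mimic `test ! -f DIR/%f && cp %p DIR/%f`; return its exit status."""
    dest = os.path.join(archive_dir, os.path.basename(src))
    if os.path.isfile(dest):
        return 1  # `test ! -f` fails, cp never runs, so archiving fails
    shutil.copyfile(src, dest)
    return 0


def archive_tolerating_identical(src: str, archive_dir: str) -> int:
    """Assumed variant: succeed if the archived copy is byte-identical."""
    dest = os.path.join(archive_dir, os.path.basename(src))
    if os.path.isfile(dest):
        with open(src, "rb") as fs, open(dest, "rb") as fd:
            # Same segment presented again: only a content mismatch is an error.
            return 0 if fs.read() == fd.read() else 1
    shutil.copyfile(src, dest)
    return 0
```

When a restored segment is queued for archiving a second time, the first function fails on every retry, which matches the archive failure described in the recap, while the second still refuses to overwrite a file whose contents differ.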

I'd like to hear the opinions of experts on archive_command.

P.S.
My customer's problem has already been solved, so that's fine. I've
emailed -hackers to help other users avoid running into the same
problem.

Regards,
Tatsuro Yamada
