Re: Race condition in recovery?

From: Tatsuro Yamada <tatsuro(dot)yamada(dot)tf(at)nttcom(dot)co(dot)jp>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: robertmhaas(at)gmail(dot)com, dilipbalaut(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, hlinnaka(at)iki(dot)fi
Subject: Re: Race condition in recovery?
Date: 2021-05-31 02:52:05
Message-ID: 4698027d-5c0d-098f-9a8e-8cf09e36a555@nttcom.co.jp_1
Lists: pgsql-hackers

Hi Horiguchi-san,

> (Why me?)

Because the issue is also related to PG-REX, which you are
also involved in developing. Perhaps replying off-list rather than
on -hackers would have been better, but I posted to -hackers because
PostgreSQL users who do not use PG-REX could encounter the same
problem.


>> In a project I helped with, I encountered an issue where
>> the archive command kept failing. I thought this issue was
>> related to the problem in this thread, so I'm sharing it here.
>> If I should create a new thread, please let me know.
>>
>> * Problem
>> - archive_command always fails.
>
> Although I think the configuration is somewhat broken, it can be seen
> as mimicking the shared-archive case, where the primary and
> standby share the same archive directory.

To be precise, the environment in this reproduction script differs
from our actual environment. I made it as simple as possible while
still reproducing the problem.
(Making it match the actual environment would require building a
PG-REX environment.)

A simple replication setup might be enough, so I will later try to
write a script that is closer to the actual environment.


> Basically, for that case we need to use an archive command like the
> following to avoid this kind of failure. The script returns "success"
> when the target file already exists but is identical to the source
> file. I couldn't find such a description in the documentation, and
> haven't bothered digging into the mailing-list archive.
>
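> ==
> #!/bin/bash
> # $1: file to archive, $2: destination path in the archive
>
> if [ -f "$2" ]; then
>     # Already archived: succeed only if the existing copy is identical.
>     if ! cmp -s "$1" "$2"; then
>         exit 1
>     fi
>     exit 0
> fi
>
> cp "$1" "$2"
> ==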

Thanks for your reply.
Since the above behavior differs from that of the test command
in the following example from postgresql.conf, I think we should
add a note about this example.

# e.g. 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
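To make the difference concrete, here is a small bash sketch (my own, not taken from the thread) that runs both styles against a destination file that already exists and is identical to the source. The temporary directories, the history file name, and the function archive_if_identical are hypothetical stand-ins; the function only mirrors the logic of the script quoted above.

```shell
#!/bin/bash
# Sketch: compare two archive_command styles when the destination file
# already exists and is identical to the source.
# All paths and names here are hypothetical stand-ins.

work=$(mktemp -d)
mkdir -p "$work/pg_wal" "$work/archivedir"
p="$work/pg_wal/00000002.history"   # plays the role of %p
f="00000002.history"                # plays the role of %f
printf '1\t0/3000000\tno recovery target specified\n' > "$p"

# The file was already archived earlier (e.g. by the primary).
cp "$p" "$work/archivedir/$f"

# Style 1: the example from postgresql.conf.
test ! -f "$work/archivedir/$f" && cp "$p" "$work/archivedir/$f"
status_test=$?

# Style 2 (hypothetical helper): succeed if an existing copy is identical.
archive_if_identical() {
    if [ -f "$2" ]; then
        cmp -s "$1" "$2"   # returns 0 only when the files are identical
        return             # propagate cmp's exit status
    fi
    cp "$1" "$2"
}
archive_if_identical "$p" "$work/archivedir/$f"
status_cmp=$?

echo "test-based command exit status: $status_test"   # expected: 1
echo "cmp-based command exit status: $status_cmp"     # expected: 0

rm -rf "$work"
```

With the test-based command, an already-archived file is always reported as a failure, even when its contents are identical.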

Let me describe the problem we faced.
- When archive_mode=always, the standby (sometimes) runs archive_command
for a timeline history file that already exists at the archive
destination it uses.

- In this case, if the standby's archive_command in postgresql.conf
uses "test ! -f", the command keeps failing: "test ! -f" returns a
non-zero status whenever the destination file exists, so every retry
for the same file fails as well.

Note that this problem does not occur when archive_mode=on.
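Why the failure never clears can be sketched the same way. The loop below only loosely imitates the archiver retrying one file (the paths and the fixed count of three attempts are hypothetical). Because nothing removes the existing destination file, every attempt with the "test ! -f" style fails.

```shell
#!/bin/bash
# Sketch: retrying a "test ! -f ... && cp ..." style command cannot succeed
# while the destination file exists. Paths are hypothetical.

work=$(mktemp -d)
mkdir -p "$work/archivedir"
src="$work/00000002.history"
dst="$work/archivedir/00000002.history"
printf '1\t0/3000000\tno recovery target specified\n' > "$src"
cp "$src" "$dst"   # already present in the archive

failures=0
for attempt in 1 2 3; do
    if test ! -f "$dst" && cp "$src" "$dst"; then
        echo "attempt $attempt: archived"
        break
    fi
    failures=$((failures + 1))
    echo "attempt $attempt: archive_command failed"
done

rm -rf "$work"
```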

So, what should we do for users? I think we should add a note to
postgresql.conf or to the documentation, for example something
like this:

====
Note: If you use archive_mode=always, the archive_command on the standby should not use "test ! -f".
====

Regards,
Tatsuro Yamada
