Re: Race condition in recovery?

From: Tatsuro Yamada <tatsuro(dot)yamada(dot)tf(at)nttcom(dot)co(dot)jp>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: robertmhaas(at)gmail(dot)com, dilipbalaut(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, hlinnaka(at)iki(dot)fi
Subject: Re: Race condition in recovery?
Date: 2021-05-31 02:52:05
Lists: pgsql-hackers

Hi Horiguchi-san,

> (Why me?)

Because the story is also related to PG-REX, which you are
also involved in developing. Perhaps emailing you off-list rather
than -hackers would have been better, but I emailed -hackers because
PostgreSQL users who do not use PG-REX could run into the same
problem.

>> In a project I helped with, I encountered an issue where
>> the archive command kept failing. I thought this issue was
>> related to the problem in this thread, so I'm sharing it here.
>> If I should create a new thread, please let me know.
>> * Problem
>> - The archive_command always fails.
> Although I think the configuration is somewhat broken, it can be seen
> as mimicking the shared-archive case, where the primary and
> standby share the same archive directory.

To be precise, the environment of this reproduction script is
different from our actual environment. I tried to make it as
simple as possible to reproduce the problem.
(In order to make it look like the actual environment, you have
to build a PG-REX environment.)

A simple replication environment might be enough, so I'll try to
recreate a script that is closer to the actual environment later.

> Basically we need to use an archive command like the following for
> that case to avoid this kind of failure. The script returns "success"
> when the target file is found and is identical to the source file. I
> couldn't find such a description in the documentation, and haven't
> bothered digging into the mailing-list archive.
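> ==
> #!/bin/bash
> # $1: file to archive (%p), $2: destination path in the archive
> if [ -f "$2" ]; then
>     # Already archived: succeed only if the existing copy is identical
>     if cmp -s "$1" "$2"; then
>         exit 0
>     fi
>     exit 1
> fi
> # Not archived yet: copy it; the script's exit status is cp's
> cp "$1" "$2"
> ==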

Thanks for your reply.
Since the above behavior differs from that of the "test ! -f"
command in the following example in postgresql.conf, I think we
should add a note to this example.

# e.g. 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'

Let me describe the problem we faced.
- When archive_mode=always, archive_command is sometimes executed
  for a history file that already exists on the standby side.

- In this case, if the standby's archive_command in postgresql.conf
  uses "test ! -f", the command keeps failing.

Note that this problem does not occur when archive_mode=on.
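To illustrate the failure mode described above, here is a minimal shell sketch that runs the postgresql.conf example by hand twice against the same destination. It does not involve PostgreSQL itself; the temporary directories, the file name 00000002.history and the helper example_archive are hypothetical stand-ins for %p, %f and the archive directory. Since the archiver keeps retrying a file until archive_command succeeds, a nonzero status on the second run means the command would keep failing.

```shell
#!/bin/bash
# Minimal sketch: why the postgresql.conf example archive_command fails
# once the destination file already exists. Directories and the file
# name below are hypothetical stand-ins for %p, %f and the archive dir.
set -u

workdir=$(mktemp -d)
archivedir="$workdir/archivedir"
mkdir -p "$archivedir"
printf 'history contents\n' > "$workdir/00000002.history"   # stands in for %p

# The postgresql.conf example, with %p and %f substituted by hand
example_archive() {
    test ! -f "$archivedir/$2" && cp "$1" "$archivedir/$2"
}

example_archive "$workdir/00000002.history" 00000002.history
first=$?    # 0: file copied
example_archive "$workdir/00000002.history" 00000002.history
second=$?   # 1: "test ! -f" fails because the file is already there

rm -rf "$workdir"
echo "first=$first second=$second"
```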

So, what should we do for the user? I think we should put some notes
in postgresql.conf or in the documentation. For example, something
like this:

Note: If you use archive_mode=always, the archive_command on the standby side should not use "test ! -f".
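For comparison with the note above, the cmp-based approach quoted earlier can be run through the same retry scenario. This is again only a hedged sketch outside PostgreSQL; the helper idempotent_archive and all paths are hypothetical. A retry with identical contents succeeds, while an existing file with different contents is still reported as a failure.

```shell
#!/bin/bash
# Minimal sketch: the cmp-based approach applied to the same retry
# scenario. The helper name and all paths are hypothetical.
set -u

workdir=$(mktemp -d)
archivedir="$workdir/archivedir"
mkdir -p "$archivedir"
printf 'history contents\n' > "$workdir/00000002.history"

# Succeed if the destination already holds an identical copy,
# fail if it holds different contents, otherwise copy the file.
idempotent_archive() {
    if [ -f "$2" ]; then
        cmp -s "$1" "$2"   # 0 if identical, nonzero otherwise
        return
    fi
    cp "$1" "$2"
}

idempotent_archive "$workdir/00000002.history" "$archivedir/00000002.history"
first=$?     # 0: copied
idempotent_archive "$workdir/00000002.history" "$archivedir/00000002.history"
retry=$?     # 0: identical copy already archived
printf 'different contents\n' > "$workdir/other.history"
idempotent_archive "$workdir/other.history" "$archivedir/00000002.history"
conflict=$?  # 1: existing file differs, so report failure

rm -rf "$workdir"
echo "first=$first retry=$retry conflict=$conflict"
```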

Tatsuro Yamada
