Re: Race condition in recovery?

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, hlinnaka <hlinnaka(at)iki(dot)fi>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Race condition in recovery?
Date: 2021-05-25 15:46:05
Message-ID: CA+TgmoaqLWsviEk5jNbt_j3VMj55FPGzsjLRRWk+hn7vThxXZQ@mail.gmail.com
Lists: pgsql-hackers

On Sun, May 23, 2021 at 12:08 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> I have created a tap test based on Robert's test.sh script. It
> reproduces the issue. I am new to Perl so this still needs some
> cleanup/improvement, but at least it shows the idea.

Thanks. I think this is the right idea but just needs a few adjustments.

I don't think that dynamically writing out a file into the current
working directory of the script is the right approach. Instead, I think
we should check the helper script into the repository and have the
test script find it. The trick is doing that portably. I think we can
probably use the same idea that the pg_rewind tests use to find a Perl
module located in the test directory. That is:

use FindBin;

and then use $FindBin::RealBin (the directory containing the test script) to construct a path name to the executable, e.g.

$node_primary->append_conf(
'postgresql.conf', qq(
archive_command = '"$FindBin::RealBin/skip_cp" "%p" "$archivedir_primary/%f"'
));

This avoids several problems: leaving files behind if the script is
terminated, requiring the current working directory to be writable,
and possible permission issues with the new file under Windows or
SELinux.

The restore_command needs to be "cp" on Linux but "copy" on Windows.
Maybe you can use PostgresNode.pm's enable_restoring? If that doesn't
work, you need to mimic its logic, as
src/test/recovery/t/020_archive_status.pl does for archive_command.
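If enable_restoring turns out not to fit, the cp/copy switch itself is small. Below is a minimal sketch, not code from the tree: restore_command_for is a hypothetical helper, and the Windows branch assumes (as PostgresNode.pm's own restore logic does) that backslashes in the path must be doubled to survive postgresql.conf string parsing. A real TAP test would normally test $TestLib::windows_os rather than $^O.

```perl
use strict;
use warnings;

# Hypothetical helper: build a restore_command that copies WAL files out
# of $archivedir, using "copy" on Windows and "cp" everywhere else.
sub restore_command_for
{
	my ($archivedir, $is_windows) = @_;

	if ($is_windows)
	{
		# Double each backslash so postgresql.conf's string escaping
		# turns it back into a single backslash.
		(my $path = $archivedir) =~ s{\\}{\\\\}g;
		return qq{copy "$path\\\\%f" "%p"};
	}
	return qq{cp "$archivedir/%f" "%p"};
}

# In a TAP test this flag would come from $TestLib::windows_os.
my $is_windows = ($^O eq 'MSWin32');
print "restore_command = '",
  restore_command_for('/tmp/archive', $is_windows), "'\n";
```

Passing the platform flag in, instead of checking it inside the helper, lets both branches be exercised on any OS.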

Why do you set log_line_prefix? Is that needed?

Why are the nodes called standby_1 and cascade? Either use standby and
cascade or standby_1 and standby_2.

There is a comment that says "Create some content on primary and check
its presence in standby 1", but the code only creates the content; it
does not check anything. I don't think we really need any of this, but
at the very least the code and the comment have to match.

Let's not call the command skip_cp; it's not very descriptive. If you
don't like recalcitrant_cp, then maybe something like cp_history_files.

--
Robert Haas
EDB: http://www.enterprisedb.com
