Re: Duplicate history file?

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: rjuju123(at)gmail(dot)com
Cc: sfrost(at)snowman(dot)net, tatsuro(dot)yamada(dot)tf(at)nttcom(dot)co(dot)jp, masao(dot)fujii(at)oss(dot)nttdata(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Duplicate history file?
Date: 2021-06-15 01:20:37
Message-ID: 20210615.102037.1074344094249330331.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Fri, 11 Jun 2021 15:18:03 +0800, Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote in
> On Fri, Jun 11, 2021 at 03:32:28PM +0900, Kyotaro Horiguchi wrote:
> I disagree, cp is probably the worst command that can be used for this
> purpose. On top of the problems already mentioned, you also have the
> fact that the copy isn't atomic. That means any concurrent
> restore_command (or anything else that consumes the archived files) will
> happily process a half-copied WAL file, and in case of any error during
> the copy you end up with a file for which you don't know whether it
> contains valid data or not. I don't see any case where you would
> actually want to use it, unless maybe you want to benchmark how long it
> takes before you lose some data.

Actually, there is large room for losing data with cp. Yes, we would
need additional storage redundancy, periodic integrity inspection of
the storage and the archives, and maybe copies at other sites on the
other side of the Earth. But those are too much for some kinds of
users. They have the right and the responsibility to decide how
durable/reliable their archive needs to be (putting aside some
hardware/geological requirements :p). If we mandate certain
characteristics of the archive_command, we should take them into core.
I remember seeing some discussion of archive_command along this line,
but I think it ended at the point of something like "we cannot design a
one-size-fits-all interface conforming to the requirements" (sorry, I
don't remember the details..)
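(For concreteness, here is a minimal sketch of what the atomicity and
durable-return parts of such a command could look like: copy to a
temporary name, fsync, then rename into place so a concurrent reader
never sees a partial file. The script name and archive path are
hypothetical, and this covers only those two properties, not
redundancy, verification, or off-site copies.)

#!/usr/bin/env python3
# Sketch only, e.g.:
#   archive_command = 'archive_wal.py %p /mnt/archive/%f'
# (hypothetical paths; not a vetted implementation)
import os
import shutil
import sys

def archive(src, dst):
    if os.path.exists(dst):
        return 1                # never overwrite an already-archived segment
    tmp = dst + ".tmp"
    try:
        with open(src, "rb") as fin, open(tmp, "wb") as fout:
            shutil.copyfileobj(fin, fout)
            fout.flush()
            os.fsync(fout.fileno())   # data reaches stable storage first
        os.rename(tmp, dst)           # atomic: no reader sees a partial file
        dfd = os.open(os.path.dirname(dst) or ".", os.O_RDONLY)
        try:
            os.fsync(dfd)             # persist the rename itself
        finally:
            os.close(dfd)
        return 0
    except OSError:
        try:
            os.unlink(tmp)
        except OSError:
            pass
        return 1            # nonzero exit: the server keeps the WAL and retries

if __name__ == "__main__":
    sys.exit(archive(sys.argv[1], sys.argv[2]))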

> I don't know, I'm assuming that barman also provides one, such as wal-e and
> wal-g (assuming that the distant providers do their part of the job correctly).

Well, barman used rsync/ssh in its documentation in the past, and now
it looks like it provides barman-wal-archive, so it seems you're right
on that point. So, do we recommend those tools in our documentation?
(I'm not sure they actually conform to the requirements, though..)

> Maybe there are other tools too. But as long as we don't document what exactly
> are the requirements, it's not really a surprise that most people don't
> implement them.

I strongly agree that we should describe the requirements.

My point is that if all of those requirements are really mandatory,
then it is mandatory for us to officially provide, or at least
recommend, a minimal implementation that covers all of them. If we
recommend some external tools, that would mean we are vouching that
those tools satisfy the requirements.

If we write an example using a pseudo tool name and require certain
characteristics of the tool, but do not point to any concrete minimal
tool, I think that is equivalent to discouraging certain users from
using archive_command at all, even when they really don't need that
level of durability.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
