Re: [patch] pg_copy - a command for reliable WAL archiving

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [patch] pg_copy - a command for reliable WAL archiving
Date: 2014-08-20 23:10:53
Message-ID: 20140820231053.GA9835@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2014-08-20 18:58:05 -0400, Bruce Momjian wrote:
> On Wed, Aug 20, 2014 at 10:36:40AM -0400, Tom Lane wrote:
> > Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> > > On 2014-08-20 10:19:33 -0400, Tom Lane wrote:
> > >> Alternatively, you could use the process PID as part of the temp file
> > >> name; which is probably a good idea anyway.
> >
> > > I think that's actually worse, because nothing will clean up those
> > > unless you explicitly scan for all <whatever>.$pid files, and somehow
> > > kill them.
> >
> > True. As long as the copy command is prepared to get rid of a
> > pre-existing target file, using a fixed .tmp extension should be fine.
>
> Well, then we are back to this comment by MauMau:

> > With that said, copying to a temporary file like <dest>.tmp and
> > renaming it to <dest> sounds worthwhile even as a basic copy utility.
> > I want to avoid copying to a temporary file with a fixed name like
> > _copy.tmp, because some advanced utility may want to run multiple
> > instances of pg_copy to copy several files into the same directory
> > simultaneously. However, I'm afraid multiple <dest>.tmp files might
> > continue to occupy disk space after canceling copy or power failure in
> > some use cases, where the copy of the same file won't be retried.
> > That's also the reason why I chose to not use a temporary file like
> > cp/copy.
>
> Do we want cases where the same directory is used multiple pg_copy
> processes? I can't imagine how that setup would make sense.

I don't think anybody is proposing the _copy.tmp proposal. We've just
argued about the risk of <dest>.tmp. And I argued - and others seem to
agree - the space usage problem isn't really relevant because archive
commands and such are rerun after failure and can then clean up the temp
file again.

> I am thinking pg_copy should emit a warning message when it removes an
> old temp file. This might alert people that something odd is happening
> if they see the message often.

Don't really see a point in this. If the archive command or such failed,
that will already have been logged. I'd expect this to be implemented by
passing O_CREAT | O_TRUNC to open(), nothing else.

> The pid-extension idea would work as pg_copy can test all pid extension
> files to see if the pid is still active. However, that assumes that the
> pid is running on the local machine and not on another machines that has
> NFS-mounted this directory, so maybe this is a bad idea, but again, we
> are back to the idea that only one process should be writing into this
> directory.

I don't actually think we should assume that. There very well could be
one process running an archive command, using differently prefixed file
names or such.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2014-08-20 23:24:20 Re: [PATCH] Incremental backup: add backup profile to base backup
Previous Message Alvaro Herrera 2014-08-20 23:10:40 Re: Minmax indexes