Re: [patch] pg_copy - a command for reliable WAL archiving

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [patch] pg_copy - a command for reliable WAL archiving
Date: 2014-08-20 23:10:53
Lists: pgsql-hackers

On 2014-08-20 18:58:05 -0400, Bruce Momjian wrote:
> On Wed, Aug 20, 2014 at 10:36:40AM -0400, Tom Lane wrote:
> > Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> > > On 2014-08-20 10:19:33 -0400, Tom Lane wrote:
> > >> Alternatively, you could use the process PID as part of the temp file
> > >> name; which is probably a good idea anyway.
> >
> > > I think that's actually worse, because nothing will clean up those
> > > unless you explicitly scan for all <whatever>.$pid files, and somehow
> > > kill them.
> >
> > True. As long as the copy command is prepared to get rid of a
> > pre-existing target file, using a fixed .tmp extension should be fine.
>
> Well, then we are back to this comment by MauMau:
>
> > With that said, copying to a temporary file like <dest>.tmp and
> > renaming it to <dest> sounds worthwhile even as a basic copy utility.
> > I want to avoid copying to a temporary file with a fixed name like
> > _copy.tmp, because some advanced utility may want to run multiple
> > instances of pg_copy to copy several files into the same directory
> > simultaneously. However, I'm afraid multiple <dest>.tmp files might
> > continue to occupy disk space after canceling copy or power failure in
> > some use cases, where the copy of the same file won't be retried.
> > That's also the reason why I chose to not use a temporary file like
> > cp/copy.
>
> Do we want cases where the same directory is used by multiple pg_copy
> processes? I can't imagine how that setup would make sense.

I don't think anybody is actually proposing the fixed-name _copy.tmp
scheme. We've just argued about the risk of <dest>.tmp. And I argued - and
others seem to agree - that the space usage problem isn't really relevant,
because archive commands and such are rerun after a failure and can then
clean up the temp file again.

> I am thinking pg_copy should emit a warning message when it removes an
> old temp file. This might alert people that something odd is happening
> if they see the message often.

I don't really see the point of this. If the archive command or such
failed, that will already have been logged. I'd expect this to be
implemented by passing O_CREAT | O_TRUNC to open(), nothing else.
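
To make that concrete, here is a rough sketch of what I mean. This is not
code from the patch: copy_via_tmp() is a made-up name, the buffer sizes and
the 0600 mode are arbitrary, and it leaves out fsync of the containing
directory and anything Windows-specific. The only point is that O_TRUNC
quietly discards a <dest>.tmp left over from a failed earlier run, so no
separate cleanup or warning step is needed.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy src to dst by way of "<dst>.tmp".
 * Returns 0 on success, -1 on failure with errno set.
 */
int
copy_via_tmp(const char *src, const char *dst)
{
    char        tmp[4096];
    char        buf[8192];
    ssize_t     nread;
    int         in;
    int         out;
    int         save;

    if (snprintf(tmp, sizeof(tmp), "%s.tmp", dst) >= (int) sizeof(tmp))
    {
        errno = ENAMETOOLONG;
        return -1;
    }

    if ((in = open(src, O_RDONLY)) < 0)
        return -1;

    /* O_TRUNC discards whatever a failed earlier run left behind. */
    if ((out = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600)) < 0)
        goto fail;

    while ((nread = read(in, buf, sizeof(buf))) > 0)
    {
        char       *p = buf;

        /* write() may be short, so loop until the chunk is out. */
        while (nread > 0)
        {
            ssize_t     nwritten = write(out, p, nread);

            if (nwritten < 0)
                goto fail;
            p += nwritten;
            nread -= nwritten;
        }
    }
    if (nread < 0)
        goto fail;

    /* Make the data durable before it becomes visible under its real name. */
    if (fsync(out) != 0)
        goto fail;
    if (close(out) != 0)
    {
        out = -1;
        goto fail;
    }
    close(in);

    /* rename() atomically replaces dst on POSIX filesystems. */
    return rename(tmp, dst) == 0 ? 0 : -1;

fail:
    /* Leave <dst>.tmp in place; the retry truncates and reuses it. */
    save = errno;
    close(in);
    if (out >= 0)
        close(out);
    errno = save;
    return -1;
}
```

On failure the temp file simply stays where it is, and the rerun of the
archive command overwrites it, which is the behaviour argued for above.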

> The pid-extension idea would work, as pg_copy can test all pid-extension
> files to see if the pid is still active. However, that assumes that the
> pid is running on the local machine and not on another machine that has
> NFS-mounted this directory, so maybe this is a bad idea, but again, we
> are back to the idea that only one process should be writing into this
> directory.

I don't actually think we should assume that. There very well could be
more than one process running an archive command, using differently
prefixed file names or such.


Andres Freund
PostgreSQL Development, 24x7 Support, Training & Services