Re: pg_dump, pg_dumpall and data durability

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_dump, pg_dumpall and data durability
Date: 2016-11-09 00:14:17
Message-ID: CAB7nPqRT8YWXxb-MNG6+fxVYhcjNKjMK_jVqADJvcuvVaSNs=Q@mail.gmail.com
Lists: pgsql-hackers

On Wed, Nov 9, 2016 at 8:18 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> First question: Do we even want this? Generally, when a program
>> writes to a file, we rely on the operating system to decide when that
>> data should be written back to disk. We have to override that
>> decision for things internal to PostgreSQL because we need certain
>> bits of data to reach the disk in a certain order, but it's unclear to
>> me how far outside the core database system we want to extend that.
>> Are we going to have psql fsync() data it writes to a file when \o is
>> used, for example? That would seem to me to be beyond insane, because
>> we have no idea whether the user actually needs that file to be
>> durable. It is a better bet that a pg_dump command's output needs
>> durability, of course, but I feel that we shouldn't just go disabling
>> the filesystem cache one program at a time without some guiding
>> principle.
>
> FWIW, I find the premise pretty dubious. People don't normally expect
> programs to fsync their standard output, and the argument that pg_dump's
> output is always critical data doesn't withstand inspection. Also,
> I don't understand what pg_dump should do if it fails to fsync. There
> are too many cases where that would fail (eg, output piped to a program)
> for it to be treated as an error condition. But if it isn't reported as
> an error, then how much durability guarantee are we really adding?

If the output is piped to a program, there is no way to guarantee
that the data will be flushed, and the user is responsible for that.
We cannot control all the use cases. The same applies, for example, to
pg_basebackup when the data is sent to stdout. IMO, the line to draw
is that tools aimed at taking physical and logical backups should make
a best effort in the cases where they can. That's cheap insurance.
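
To illustrate what I mean by best effort, here is a minimal sketch in
C (the helper name is invented for this example, it is not from any
patch): flush the output only when it is a regular file, and skip
pipes and ttys where fsync() cannot apply.

#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <sys/stat.h>

/*
 * Hypothetical helper, a sketch only: fsync the dump output when it
 * points to a regular file, and silently skip pipes, ttys and
 * sockets, where fsync() would fail anyway.
 */
static void
fsync_output_if_possible(int fd)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return;                 /* cannot stat it, give up quietly */

    if (!S_ISREG(st.st_mode))
        return;                 /* pipe, tty or socket: nothing to do */

    if (fsync(fd) != 0)
        fprintf(stderr, "warning: could not fsync output file: %s\n",
                strerror(errno));
}

For full durability the containing directory would need an fsync() as
well, so that the new file's directory entry survives a crash. And
when the target is a pipe, all we can do is skip, which is why I call
this best effort rather than a guarantee.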

Based on this past thread, it seems to me that Magnus, Andres and Jim
Nasby are of the opinion that making things durable is useful:
https://www.postgresql.org/message-id/20160327233033.GD20662@awork2.anarazel.de
And so am I.

> I think this might be better addressed by adding something to backup.sgml
> pointing out that you'd better fsync or sync your backups before assuming
> that they can't be lost.

Perhaps. That would be better than nothing at least, but
documentation alone won't cover the cases where the tools themselves
can do better.
--
Michael
