Re: pg_dump --snapshot

From: Greg Stark <stark(at)mit(dot)edu>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_dump --snapshot
Date: 2013-05-07 15:50:52
Message-ID: CAM-w4HPmUC0Nkan8wj7JyyxNZGYx5qKhkr3YuOKzmLdr04AybA@mail.gmail.com
Lists: pgsql-hackers

On Tue, May 7, 2013 at 1:54 PM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> I believe the point that Tom is making is that we shouldn't paint
> ourselves into a corner by letting users provide old snapshots to
> pg_dump which haven't acquired any of the necessary locks. The goal, at
> least as I read it, is to come up with a workable design (and I don't
> know that we have, but still) which provides a way for the locks to be
> taken at least as quickly as what pg_dump does today and which we could
> modify down the road to take the locks pre-snapshot (presuming we can
> figure out a way to make that work).

One natural way to do it would be to add an option to pg_dump which
causes it to do all the normal pre-dump work, then export a snapshot
and wait for the user. (Alternatively it could even create a prepared
transaction, which IIRC keeps the locks until it's committed.) That
gives users a way to get a snapshot that is guaranteed to work until
that transaction ends.
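A minimal sketch of that flow, assuming the exporting session is held open while the dump runs (pg_export_snapshot() has existed since 9.2; the table names and the returned snapshot ID are hypothetical):

```sql
-- Session 1: take the locks first, then export a snapshot.
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
LOCK TABLE accounts, orders IN ACCESS SHARE MODE;  -- same lock mode pg_dump takes
SELECT pg_export_snapshot();                       -- returns e.g. '000003A1-1'
-- Keep this transaction open while the dump runs; the exported
-- snapshot is only valid for the lifetime of this transaction.
```

A second session could then run `pg_dump --snapshot='000003A1-1' ...` (the option under discussion in this thread) and see exactly the state the locks were taken against.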

But I don't think that would really make users happy. I think the
usual use case for this feature would be to dump a single table or
small number of tables as of some time in the past that they didn't
plan in advance that they would need. They might have a cron job
periodically export a snapshot "just in case" and then want to use it
later. They wouldn't be happy if they had to create a prepared
transaction for each such snapshot which locked every table in their
database until they decide they don't actually need it. That would
mean they could never do any ddl.

The use case of wanting to dump a single table as of a few hours ago
(e.g. before some application data loss bug) is pretty compelling. If
we could do it I think it would be worth quite a bit.

What's the worst case for using an old snapshot? If I try to access a
table that doesn't exist any longer I'll get an error. That doesn't
really seem that bad for the use case I described. It's worse for a
full-database dump, but for an explicit list of tables it seems OK to
me.

If I try to access a table whose schema has changed then I might use
the wrong tupledesc and see rows that don't decode properly. That
would be a disaster. Can we protect against that by noticing that the
pg_class row isn't visible to our snapshot and throw an error? Would
that be sufficient to protect against all schema changes? Would it
cause massive false positives based on whether vacuum had happened to
run recently?
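One way to approximate that check from SQL, as a sketch only: it ignores xid epoch wraparound, and a real test would presumably live in pg_dump's C code rather than a query. txid_visible_in_snapshot() has existed since 8.4; the snapshot literal and table name here are hypothetical.

```sql
-- Was the table's current pg_class row already committed as of the
-- snapshot we plan to dump with? If not, the schema may have changed
-- (or vacuum/analyze merely updated relpages/reltuples -- a false positive).
SELECT txid_visible_in_snapshot(c.xmin::text::bigint,
                                '1000:1000:'::txid_snapshot) AS row_unchanged
FROM pg_class c
WHERE c.relname = 'accounts';   -- table name is hypothetical
```

Note the cast from xid to bigint is only valid within one xid epoch, and a frozen xmin would always read as visible, so this is an illustration of the idea rather than a robust check.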

--
greg
