Re: pg_dump --snapshot

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_dump --snapshot
Date: 2013-05-07 13:27:30
Message-ID: 20130507132729.GQ4361@tamriel.snowman.net
Lists: pgsql-hackers

* Simon Riggs (simon(at)2ndQuadrant(dot)com) wrote:
> On 7 May 2013 01:18, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > * Simon Riggs (simon(at)2ndQuadrant(dot)com) wrote:
> >> If anybody really wanted to fix pg_dump, they could do. If that was so
> >> important, why block this patch, but allow parallel pg_dump to be
> >> committed without it?
> >
> > Because parallel pg_dump didn't make the problem any *worse*..? This
> > does.
>
> Sorry, not accurate. Patch makes nothing *worse*.

I really don't understand that claim. It clearly increases the time
between transaction start and when the locks are all acquired. If this
is used programmatically (instead of someone hand-typing commands into
psql), perhaps it doesn't increase it very much, but increasing it at
all is clearly going in the "worse" direction.
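
To illustrate the window, here's a rough sketch of the sequence I'm
worried about (object names and the snapshot id format are invented):

    -- session A (application code)
    BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    SELECT pg_export_snapshot();    -- returns, say, '000003A1-1'
    -- ... arbitrary application work happens here ...

    -- session B, meanwhile: nothing is locked yet, so this succeeds
    DROP TABLE important_table;

    -- much later: pg_dump --snapshot='000003A1-1' mydb
    -- the snapshot still "sees" important_table, but pg_dump can no
    -- longer lock or read it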

> The existing API *can* be misused in the way you say, and so also
> could pg_dump if the patch is allowed.

And if the patch isn't allowed, then pg_dump isn't any worse off than
it's always been. The corollary is that pg_dump *is* worse off if the
patch goes in. The patch is absolutely *encouraging* such misuse of
this API because there's almost no other way to use this new option to
pg_dump except as an abuse of the API.

> However, there is no reason to suppose that such misuse would be
> common;

The concern is exactly this point. It wouldn't simply be common: were
this patch applied and the option used, it would be the predominant
case, and tools would be built which either completely ignore the problem
(most likely) or which attempt to duplicate what pg_dump already does,
likely very poorly.

> no reason why a timing gap would *necessarily* occur in the
> way your previous example showed, or if it did why it would
> necessarily present a problem for the user. Especially if we put
> something in the docs.

I don't believe papering over this with documentation is really a
solution. I did suggest a few other options which don't seem to be
getting much traction from anyone, so perhaps they're not workable, but
I'll reiterate them for fun anyway:

Provide a way for users (perhaps even pg_dump) to acquire locks on
essentially everything in the DB. This could be a plpgsql function or
perhaps even a new lock type. Grander ideas might also support the
filtering options which pg_dump supports.
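
As a rough sketch of what I have in mind (the name lock_all_tables is
made up, and a real version would need to cover the same object types
pg_dump handles):

    CREATE OR REPLACE FUNCTION lock_all_tables() RETURNS void AS $$
    DECLARE
        rel regclass;
    BEGIN
        -- take ACCESS SHARE locks on every ordinary user table,
        -- roughly what pg_dump does after building its object list
        FOR rel IN
            SELECT c.oid::regclass
              FROM pg_class c
              JOIN pg_namespace n ON n.oid = c.relnamespace
             WHERE c.relkind = 'r'
               AND n.nspname NOT IN ('pg_catalog', 'information_schema')
        LOOP
            EXECUTE format('LOCK TABLE %s IN ACCESS SHARE MODE', rel);
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;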

Have pg_dump output the snapshot which it has acquired and shared, for
others to then connect and use. It would do this after acquiring all of
the locks, of course.
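
The other sessions could then pick that snapshot up with the existing
machinery, along the lines of (snapshot id invented here):

    BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    SET TRANSACTION SNAPSHOT '000003A1-1';  -- id reported by pg_dump
    -- queries here now see exactly the data pg_dump is dumping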

> Snapshots have no WARNING on them.

The documentation is certainly fallible, though this is a
problem which all applications need to address in some way. We address
it as best we can in pg_dump today and the goal is to continue to do so.

> There is no guarantee in any
> transaction that the table you want will not be dropped before you try
> to access it.

Hence we do the best we can in pg_dump by immediately locking the
objects once we've identified what they are.

> > However,
> > there really isn't a way for users to use this new option correctly-
>
> Not accurate.

Then please articulate how they would use it correctly? Would they
write a function which goes and acquires all of the locks which pg_dump
would (assuming they've figured out what locks pg_dump needs), prior to
exporting the snapshot and calling pg_dump? If so, perhaps we could
simplify their lives by providing such a function for them to use
instead of asking each user to write it?
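
In other words, the only correct usage I can see looks something like
this (using the hypothetical lock_all_tables() helper sketched above):

    BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    SELECT lock_all_tables();        -- hypothetical helper, see above
    SELECT pg_export_snapshot();     -- returns, say, '000003A1-1'
    -- keep this session open, then run:
    --   pg_dump --snapshot='000003A1-1' mydb > dump.sql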

> The argument against this is essentially that we don't trust the user
> to use it well, so we won't let them have it at all. Which makes no
> sense since they already have this API and don't need our permission
> to use it. All that blocking this patch does is to remove any chance
> the user has of coordinating pg_dump with other actions; preventing
> that causes more issues for the user and so doing nothing is not a
> safe or correct either. A balanced viewpoint needs to include the same
> level of analysis on both sides, not just a deep look at the worst
> case on one side and claim everything is rosy with the current
> situation.

I'm not sure where the claim that everything is rosy was made; I've
certainly not seen it nor made any claim that it is. This whole
discussion started with Tom pointing out that pg_dump is essentially
already busted and that this change would end up making that situation
worse. Perhaps in very select cases users will do the right thing and
acquire all the locks they need before exporting the snapshot and
calling pg_dump, but that description of how to use this option
correctly certainly wasn't articulated anywhere in the initial
description or contents of the patch. There's also nothing done to make
that any easier, nor any way to move users of this system to a new
methodology when we figure out how to make pg_dump do this better,
which puts us in the position of *always* having this issue, even if we
fix it for 'normal' pg_dump.

Thanks,

Stephen
