Re: pg_dump --snapshot

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_dump --snapshot
Date: 2013-05-07 15:01:48
Message-ID: 20130507150148.GV4361@tamriel.snowman.net
Lists: pgsql-hackers

* Andres Freund (andres(at)2ndquadrant(dot)com) wrote:
> On 2013-05-07 08:54:54 -0400, Stephen Frost wrote:
> > Agreed- but it also isn't currently possible to make it any smaller.
>
> Uh. Why not? I think this is what needs to be fixed instead of making
> the hole marginally smaller elsewhere.

If we're able to fix it- how would we allow users to take advantage of
that fix when starting their own snapshot and feeding it to pg_dump?
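(The flow in question, roughly; the option name is from Simon's patch and
the snapshot identifier is made up:

    -- user's session; this transaction must stay open for the duration
    BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    SELECT pg_export_snapshot();   -- say it returns '000003A1-1'

    $ pg_dump --snapshot=000003A1-1 mydb > mydb.sql
)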

> You can trivially reproduce the
> problem with pg_dump today:

Agreed. The idea was to simply avoid making it worse. Your argument
seems to be that, since it's already horrible and easily broken, we can
go ahead and make it worse, and no one will complain because no one has
complained about how bad it is already. I don't follow that logic.
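(For the archives, a sketch of that reproduction: dump a database with
enough objects that acquiring all the locks takes a while, and run DDL
against a not-yet-locked table from another session:

    $ pg_dump bigdb > bigdb.sql            # session 1: dump under way

    ALTER TABLE t ADD COLUMN c2 int;       -- session 2, run before pg_dump
                                           -- has locked t

The dump can then fail outright or emit a schema that never actually
existed.)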

> > My suggestion was to lock everything that pg_dump locks, which we
> > clearly have locks for since pg_dump is acquiring them. Also, I don't
> > believe it'd be that difficult to identify what pg_dump would lock, at
> > least in a 'default' whole-database run. This is more of a stop-gap
> > than a complete solution.
>
> The problem is that locking - as shown above - doesn't really help all
> that much.

It helps in that once we have the lock, things aren't changing under us.
The closer we can keep that to when the transaction starts, the better..

> You would have to do it like:
> 1) start txn
> 2) acquire DDL prevention lock
> 3) assert we do not yet have a snapshot
> 4) acquire snapshot
> 5) lock objects
> 6) release DDL lock
> 7) dump objects/data
> 8) commit txn

We'd need a '4.5' to get the list of objects, no?
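Something like this, as a SQL-ish sketch (the DDL-prevention lock in
steps 2/6 is hypothetical; nothing like it exists today):

    BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;  -- (1)
    -- (2) acquire the hypothetical DDL-prevention lock
    -- (3) nothing has run yet, so no snapshot has been taken
    SELECT pg_export_snapshot();                        -- (4) first query fixes the snapshot
    SELECT c.oid, n.nspname, c.relname                  -- (4.5) enumerate what to dump
      FROM pg_class c
      JOIN pg_namespace n ON n.oid = c.relnamespace
     WHERE c.relkind = 'r';
    LOCK TABLE public.t1 IN ACCESS SHARE MODE;          -- (5) repeat per object
    -- (6) release the DDL-prevention lock
    -- (7) dump objects/data using the exported snapshot
    COMMIT;                                             -- (8)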

> Unfortunately most of these steps cannot easily/safely be exposed to
> SQL. And again, this is a very old situation that doesn't really have
> anything to do with snapshot exporting.

Perhaps we can't expose the individual components, but perhaps we could
have a function which does some/all of the pieces (except the pieces
that must be done by pg_dump) and which users need to grant explicit
execute rights to for their "backup" user?
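Roughly this shape (the function name and body are purely illustrative;
the real work would need backend support, which is the open question):

    -- hypothetical wrapper: would do the snapshot/lock dance internally
    -- and return the snapshot identifier (body here is just a stub)
    CREATE FUNCTION pg_prepare_dump_snapshot() RETURNS text
        LANGUAGE sql SECURITY DEFINER
        AS $$ SELECT pg_export_snapshot() $$;

    REVOKE EXECUTE ON FUNCTION pg_prepare_dump_snapshot() FROM PUBLIC;
    GRANT EXECUTE ON FUNCTION pg_prepare_dump_snapshot() TO backup;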

> pg_dump doesn't prevent you from running CREATE TEMPORARY TABLE? That
> would make it unrunnable in many situations. Especially as we cannot
> easily (without using several connections at once) release locks before
> ending a transaction.

I agree that we wouldn't want pg_dump preventing all catalog updates,
but wouldn't we be able to draw a distinction between objects being
modified and objects being added?

> > It's really a problem for just about everything that uses transactions
> > and locking, isn't it? pg_dump just happens to have it worst since it
> > wants to go and touch every object in the database. It's certainly
> > possible for people to connect to the DB, look at pg_class and then try
> > to access some object which no longer exists (even though it's in
> > pg_class).
>
> Well, normal SQL shouldn't need to touch pg_class and will know
> beforehand which locks it will need. But I have to say I have more than
> once wished we would throw an error if an object's definition is "newer"
> than the one we started out with.

I agree- that would be nice. I'd also contend that 'normal' SQL quite
often looks at pg_class; I know that we have tons of code which does.
We also basically lock all users out of the database when we're doing
DDL changes to avoid any issues, but it'd certainly be nice if we didn't
have to..
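(The failure mode is trivial to hit, e.g.:

    -- session 1 (default READ COMMITTED)
    BEGIN;
    SELECT relname FROM pg_class WHERE relkind = 'r';  -- lists table t

    -- session 2, meanwhile
    DROP TABLE t;

    -- session 1 again
    SELECT count(*) FROM t;
    -- ERROR:  relation "t" does not exist
)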

> > This will be an interesting thing to consider when
> > implementing MVCC for the catalog.
>
> I think using a proper MVCC snapshot for catalog scans doesn't, cannot
> even in all cases, imply having to use the user's transaction's
> snapshot, just one that guarantees a consistent result while a query is
> running.

The hope would be to see a consistent view of what you can access while
the transaction is running..

> I am not sure if the correct fix is locking and not just making sure the
> definition of objects hasn't changed since the snapshot was taken.

I'm guessing that the intent of the locking is to make sure that the
objects don't change under us. :) The problem, as you've explained, is
that the object might change before we get our lock in place. That
window of opportunity gets a lot larger when it's moved outside of our
control.

> But if
> we go for locking, creating a function which makes sure that the source
> transaction holds a certain strong lock wouldn't be that hard. We have
> all the data for it.

Agreed.
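pg_locks already exposes everything such a function would need to check;
roughly (the pid of the exporting backend is made up here):

    SELECT mode, granted
      FROM pg_locks
     WHERE locktype = 'relation'
       AND relation = 'some_table'::regclass
       AND pid = 12345;  -- the backend that exported the snapshot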

> a) exporting a snapshot to a file was discussed and deemed unacceptably
> risky. That's why pg_export_snapshot() exports it itself into some
> internal format somewhere. The user doesn't need to know where/how.

It wasn't my intent to export the 'snapshot' to a file but rather to
simply dump out what pg_dump gets when it runs pg_export_snapshot(),
allowing others to connect in and use that snapshot.
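That is, nothing beyond what any other session can already do today:

    -- second session, while pg_dump's transaction is still open
    BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    SET TRANSACTION SNAPSHOT '000003A1-1';  -- the identifier pg_dump got
    -- this session now sees exactly the data the dump sees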

> b) When importing a snapshot the source transaction needs to be alive,
> otherwise we cannot guarantee the global xmin hasn't advanced too
> much. That would open up very annoying race-conditions because pg_dump
> would need to live long enough for the separate transaction to import
> data.

Yes, there's some race-condition risk here, but I don't think that's
insoluble (wait till the file disappears before releasing the
connection?). I agree that it might not be pretty either.

> c) Quite possibly the snapshot we need has to meet some special
> criteria that pg_dump cannot guarantee.

That's a much more difficult case, of course. It would help to have a
better understanding of what, exactly, Simon's use-case for this feature
is, in order to answer whether this is an issue or not.

Thanks,

Stephen
