Re: pg_dump --snapshot

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_dump --snapshot
Date: 2013-05-07 14:15:26
Message-ID: 20130507141526.GA6117@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2013-05-07 08:54:54 -0400, Stephen Frost wrote:
> * Andres Freund (andres(at)2ndquadrant(dot)com) wrote:
> > > All of which I
> > > think I agree with, but I don't agree with the conclusion that this
> > > larger window is somehow acceptable because there's a very small window
> > > (one which can't be made any smaller, today..) which exists today.
> >
> > The window isn't that small currently:
>
> Agreed- but it also isn't currently possible to make it any smaller.

Uh. Why not? I think this is what needs to be fixed instead of making
the hole marginally smaller elsewhere. You can trivially reproduce the
problem with pg_dump today:

S1:
$ psql postgres
=# CREATE DATABASE pgdumptest;
=# CREATE DATABASE pgrestoretest;
=# \c pgdumptest
=# CREATE TABLE tbl(id serial primary key, data_a int8, data_b float8);
=# INSERT INTO tbl(data_a, data_b) SELECT random()::int, random() FROM generate_series(1, 10);
=# BEGIN;
=# ALTER TABLE tbl RENAME COLUMN data_a TO data_swap;
=# ALTER TABLE tbl RENAME COLUMN data_b TO data_a;
=# ALTER TABLE tbl RENAME COLUMN data_swap TO data_b;

S2:
$ pg_dump pgdumptest > /tmp/pg_dump.sql

S1:
=# COMMIT;

S2:
$ psql pgrestoretest -f /tmp/pgdump.sql
psql:/tmp/pgdump.sql:87: ERROR: invalid input syntax for integer: "0.944006722886115313"
CONTEXT: COPY tbl, line 1, column data_a: "0.944006722886115313"

A ddl upgrade script taking some seconds isn't exactly anything
unusual...

> > > Alright, then let's provide a function which will do that and tell
> > > people to use it instead of just using pg_export_snapshot(), which
> > > clearly doesn't do that.
> >
> > If it were clear cut what to lock and we had locks for
> > everything. Maybe. But we don't have locks for everything.
>
> My suggestion was to lock everything that pg_dump locks, which we
> clearly have locks for since pg_dump is acquiring them. Also, I don't
> believe it'd be that difficult to identify what pg_dump would lock, at
> least in a 'default' whole-database run. This is more of a stop-gap
> than a complete solution.

The problem is that locking - as shown above - doesn't really help all
that much. You would have to do it like:
1) start txn
2) acquire DDL prevention lock
3) assert we do not yet have a snapshot
4) acquire snapshot
5) lock objects
6) release DDL lock
7) dump objects/data
8) commit txn

Unfortunately most of these steps cannot easily/safely exposed to
sql. And again, this is a very old situation, that doesn't really have
to do anything with snapshot exporting.

> > So we would
> > need to take locks preventing any modification on any of system catalogs
> > which doesn't really seem like a good thing, especially as we can't
> > release them from sql during the dump were we can allow creation of
> > temp tables and everything without problems.
>
> That's already an issue when pg_dump runs, no? Not sure why this is
> different.

pg_dump doesn't prevent you from running CREATE TEMPORARY TABLE? That
would make it unrunnable in many situations. Especially as we cannot
easily (without using several connections at once) release locks before
ending a transaction.

> > > I believe the main argument here is really around "you should think
> > > about these issues before just throwing this in" and not "it must be
> > > perfect before it goes in". Perhaps "it shouldn't make things *worse*
> > > than they are now" would also be apt..
> >
> > That's not how I read 8465(dot)1367860037(at)sss(dot)pgh(dot)pa(dot)us :(
>
> I believe the point that Tom is making is that we shouldn't paint
> ourselves into a corner by letting users provide old snapshots to
> pg_dump which haven't acquired any of the necessary locks. The goal, at
> least as I read it, is to come up with a workable design (and I don't
> know that we have, but still) which provides a way for the locks to be
> taken at least as quickly as what pg_dump does today and which we could
> modify down the road to take the locks pre-snapshot (presuming we can
> figure out a way to make that work).

> The proposed patch certainly doesn't make any attempt to address that
> issue and would encourage users to open themselves up to this risk more
> than they are exposted today w/ pg_dump.

I fail to see a difference that is big enough to worry overly much
about. The above problem is easy enough to encounter without any
snapshot exporting and I can't remember a single report.

> > I think there is no point in fixing it somewhere else. The problem is in
> > pg_dump, not the snapshot import/export.
>
> It's really a problem for just about everything that uses transactions
> and locking, isn't it? pg_dump just happens to have it worst since it
> wants to go and touch every object in the database. It's certainly
> possible for people to connect to the DB, look at pg_class and then try
> to access some object which no longer exists (even though it's in
> pg_class).

Well, normal sql shouldn't need to touch pg_class and will know
beforehand which locks it will need. But I have to say I more than once
wished we would throw an error if an objects definition is "newer" than
the one we started out with.

> This will be an interesting thing to consider when
> implementing MVCC for the catalog.

I think using proper mvcc snapsot for catalog scans doesn't, cannot even
in all case, imply having to use the user's transaction's snapshot, just
one that guarantees a consistent result while a query is running.

> > You did suggest how it can be fixed? You mean
> > 20130506214515(dot)GL4361(at)tamriel(dot)snowman(dot)net?
>
> I suggested how it might be done. :) There's undoubtably issues with an
> all-database-objects lock, but it would certainly reduce the time
> between transaction start and getting all the locks acquired and shrink
> the window that much more. If we did implement such a beast, how could
> we ensure that the locks were taken immediately after transaction start
> if the snapshot is being passed to pg_dump? Basically, if we *did*
> solve this issue for pg_dump in some way in the future, how would we use
> it if pg_dump can accept an outside snapshot?

I am not sure if the correct fix is locking and not just making sure the
definition of objects hasn't changed since the snapshot started. But if
we go for locking creating a function which makes sure that the source
transaction has a certain strong lock wouldn't be that hard. We have all
the data for it.

> One other thought did occur to me- we could simply have pg_dump export
> the snapshot that it gets to stdout, a file, whatever, and systems which
> are trying to do this magic "everyone gets the same view" could glob
> onto the snapshot created by pg_dump, after all the locks have been
> acquired..

Several problems:
a) exporting a snapshot to a file was discussed and deemed unacceptable
risky. That's why pg_export_snapshot() exports it itself into some
internal format somewhere. The user doesn't need to know where/how.
b) When importing a snapshot the source transaction needs to be alive,
otherwise we cannot guarantee the global xmin hasn't advanced too
much. That would open up very annoying race-conditions because pg_dump
would need to live long enough for the separate transaction to import
data.
c) Quite possibly the snapshot we need needs to meet some special
criterions that pg_dump cannot guarantee.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2013-05-07 15:01:48 Re: pg_dump --snapshot
Previous Message Heikki Linnakangas 2013-05-07 14:00:15 Re: [BUGS] BUG #8043: 9.2.4 doesn't open WAL files from archive, only looks in pg_xlog