Re: Suggestion for --truncate-tables to pg_restore

From: "Karl O(dot) Pinc" <kop(at)meme(dot)com>
To: Josh Kupershmidt <schmiddy(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Suggestion for --truncate-tables to pg_restore
Date: 2012-11-21 10:48:56
Message-ID: 1353494936.3737.0@mofo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Josh,

On 11/20/2012 11:53:23 PM, Josh Kupershmidt wrote:
> Hi Karl,
> I signed on to review this patch for the current CF.

I noticed. Thanks very much.

> On Fri, Sep 21, 2012 at 8:54 AM, Karl O. Pinc <kop(at)meme(dot)com> wrote:
> > On 09/20/2012 12:24:49 PM, Karl O. Pinc wrote:

> > First, the problem:
> >
> > Begin with the following structure:
> >
> > CREATE TABLE schemaA.foo (id PRIMARY KEY, data INT);
> >
> > CREATE VIEW schemaB.bar AS SELECT * FROM schemaA.foo;
> >
> > Then, by accident, somebody does:
> >
> > UPDATE schemaA.foo SET data = data + (RANDOM() * 1000)::INT;
> >
> > So, you want to restore the data into schemaA.foo.
> > But schemaA.foo has (bad) data in it that must first
> > be removed. It would seem that using
> >
> > pg_restore --clean -n schemaA -t foo my_pg_dump_backup
> >
> > would solve the problem, it would drop schemaA.foo,
> > recreate it, and then restore the data. But this does
> > not work. schemaA.foo does not drop because it's
> > got a dependent database object, schemaB.bar.

> TBH, I didn't find the example above particularly compelling for
> demonstrating the need for this feature. If you've just got one table
> with dependent views which needs to be restored, it's pretty easy to
> manually TRUNCATE and have pg_restore --data-only reload the table.
> (And easy enough to combine the truncate and restore into a single
> transaction in case anything goes wrong, if need be.)

I was not clear in stating the problem. (But see below
regarding foreign keys.) The dependent view
was an example, but there can also be REFERENCES constraints and
trigger related constraints involving other tables in other schemas.
The easiest work-around I can think of here is destroying all the
triggers and constraints, either everything in the whole db
or doing some work to be more selective, truncating all the schema's
tables. doing a data-only restore of the
schema, and then pg_restore --data-only, and then re-creating
all the triggers and constraints.

>
> But I'm willing to grant that this proposed feature is potentially as
> useful as existing restore-jiggering options like --disable-triggers.
> And I guess I could see that if you're really stuck having to perform
> a --data-only restore of many tables, this feature could come in
> handy.

I think so. See above.

>
> > So, the idea here is to be able to do a data-only
> > restore, first truncating the data in the tables
> > being restored to remove the existing corrupted data.
> >
> > The proposal is to add a --truncate-tables option
> > to pg_restore.
> >
> > ----

> > (I tested pg_restore with 9.1 and when --data-only is
> > used --clean is ignored, it does not even produce a warning.
> > This is arguably a bug.)
>
> +1 for having pg_restore bail out with an error if the user specifies
> --data-only --clean. By the same token, --clean and --truncate-tables
> together should also raise a "not allowed" error.

OT:
After looking at the code I found a number of "conflicting"
option combinations are not tested for or rejected. I can't
recall what they are now. The right way to fix these would be
to send in a separate patch for each "conflict" all attached
to one email/commitfest item? Or one patch that just gets
adjusted until it's right?

>
> > ----
> >
> > More serious objections were raised regarding semantics.
> >
> > What if, instead, the initial structure looked like:
> >
> > CREATE TABLE schemaA.foo
> > (id PRIMARY KEY, data INT);
> >
> > CREATE TABLE schemaB.bar
> > (id INT CONSTRAINT "bar_on_foo" REFERENCES foo
> > , moredata INT);
> >
> > With a case like this, in most real-world situations, you'd
> > have to use pg_restore with --disable-triggers if you wanted
> > to use --data-only and --truncate-tables. The possibility of
> > foreign key referential integrity corruption is obvious.
>
> Why would --disable-triggers be used in this example? I don't think
> you could use --truncate-tables to restore only table "foo", because
> you would get:
>
> ERROR: cannot truncate a table referenced in a foreign key
> constraint
>
> (At least, I think so, not having tried with the actual patch.) You
> could use --disable-triggers when restoring "bar".

I tried it and you're quite right. (I thought I'd tried this
before, but clearly I did not -- proving the utility of the review
process. :-) My assumption was that since triggers
were turned off that constraints, being triggers, would be off as
well and therefore tables with foreign keys could be truncated.
Obviously not, since the I get the above error.

I just tried it. --disable-triggers does not disable constraints.

>
> > Recovering some data and being left with referential
> > integrity problems is better than having no data.
>
> Well, this is really a judgment call, and I have a hunch that many
> admins would actually choose "none of the above". And I think this
> point gets to the crux of whether the --truncate-tables option will
> be
> useful as-is.

Well, yes. "None of the above" is best. :) In my case I had munged
the data in the one important schema and wanted to restore. The
setting is an academic one and a lot of cruft gets left laying
around in the other schemas, some of which consist entirely of cruft.
Responsibility for the non-main schema content is distributed.

In the interest of getting things working again quickly I wished to
restore the important schema quickly, and didn't want to have to sort
through the cruft ahead of time. In other words, I was entirely
willing to break things and pick up the pieces afterwords.

>
> For your first example, --truncate-tables seems to have some use, so
> that the admin isn't forced to recreate various views which may be
> dependent on the table. (Though it's not too difficult to work around
> this case today.)

As an aside: I never have an issue with this, having planned ahead.
I'm curious what the not-too-difficult work-around is that you have
in mind. I'm not coming up with a tidy way to, e.g, pull a lot
of views out of a dump.

> For the second example involving FKs, I'm a little bit more hesitant
> about endorsing the use of --truncate-tables combined with
> --disable-triggers to potentially allow bogus FKs. I know this is
> possible to some extent today using the --disable-triggers option,
> but
> it makes me nervous to be adding a mode to pg_restore if it's
> primarily designed to work together with --disable-triggers as a
> larger foot-gun.

This is the crux of the issue, and where I was thinking of
taking this patch. I very often am of the mindset that
foreign keys are no more or less special than other
more complex data integrity rules enforced with triggers.
(I suppose others might say the same regards to integrity
enforced at the application level.) This naturally
inclines me to think that one more way, beyond
--disable-triggers, to break integrity is no big deal.

But I quite see your point. Is it possible to get
resolution on this issue before either of us do any
more work in the direction of foreign keys?

An additional foot-gun, --disable-constraints,
seems like the natural progression in this direction.
Constraints, unlike triggers (?), can, in theory,
be fired at any time to check data content so perhaps
providing a way to test existing constraints against
db content would be a way to mitigate the foot-gun-ness
and drive an after-the-fact data cleanup process.

--disable-constraints seems like an entirely separate
patch so maybe we can stop the FK related issue right
here? (Although I would appreciate feedback regards
whether such an option might be accepted, at minimum
I'd like to get this out of my brain.)

>
> Just a few initial comments about the doc portion of the patch:
>
> + This option is only relevant when performing a data-only
>
> If we are going to restrict the --truncate-tables option to be used
> with --data-only, "only allowed" may be more clear than "only
> relevant".

Make sense to me. My intent here was to use the language
used elsewhere in the docs but I'm happy to go in a better direction.
(And perhaps later submit more patches to move other parts of the docs
in that direction.)

>
> + <para>
> + The <option>--disable-triggers</option> will almost always
> + always need to be used in conjunction with this option to
> + disable check constraints on foreign keys.
> + </para>
>
> For the first example you posted, of a view dependent on a table
> which
> needed to be restored, this advice would not be accurate. IMO it's a
> little dangerous advising users to "almost always" use a foot-gun
> like
> --disable-triggers.

Sorta brings us back to the sticking point above....
And sounds like, at the moment at least, this paragraph can
be deleted.

> I'm out of time for today, and I haven't had a chance to actually try
> out the patch, but I wanted to send off my thoughts so far. I should
> have a chance for another look later this week.

Thanks for the work.

I'll hold off on submitting any revisions to the patch for the moment.

One thing you'll want to pay attention to is the point
in the restore process at which the truncation is done.
In the current version each table is truncated immediately
before being copied. It might or might not be better to do
all the truncation up-front, in the fashion of --clean.
I would appreciate some guidance on this.

In case it's helpful I'm attaching two files
I used to test the foreign key issue.

Regards,

Karl <kop(at)meme(dot)com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

Attachment Content-Type Size
pg_restore_test.sql text/x-sql 391 bytes
pg_restore_test.sh application/x-shellscript 206 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2012-11-21 11:08:00 Re: [PATCH] binary heap implementation
Previous Message Boszormenyi Zoltan 2012-11-21 10:35:00 Re: [PATCH] Make pg_basebackup configure and start standby [Review]