Re: Change in "policy" on dump ordering?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Michael Banck <michael(dot)banck(at)credativ(dot)de>, Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Kevin Grittner <kgrittn(at)gmail(dot)com>, Jordan Gigov <coladict(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Change in "policy" on dump ordering?
Date: 2017-07-26 02:45:35
Message-ID: 18658.1501037135@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> The main problem with Kevin's fix, after thinking about it more, is that
> it shoves matview refresh commands into the same final processing phase
> where ACLs are done, which means that in a parallel restore they will not
> be done in parallel. That seems like a pretty serious objection, although
> maybe not so serious that we'd be willing to accept a major rewrite in the
> back branches to avoid it.

> I'm wondering at this point about having restore create a fake DO_ACLS
> object (fake in the sense that it isn't in the dump file) that would
> participate normally in the dependency sort, and which we'd give a
> priority before matview refreshes but after everything else. "Restore"
> of that object would perform the same operation we do now of running
> through the whole TOC and emitting grants/revokes. So it couldn't be
> parallelized in itself (at least not without an additional batch of work)
> but it could be treated as an indivisible parallelized task, and then the
> matview refreshes could be parallelizable tasks after that.

> There's also Peter's proposal of splitting up GRANTs from REVOKEs and
> putting only the latter at the end. I'm not quite convinced that that's
> a good idea but it certainly merits consideration.

After studying things for awhile, I've concluded that that last option
is probably not workable. ACL items contain a blob of SQL that would be
tricky to pull apart, and is both version and options dependent, and
contains ordering dependencies that seem likely to defeat any desire
to put the REVOKEs last anyway.

Instead, I've prepared the attached draft patch, which addresses the
problem by teaching pg_backup_archiver.c to process TOC entries in
three separate passes, "main" then ACLs then matview refreshes.
It's survived light testing but could doubtless use further review.

Another way we could attack this is to adopt something similar to
the PRE_DATA_BOUNDARY/POST_DATA_BOUNDARY mechanism; that is, invent more
dummy section boundary objects, add dependencies sufficient to constrain
all TOC objects to be before or after the appropriate boundaries, and
then let the dependency sort go at it. But I think that way is probably
more expensive than this one, and it doesn't have any real advantage if
there's not a potential for circular dependencies that need to be broken.
If somebody else wants to try drafting a patch like that, I won't stand
in the way, but I don't wanna do so.

Not clear where we want to go from here. Should we try to get this
into next month's minor releases, or review it in September's commitfest
and back-patch after that?

regards, tom lane

Attachment Content-Type Size
restore-matviews-last-1.patch text/x-diff 29.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2017-07-26 02:50:05 Re: Mishandling of WCO constraints in direct foreign table modification
Previous Message Tom Lane 2017-07-26 02:18:05 Re: pg_dump does not handle indirectly-granted permissions properly