Re: pg_dump test instability

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_dump test instability
Date: 2018-08-28 18:47:17
Message-ID: 1765.1535482037@sss.pgh.pa.us
Lists: pgsql-hackers

Stephen Frost <sfrost(at)snowman(dot)net> writes:
> Parallel *restore* can be done from either a custom-format dump or from
> a directory-format dump. I agree that we should separate the concerns
> and perform independent sorting on the restore side of things based on
> the relative sizes of tables in the dump (be it custom format or
> directory format). While compression might make us not exactly correct
> on the restore side, I expect that we'll generally be close enough to
> avoid most cases where a single worker gets stuck working on a large
> table at the end after all the other work is done.

Here's a proposed patch for this. It removes the hacking of the TOC list
order, solving Peter's original problem, and instead sorts by size
in the actual parallel dump or restore control code (a rough sketch of
the idea appears after the list below). There are a number of ensuing
performance benefits:

* The BLOBS entry, if any, gets to participate in the ordering decision
during parallel dumps. As the code stands, all the effort to avoid
scheduling a long job last is utterly wasted if you've got a lot of
blobs, because that entry stayed at the end. I didn't work really hard
on that; I just gave it a large size so it would go first, not last. If
you have just a few blobs, that's not necessary, but I doubt it hurts
either.

* During restore, we insert actual size numbers into the BLOBS and
TABLE DATA items, and then anything that depends on a TABLE DATA item
inherits its size. This results in size-based prioritization not just
for simple indexes as before, but also for constraint indexes (UNIQUE
or PRIMARY KEY), foreign key verifications, delayed CHECK constraints,
etc. It also means that stuff like triggers and rules gets reinstalled
in size-based order, which doesn't really help, but again I don't
think it hurts.

* Parallel restore scheduling by size works for custom dumps as well
as directory ones (as long as the dump file was seekable when created,
but you'll be hurting anyway if it wasn't).
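For concreteness, here's a rough sketch of the scheduling idea (this is
not the patch itself; SketchTocEntry and the sizes shown are invented
stand-ins for the real TocEntry data). The ready items get sorted by
their size estimate so the biggest jobs are dispatched first, and the
BLOBS entry just gets a large sentinel size:

    /*
     * Sketch only: "SketchTocEntry" stands in for pg_dump's TocEntry,
     * and the sizes are invented.  The point is just the descending-size
     * sort and the large sentinel size for the BLOBS entry.
     */
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct SketchTocEntry
    {
        const char *desc;       /* e.g. "TABLE DATA" or "BLOBS" */
        long        dataLength; /* size estimate taken from the dump */
    } SketchTocEntry;

    /* qsort comparator: bigger jobs sort first */
    static int
    size_compare(const void *a, const void *b)
    {
        const SketchTocEntry *ta = *(SketchTocEntry *const *) a;
        const SketchTocEntry *tb = *(SketchTocEntry *const *) b;

        if (ta->dataLength != tb->dataLength)
            return (ta->dataLength < tb->dataLength) ? 1 : -1;
        return 0;
    }

    int
    main(void)
    {
        /* give BLOBS a large sentinel size so it is scheduled first */
        SketchTocEntry blobs = {"BLOBS", 1L << 30};
        SketchTocEntry t1 = {"TABLE DATA small_table", 500};
        SketchTocEntry t2 = {"TABLE DATA big_table", 90000};
        SketchTocEntry *ready[] = {&t1, &blobs, &t2};

        qsort(ready, 3, sizeof(SketchTocEntry *), size_compare);

        for (int i = 0; i < 3; i++)
            printf("%s (%ld)\n", ready[i]->desc, ready[i]->dataLength);
        return 0;
    }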

I have not really tried to demonstrate performance benefits, because
the results would depend a whole lot on your test case; but at least
in principle this should result in far more intelligent scheduling
of parallel restores.

While I haven't done so here, I'm rather tempted to rename the
par_prev/par_next fields and par_list_xxx functions to pending_prev,
pending_next, pending_list_xxx, since they now have only one use.
(BTW, I tried really hard to get rid of par_prev/par_next altogether,
in favor of keeping the pending entries in the unused space in the
"ready" TocEntry* array. But it didn't work out well --- seems like
a list really is the natural data structure for that.)
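The reason a list is so natural here is that entries leave the pending
set in arbitrary order as their dependencies get satisfied, and an
intrusive doubly-linked list gives O(1) unlinking with no repacking,
which an array-based scheme can't match. A minimal sketch of that shape
(using the pending_* spelling suggested above; PendingEntry is a made-up
stand-in for TocEntry, with a dummy header node representing the list):

    #include <stddef.h>

    typedef struct PendingEntry
    {
        struct PendingEntry *pending_prev;
        struct PendingEntry *pending_next;
    } PendingEntry;

    /* an empty list is a header node linked to itself */
    static void
    pending_list_init(PendingEntry *head)
    {
        head->pending_prev = head->pending_next = head;
    }

    /* add a new pending item at the tail */
    static void
    pending_list_append(PendingEntry *head, PendingEntry *te)
    {
        te->pending_prev = head->pending_prev;
        te->pending_next = head;
        head->pending_prev->pending_next = te;
        head->pending_prev = te;
    }

    /* O(1) unlink when an entry's dependencies are satisfied */
    static void
    pending_list_remove(PendingEntry *te)
    {
        te->pending_prev->pending_next = te->pending_next;
        te->pending_next->pending_prev = te->pending_prev;
        te->pending_prev = te->pending_next = NULL;
    }

    int
    main(void)
    {
        PendingEntry head, a, b;

        pending_list_init(&head);
        pending_list_append(&head, &a);
        pending_list_append(&head, &b);
        pending_list_remove(&a);  /* a's dependencies are now satisfied */
        return 0;
    }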

regards, tom lane

Attachment Content-Type Size
smarter-parallel-dump-restore-1.patch text/x-diff 43.7 KB
