Re: Warn when parallel restoring a custom dump without data offsets

From: David Gilman <davidgilman1(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Justin Pryzby <pryzby(at)telsasoft(dot)com>
Subject: Re: Warn when parallel restoring a custom dump without data offsets
Date: 2020-05-21 03:05:01
Message-ID: CALBH9DBsWbwAXseNJdk4DbH1aR=sdczyAirp1aSxjpBSi86FzA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I did some more digging. To keep everyone on the same page there are
four different ways to order TOCs:

1. topological order,
2. dataLength order, size of the table, is always zero when pg_dump can't seek,
3. dumpId order, which should be thought as random but roughly
correlates to topological order to make things fun,
4. file order, the order that tables are physically stored in the
custom dump file.

Without being able to seek backwards a parallel restore of the custom
dump archive format has to be ordered by #1 and #4. The reference
counting that reduce_dependencies does inside of the parallel restore
logic upholds ordering #1. Unfortunately, 548e50976ce changed
TocEntrySizeCompare (which is used to break ties within #1) to order
by #2, then by #3. This most often breaks on dumps written by pg_dump
without seeks (everything has a dataLength of zero) as it then falls
back to #3 ordering every time. But, because nothing in pg_restore
does any ordering by #4 you could potentially run into this with any
custom dump so I think it's a regression.

For some troubleshooting I changed ready_list_sort to never call
qsort. This fixes the problem by never ordering by #3, leaving things
in #4 order, but breaks the new algorithm introduced in 548e50976ce.

I did what Justin suggested earlier and it works great. Parallel
restore requires seekable input (enforced elsewhere) so everyone's
parallel restores should work again.

On Wed, May 20, 2020 at 10:48 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> David Gilman <davidgilman1(at)gmail(dot)com> writes:
> >> I think the PG11
> >> commit you mentioned (548e5097) happens to make some databases fail in
> >> parallel restore that previously worked (I didn't check).
>
> > Correct, if you do the bisect around that yourself you'll see
> > pg_restore start failing with the expected "possibly due to
> > out-of-order restore request" on offset-less dumps.
>
> Yeah. Now, the whole point of that patch was to decouple the restore
> order from the dump order ... but with an offset-less dump file, we
> can't do that, or at least the restore order is greatly constrained.
> I wonder if it'd be sensible for pg_restore to use a different parallel
> scheduling algorithm if it notices that the input lacks offsets.
> (There could still be some benefit from parallelism, just not as much.)
> No idea if this is going to be worth the trouble, but it probably
> is worth looking into.
>
> regards, tom lane

--
David Gilman
:DG<

Attachment Content-Type Size
0001-pg_restore-fix-v2.patch application/octet-stream 2.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2020-05-21 03:18:16 Re: Is it useful to record whether plans are generic or custom?
Previous Message Thomas Munro 2020-05-21 02:31:37 Re: Parallel Seq Scan vs kernel read ahead