Re: [PERFORM] Slow BLOBs restoring

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [PERFORM] Slow BLOBs restoring
Date: 2010-12-09 15:05:15
Message-ID: 6530.1291907115@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-performance

I wrote:
> One fairly simple, if ugly, thing we could do about this is skip calling
> reduce_dependencies during the first loop if the TOC object is a blob;
> effectively assuming that nothing could depend on a blob. But that does
> nothing about the point that we're failing to parallelize blob
> restoration. Right offhand it seems hard to do much about that without
> some changes to the archive representation of blobs. Some things that
> might be worth looking at for 9.1:

> * Add a flag to TOC objects saying "this object has no dependencies",
> to provide a generalized and principled way to skip the
> reduce_dependencies loop. This is only a good idea if pg_dump knows
> that or can cheaply determine it at dump time, but I think it can.

I had further ideas about this part of the problem. First, there's no
need for a file format change to fix this: parallel restore is already
groveling over all the dependencies in its fix_dependencies step, so it
could count them for itself easily enough. Second, the real problem
here is that reduce_dependencies processing is O(N^2) in the number of
TOC objects. Skipping it for blobs, or even for all dependency-free
objects, doesn't make that very much better: the kind of people who
really need parallel restore are still likely to bump into unreasonable
processing time. I think what we need to do is make fix_dependencies
build a reverse lookup list of all the objects dependent on each TOC
object, so that the searching behavior in reduce_dependencies can be
eliminated outright. That will take O(N) time and O(N) extra space,
which is a good tradeoff because you won't care if N is small, while if
N is large you have got to have it anyway.
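
The reverse-lookup idea can be sketched roughly as follows. This is only an illustration in Python, not pg_restore's actual C code: the `toc` dictionary layout and the helper `restore_order` are hypothetical, and only the names `fix_dependencies` and `reduce_dependencies` come from the message above. The sketch assumes the total number of dependency links is roughly proportional to the number of TOC objects, which is what makes the build step O(N).

```python
from collections import defaultdict


def build_reverse_deps(toc):
    """Stand-in for the proposed fix_dependencies work.

    toc maps a TOC entry id to the list of ids it depends on
    (a hypothetical layout chosen for this sketch).
    Returns (dep_count, rev_deps).
    """
    dep_count = {}
    rev_deps = defaultdict(list)
    for entry_id, deps in toc.items():
        # Ignore dependencies on objects that are not in the archive.
        live = [d for d in deps if d in toc]
        dep_count[entry_id] = len(live)
        for d in live:
            # Record "entry_id is waiting on d" under d.
            rev_deps[d].append(entry_id)
    return dep_count, rev_deps


def reduce_dependencies(finished_id, dep_count, rev_deps):
    """Visit only the entries that depend on finished_id.

    No scan over the whole TOC is needed. Returns the entries that
    have just become ready to restore.
    """
    ready = []
    for dependent in rev_deps.get(finished_id, ()):
        dep_count[dependent] -= 1
        if dep_count[dependent] == 0:
            ready.append(dependent)
    return ready


def restore_order(toc):
    """Hypothetical serial driver: restore entries as they become ready."""
    dep_count, rev_deps = build_reverse_deps(toc)
    # Dependency-free entries (blobs, for example) start out ready,
    # so no flag in the archive format is required to find them.
    pending = [e for e, n in dep_count.items() if n == 0]
    order = []
    while pending:
        entry = pending.pop()
        order.append(entry)
        pending.extend(reduce_dependencies(entry, dep_count, rev_deps))
    return order
```

With this layout, an entry that nothing depends on costs nothing in `reduce_dependencies`, so a dump with many blobs no longer triggers a pass over the whole TOC for each one.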

Barring objections, I will do this and back-patch into 9.0. There is
maybe some case for trying to fix 8.4 as well, but since 8.4 didn't
make a separate TOC entry for each blob, it isn't as exposed to the
problem. We didn't back-patch the last round of efficiency hacks in
this area, so I'm thinking it's not necessary here either. Comments?

regards, tom lane
