
Re: Should pg_dump dump larger tables first?

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Should pg_dump dump larger tables first?
Date: 2013-01-31 16:32:48
Lists: pgsql-hackers
On Tue, Jan 29, 2013 at 3:34 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "David Rowley" <dgrowleyml(at)gmail(dot)com> writes:
>> If pg_dump was to still follow the dependencies of objects, would there be
>> any reason why it shouldn't backup larger tables first?
> Pretty much every single discussion/complaint about pg_dump's ordering
> choices has been about making its behavior more deterministic not less
> so.  So I can't imagine such a change would go over well with most folks.
> Also, it's far from obvious to me that "largest first" is the best rule
> anyhow; it's likely to be more complicated than that.

From my experience in the non-database world of processing many files
of greatly different sizes in parallel, sorting them so that the largest
are scheduled first and the smaller ones get packed in around them is
very successful and very easy.
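The packing idea above is essentially the classic longest-processing-time (LPT) greedy heuristic. A minimal sketch (purely illustrative; this is not pg_dump or pg_restore code, and the function name is hypothetical):

```python
import heapq

def schedule_largest_first(sizes, workers):
    """Greedy LPT: assign each job, largest first, to the least-loaded worker."""
    # Min-heap of (current total load, worker index)
    heap = [(0, w) for w in range(workers)]
    assignment = {w: [] for w in range(workers)}
    for size in sorted(sizes, reverse=True):
        load, w = heapq.heappop(heap)      # least-loaded worker so far
        assignment[w].append(size)
        heapq.heappush(heap, (load + size, w))
    return assignment

# Example: six "tables" of widely varying sizes across 3 workers.
# The two big jobs each get a worker to themselves; the small ones
# pack onto the third, so no worker sits idle while a giant runs last.
jobs = schedule_largest_first([100, 90, 10, 8, 7, 5], workers=3)
```

Scheduling the big items first avoids the worst case of the naive order, where a large job dispatched last leaves one worker running long after the others have finished.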

I agree that the best rule is surely more complicated than that, but
probably so much so that it will never get implemented.

> But anyway, the right place to add this sort of consideration is in
> pg_restore --parallel, not pg_dump.




