Re: Should pg_dump dump larger tables first?

From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "David Rowley" <dgrowleyml(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Should pg_dump dump larger tables first?
Date: 2013-01-31 10:06:24
Message-ID: m238xhd5un.fsf@2ndQuadrant.fr
Lists: pgsql-hackers
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> Also, it's far from obvious to me that "largest first" is the best rule
> anyhow; it's likely to be more complicated than that.
>
> But anyway, the right place to add this sort of consideration is in
> pg_restore --parallel, not pg_dump.  I don't know how hard it would be
> for the scheduler algorithm in there to take table size into account,
> but at least in principle it should be possible to find out the size of
> the (compressed) table data from examination of the archive file.

In my experience with pgloader and loading data during migrations, the
biggest gains often come from loading the biggest table in parallel with
all the little ones. The big table's loading time is then usually
unaffected, and by the time it is done the rest of the database is done
too.
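
To make that concrete, here is a minimal sketch of a "largest first"
assignment of tables to parallel workers. It is only an illustration, not
what pgloader or pg_restore actually do: the function name, the table names
and the sizes are all made up, and it assumes that load time is simply
proportional to table size.

```python
import heapq


def schedule_largest_first(table_sizes, workers):
    """Greedy sketch: hand each table, largest first, to the least-loaded worker.

    table_sizes maps a (hypothetical) table name to its size; load time is
    assumed to be proportional to that size. Returns the per-worker
    assignment and the estimated total time (the busiest worker's load).
    """
    # Min-heap of (current load, worker index): the least-loaded worker pops first.
    heap = [(0, w) for w in range(workers)]
    heapq.heapify(heap)
    assignment = {w: [] for w in range(workers)}

    by_size_desc = sorted(table_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for name, size in by_size_desc:
        load, w = heapq.heappop(heap)
        assignment[w].append(name)
        heapq.heappush(heap, (load + size, w))

    makespan = max(load for load, _ in heap)
    return assignment, makespan


# Made-up sizes: one big table and a handful of small ones.
sizes = {"events": 100, "orders": 40, "users": 30, "logs": 20, "tags": 10}
assignment, makespan = schedule_largest_first(sizes, workers=2)
print(assignment)
print(makespan)
```

With these made-up numbers one worker gets only the big table while the
other works through all the small ones, and the estimated total time equals
the load time of the big table alone. Real loads will of course also depend
on indexes, I/O contention and so on, which this sketch ignores.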

Loading several big tables in parallel tends not to give any benefit in
the tests I've done so far, but that might be an artefact of Python
multithreading. I will do some testing with proper tooling later.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
