From: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
---|---|
To: | Joachim Wieland <joe(at)mcknight(dot)de> |
Cc: | Greg Smith <greg(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, José Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: directory archive format for pg_dump |
Date: | 2010-12-16 18:45:42 |
Message-ID: | 4D0A5E56.1080109@enterprisedb.com |
Lists: | pgsql-hackers |
On 16.12.2010 20:33, Joachim Wieland wrote:
> On Thu, Dec 16, 2010 at 12:48 PM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> As soon as we have parallel pg_dump, the next big thing is going to be
>> parallel dump of the same table using multiple processes. Perhaps we should
>> prepare for that in the directory archive format, by allowing the data of a
>> single table to be split into multiple files. That way parallel pg_dump is
>> simple, you just split the table in chunks of roughly the same size, say
>> 10GB each, and launch a process for each chunk, writing to a separate file.
>
> How exactly would you "just split the table in chunks of roughly the
> same size" ?
Check pg_class.relpages, and divide that evenly across the processes.
That should be good enough.
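Dividing relpages evenly comes down to cutting the block range [0, relpages) into contiguous pieces, one per worker. Here is a minimal sketch; the function name `split_block_ranges` and the half-open (start, end) convention are illustrative assumptions, not anything pg_dump actually has:

```python
def split_block_ranges(relpages: int, nworkers: int) -> list[tuple[int, int]]:
    """Hypothetical helper: split heap blocks [0, relpages) into nworkers
    contiguous half-open ranges whose sizes differ by at most one block."""
    if nworkers < 1:
        raise ValueError("nworkers must be at least 1")
    base, extra = divmod(relpages, nworkers)
    ranges = []
    start = 0
    for i in range(nworkers):
        # The first `extra` workers take one additional block each.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    print(split_block_ranges(10, 3))
```

pg_class.relpages is only an estimate, refreshed by VACUUM, ANALYZE and a few DDL commands, so the table can have grown by the time the dump runs. A real implementation would probably treat the last range as open-ended.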
> Which queries should pg_dump send to the backend? If it
> just sends a bunch of WHERE queries, the server would still scan the
> same data several times since each pg_dump client would result in a
> seqscan over the full table.
Hmm, I was thinking of "SELECT * FROM table WHERE ctid BETWEEN ? AND ?",
but we don't support TidScans for ranges. Perhaps we could add that.
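Turning block ranges into the per-process queries described above could look roughly like the sketch below. `chunk_queries` is a hypothetical name, the table name is assumed to be already quoted, and half-open `>=`/`<` bounds on '(block,0)' stand in for the BETWEEN form so no maximum line-pointer offset has to be guessed. As noted above, without range TID scans each of these queries would still be executed as a full sequential scan.

```python
def chunk_queries(table: str, ranges: list[tuple[int, int]]) -> list[str]:
    """Hypothetical helper: build one ctid-bounded SELECT per block range.
    `table` must already be a safely quoted identifier."""
    queries = []
    for i, (start, end) in enumerate(ranges):
        lower = f"ctid >= '({start},0)'::tid"
        if i == len(ranges) - 1:
            # Leave the last chunk open-ended so pages added after the
            # relpages estimate was taken are still dumped.
            where = lower
        else:
            where = f"{lower} AND ctid < '({end},0)'::tid"
        queries.append(f"SELECT * FROM {table} WHERE {where}")
    return queries


if __name__ == "__main__":
    for q in chunk_queries("t", [(0, 4), (4, 10)]):
        print(q)
```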
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com