Re: WIP patch for parallel pg_dump

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Joachim Wieland <joe(at)mcknight(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-03 13:02:11
Message-ID: 4CF8EA53.5040101@dunslane.net
Lists: pgsql-hackers

On 12/02/2010 11:44 PM, Joachim Wieland wrote:
> On Thu, Dec 2, 2010 at 9:33 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> In particular, this issue *has* been discussed before, and there was a
>> consensus that preserving dump consistency was a requirement. I don't
>> think that Joachim gets to bypass that decision just by submitting a
>> patch that ignores it.
> I am not trying to bypass anything here :) Regarding the locking
> issue I probably haven't done sufficient research, at least I managed
> to miss the emails that mentioned it. Anyway, that seems to be solved
> now fortunately, I'm going to implement your idea over the weekend.
>
> Regarding snapshot cloning and dump consistency, I brought this up
> already several months ago and asked if the feature is considered
> useful even without snapshot cloning. And actually it was you who
> motivated me to work on it even without having snapshot consistency...
>
> http://archives.postgresql.org/pgsql-hackers/2010-03/msg01181.php
>
> In my patch pg_dump emits a warning when called with -j, if you feel
> better with an extra option
> --i-know-that-i-have-no-synchronized-snapshots, fine with me :-)
>
> In the end we provide a tool with limitations, it might not serve all
> use cases but there are use cases that would benefit a lot. I
> personally think this is better than to provide no tool at all...

I think Tom's statement there:

> I think migration to a new server version (that's too incompatible for
> PITR or pg_migrate migration) is really the only likely use case.

is just wrong. Say you have a site that's open 24/7, but there is a
window of, say, 6 hours each day when it's almost, but not quite, quiet.
You want to be able to make your disaster recovery dump within that
window, and the low level of traffic means you can afford the degraded
performance that might result from a parallel dump. Or say you have a
hot standby machine from which you want to make the dump, but you want
to set the max_standby_*_delay settings as low as possible. These are
both cases where you might want a parallel dump and yet need dump
consistency. I have a client currently considering the latter setup, and
the timing tolerances are a little tricky. The times at which the system
is in a state we want dumped are fixed, and we need to be sure the dump
is finished before the next such time rolls around. (This is a system
that in effect makes one giant state change at a time.) If we can't
complete the dump in that time, a delay will be introduced into the
system's critical path. Parallel dump will be very useful in helping us
avoid such a situation, but only if it's properly consistent.
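
For what it's worth, here is roughly what snapshot cloning looks like at
the SQL level. This is only a sketch, and the names I'm using
(pg_export_snapshot(), SET TRANSACTION SNAPSHOT) are my assumption about
the shape such an interface might take, not anything in Joachim's patch:

    -- coordinator: open a transaction and export its snapshot;
    -- this transaction must stay open while the workers attach
    BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    SELECT pg_export_snapshot();    -- returns an id, e.g. '000003A1-1'

    -- each worker, on its own connection, adopts that snapshot,
    -- so every session sees exactly the same database state
    BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    SET TRANSACTION SNAPSHOT '000003A1-1';
    -- ... dump table data here ...
    COMMIT;

The exact syntax doesn't matter much; the point is that every worker
shares one snapshot, which is what makes the dump properly consistent.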

I think Josh Berkus' comments in the thread you mentioned are correct:

> Actually, I'd say that there's a broad set of cases of people who want
> to do a parallel pg_dump while their system is active. Parallel pg_dump
> on a stopped system will help some people (for migration, particularly)
> but parallel pg_dump with snapshot cloning will help a lot more people.

cheers

andrew
