Re: pg_dump and pgpool

From: Scott Marlowe <smarlowe(at)g2switchworks(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: pg_dump and pgpool
Date: 2004-12-30 14:16:15
Message-ID: 1104416174.5893.52.camel@state.g2switchworks.com
Lists: pgsql-general

On Wed, 2004-12-29 at 17:30, Tom Lane wrote:
> Scott Marlowe <smarlowe(at)g2switchworks(dot)com> writes:
> > On Wed, 2004-12-29 at 16:56, Tom Lane wrote:
> >> No, we'd be throwing more, and more complex, queries. Instead of a
> >> simple lookup there would be some kind of join, or at least a lookup
> >> that uses a multicolumn key.
>
> > I'm willing to bet the performance difference is less than noise.
>
> [ shrug... ] I don't have a good handle on that, and neither do you.
> What I am quite sure about though is that pg_dump would become internally
> a great deal messier and harder to maintain if it couldn't use OIDs.
> Look at the DumpableObject manipulations and ask yourself what you're
> going to do instead if you have to use a primary key that is of a
> different kind (different numbers of columns and datatypes) for each
> system catalog. Ugh.

Wait, do you mean it's impossible to throw a single SQL query with a
proper join clause that USES OIDs but doesn't return them? Or that it's
impossible to throw a single query without joining on OIDs? I don't
mind joining on OIDs; I just don't want them crossing the connection, is
all. And yes, it might be ugly, but I can't imagine it being
unmaintainable for some reason.
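
To make the question concrete, here is a minimal sketch of a query that joins on OIDs but returns only names. It is not pg_dump's actual code. The join itself (pg_class.relnamespace against pg_namespace.oid) is how the real catalogs relate, but so the sketch runs without a server, the two in-memory SQLite databases are stand-ins reduced to the columns used; make_catalog and all OID values are hypothetical.

```python
import sqlite3

# The join uses OIDs, but only names cross the connection.
NAMES_ONLY_QUERY = """
    SELECT n.nspname, c.relname
    FROM pg_class c
    JOIN pg_namespace n ON n.oid = c.relnamespace
    ORDER BY n.nspname, c.relname
"""

def make_catalog(ns_oid, rel_oid_start):
    """Build a stand-in catalog (hypothetical helper) using the given OIDs."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE pg_namespace (oid INTEGER PRIMARY KEY, nspname TEXT)"
    )
    db.execute(
        "CREATE TABLE pg_class ("
        "oid INTEGER PRIMARY KEY, relname TEXT, relnamespace INTEGER)"
    )
    db.execute("INSERT INTO pg_namespace VALUES (?, 'public')", (ns_oid,))
    for offset, name in enumerate(["accounts", "orders"]):
        db.execute(
            "INSERT INTO pg_class VALUES (?, ?, ?)",
            (rel_oid_start + offset, name, ns_oid),
        )
    return db

# Same logical schema on both backends, but different OIDs underneath.
backend_a = make_catalog(ns_oid=2200, rel_oid_start=17145)
backend_b = make_catalog(ns_oid=99999, rel_oid_start=52001)

rows_a = backend_a.execute(NAMES_ONLY_QUERY).fetchall()
rows_b = backend_b.execute(NAMES_ONLY_QUERY).fetchall()
```

Because no OID appears in the select list, rows_a and rows_b come back identical even though every OID differs between the two stand-in backends.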

> I don't think it's worth that price to support a fundamentally bogus
> approach to backup.

But it's not bogus. It allows me to compare two databases running under
a pgpool synchronous cluster and KNOW if there are inconsistencies in
data between them, so it is quite useful to me.

> IMHO you don't want extra layers of software in
> between pg_dump and the database --- each one just introduces another
> risk of getting a wrong backup. You've yet to explain what the
> *benefit* of putting pgpool in there is for this problem.

Actually, it ensures that I get the right backup, because pgpool will
cause the backup to fail if there are any differences between the two
backend servers, thus telling me that I have an inconsistency.
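
A rough sketch of the behaviour I'm relying on, under the assumption that the same query goes to both backends and the results are compared. This is not pgpool's implementation; replicated_query, BackendMismatch and the canned backends are hypothetical names for illustration only.

```python
class BackendMismatch(Exception):
    """Raised when backends return different results for the same query."""

def replicated_query(backends, query):
    """Run query on every backend and return the result only if all agree."""
    results = [run(query) for run in backends]
    first = results[0]
    for index, other in enumerate(results[1:], start=1):
        if other != first:
            raise BackendMismatch(
                f"backend {index} disagrees with backend 0 on: {query}"
            )
    return first

# Stand-in backends: plain functions returning canned rows (hypothetical).
def master(query):
    return [(1, "alice"), (2, "bob")]

def in_sync_replica(query):
    return [(1, "alice"), (2, "bob")]

def drifted_replica(query):
    return [(1, "alice"), (2, "bobby")]
```

With master and in_sync_replica the rows are returned as usual; with master and drifted_replica the call raises BackendMismatch, which is the failed backup that would tell me the two servers are inconsistent.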

That's the primary reason I want this. The secondary reason, which I
can work around, is that I'm running the individual databases on
machines that only answer connections from the pgpool machine's IP, so
remote backups aren't possible and only the pgpool machine is
capable of doing the backups, but we have (like so many other companies)
a centralized backup server. I can always allow that machine to connect
to the database(s) to do backups, but my fear is that by allowing
anything other than pgpool to hit those backend databases, they could be
placed out of sync with each other. Admittedly, a backup process
shouldn't be updating the database, so this, as I said, isn't really a
big deal. More of a mild kink, really. As long as all access is
happening through pgpool, they should stay coherent with each other.
