Re: pg_dump and pgpool

From: Scott Marlowe <smarlowe(at)g2switchworks(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: pg_dump and pgpool
Date: 2004-12-29 23:07:20
Message-ID: 1104361640.5893.34.camel@state.g2switchworks.com
Lists: pgsql-general

On Wed, 2004-12-29 at 16:56, Tom Lane wrote:
> Scott Marlowe <smarlowe(at)g2switchworks(dot)com> writes:
> > On Wed, 2004-12-29 at 16:33, Tom Lane wrote:
> >> I'd worry about
> >> synchronization issues to start with...
>
> > I am not worried about that. As long as I'm not doing things like
> > inserting random() into the database, the data in the two backend stores
> > is coherent.
>
> For sufficiently small values of "coherent", sure, but I am not prepared
> to buy into the notion that pg_dump cannot examine the database contents
> more closely than the stupidest user application will ;-).

Sounds a bit like verbal handwaving here.

> Also, let's play devil's advocate and assume that the master and slave
> *have* managed to get out of sync somehow. Do you want your backup to
> be a true picture of the master's state, or an unpredictable
> amalgamation of the master and slave states? Heaven help you if you
> need to use the backup to recover from the out-of-sync condition.

Actually, given the operational mode pgpool is in, it would error out
here: it issues every query, SELECT or otherwise, to both backends and
compares what comes back. If the results aren't the same, the
connection is dropped and the transaction rolled back.

Hence the current problem where the dump fails: we're getting two
different OIDs back from the two backends, and pgpool is barfing on that.
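
To make that concrete, here's a rough sketch of the style of catalog
lookup involved (the table name 'mytable' is hypothetical, and this is
not an actual pg_dump query):

    -- Lookup keyed on the relation name, returning the OID to the client.
    SELECT c.oid, c.relname
    FROM pg_class c
    WHERE c.relname = 'mytable'
      AND c.relnamespace = (SELECT oid FROM pg_namespace
                            WHERE nspname = 'public');

    -- The master and slave were loaded independently, so c.oid can differ
    -- between them; pgpool sees two different result sets and aborts.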

>
> >> I don't think we should make pg_dump slower and possibly less reliable
> >> in order to support a fundamentally dangerous administration procedure.
> >> Run pg_dump directly into the database, not through pgpool.
>
> > What makes you think this would be slower. If anything, it would be
> > faster or as fast, since we're throwing fewer queries and at the same
> > time, hiding the implementation details that OIDs are.
>
> No, we'd be throwing more, and more complex, queries. Instead of a
> simple lookup there would be some kind of join, or at least a lookup
> that uses a multicolumn key.

I'm willing to bet the performance difference is less than noise.
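
For what it's worth, the difference being described would look roughly
like this (hypothetical names and OID value, just to show the shape of
the two queries):

    -- Current style: a follow-up query keyed on an OID fetched earlier.
    SELECT attname, atttypid
    FROM pg_attribute
    WHERE attrelid = 17231;        -- OID obtained from a previous SELECT

    -- OID-free style: the same lookup keyed on schema + relation name.
    SELECT a.attname, a.atttypid
    FROM pg_attribute a
    JOIN pg_class c      ON a.attrelid = c.oid
    JOIN pg_namespace n  ON c.relnamespace = n.oid
    WHERE n.nspname = 'public'
      AND c.relname = 'mytable';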

> There are also correctness issues to think about. OID *is* the primary
> key on most of the system catalogs pg_dump is looking at, and not all of
> them have other unique keys. Doing anything useful with pg_depend or
> pg_description without explicitly looking at OIDs would be darn painful,
> too.

Don't we always preach to users to NEVER use OIDs as PKs in their apps?
:-) Seriously though, I get your point. What I want to know is whether
it's possible to do this without passing OIDs back and forth. Keep in
mind, I don't mind there being OIDs; I'd just like all the references
to them to be hidden from the backup agent by joins et al. Something
like the sketch below is what I have in mind.
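
For example, a comment lookup where the OIDs never leave the server,
because they're resolved inside the query instead of being handed back
to the client (again, 'mytable' is a hypothetical name):

    SELECT d.description
    FROM pg_description d
    JOIN pg_class c     ON d.objoid = c.oid
    JOIN pg_namespace n ON c.relnamespace = n.oid
    WHERE d.classoid = 'pg_class'::regclass
      AND d.objsubid = 0
      AND n.nspname  = 'public'
      AND c.relname  = 'mytable';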
