Re: pg_dump and pgpool

From: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To: smarlowe(at)g2switchworks(dot)com
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: pg_dump and pgpool
Date: 2004-12-30 14:50:31
Message-ID: 20041230.235031.45868288.t-ishii@sra.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

> I've noticed today that if one tries to pg_dump a database cluster
> running under pgpool, one gets the error message:
>
> pg_dump: query to get table columns failed: ERROR: kind mismatch
> between backends
> HINT: check data consistency between master and secondary
>
> Looking at the SQL that pg_dump sends to be relying on object OIDs for
> the dump. Would it be reasonable at some date to design pg_dump to work
> by joining whatever query would get the OID with the query that would
> use it as where criteria so that the OID itself is never reported?
> Would there be any interest in such changes, were they made, making
> their way into the backend?
>
> Does my question even make sense? It's been a long couple of weeks...

OIDs may not be replicated exactly same by using pgpool. It has been
explained in pgpool web page.

"Please note that certain queries are physically dependent to a server
or non-deterministic. These include random functions, OID, XID, and
timestamps may not be replicated in exactly same value among two
servers."

This could happen due to timing difference when multiple sessions run
through pgpool. Suppose session A issues "INSERT INTO foo VALUES(1, 100)"
while session B issues "INSERT INTO foo VALUES(2, 101)" simultaneously and
OID counter value is 100000 on master and secondary. Following
situation could happen:

master executes INSERT INTO foo VALUES(1, 100) and assigns OID 100001
master executes INSERT INTO foo VALUES(2, 101) and assigns OID 100002
secondary executes INSERT INTO foo VALUES(2, 101) and assigns OID 100001
secondary executes INSERT INTO foo VALUES(1, 100) and assigns OID 100002

As a result, on the master row(1, 100)'s OID is 100001 and row(2,
101)&s OID is 100002, while on the secondary row(1, 100)'s OID is
100002 and row(2, 101)&s OID is 100001.

(Note, however, data consitency within the master or the secondary has
not been broken. So you can get consistent dump by directly connecting
to master or secondary.)

One way to prevent the problem above is having a lock on the OID
counter. Of course there's no such that lock in PostgreSQL, we need a
substitution, for example, locking pg_database (I know this is a
horrible idea).

BTW I think there's no replication tool in the world which can
replicate OIDs. So applications including pg_dump, psql or whatever
could not issue system catalog related queries in load-balacing manner
to replicated servers to gain better performance.
--
Tatsuo Ishii

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2004-12-30 15:20:28 Re: pg_dump and pgpool
Previous Message Scott Marlowe 2004-12-30 14:16:15 Re: pg_dump and pgpool