Re: Few questions on postgresql (dblink, 2pc, clustering)

From: Christopher Browne <cbbrowne(at)acm(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Few questions on postgresql (dblink, 2pc, clustering)
Date: 2004-08-22 13:44:49
Message-ID: m3vffbqoby.fsf@wolfe.cbbrowne.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

After a long battle with technology, mendola(at)bigfoot(dot)com (Gaetano Mendola), an earthling, wrote:
> | MySQL does not allow cross-server database connection such as dblink. So,
> | we're thinking of 3 alternatives:
> |
> | 1) Wait for MySQL clustering to be stable and put all our databases in the
> | cluster
> | 2) Migrate to PostgreSQL and use dblink to solve the referential integrity
> | 3) Migrate to PostgreSQL clustering solution
>
> May I know why are you sticky on the idea of spread your database
> among various servers ? Free your mysql-minded. If you idea is an
> horizontal scale solution then open your wallet and buy Oracle.
> Postgresql scale very well vertically.

Indeed.

It seems quite unclear what the proposed merit of "clustering" is
_supposed_ to be, and I think it bears asking Jim Worke to explain in
more detail what they're trying to accomplish by it.

- If the goal is improved redundancy, then I'd argue that doing all
the updates on the "master" and having several "hot replicas" being
maintained using Slony-I would accomplish that without technology
changes.

- If the goal is to somehow get better performance by partitioning
work across multiple servers, then this can be attained by having
those several "hot replicas," and directing as many read-only
operations to replicas as possible. Updates have to go to the
"master;" by dropping out query load, that lets the "master" be
occupied primarily with write operations.

- System reliability does NOT improve if write operations are
spread across several servers. We have been setting up extra
replicas of some databases on some new servers lately, and people
that _haven't_ thought it out have briefly imagined it a good idea
to spread the 'masters' across more servers, which would _hurt_
reliability, in fact.

The situation we have is that our app needs access to two
databases: one that stores "state," and another that logs activity.

The theory that people come up with is that when we have 3 main "big
servers," we should write the "state" to one, and logs to another.

Reality rears its ugly head: Doing that makes the system more
vulnerable, as if _either_ of those servers goes down, that will
cause the application to go into convulsions. Putting "state" and
"logs" on the same server, and replicating everywhere else is, in
fact, the more reliable choice. If the ONE server that's "master"
goes down, the application won't be happy, but that was always going
to be the case.

The point: Distributing writes across many hosts makes the system
vulnerable to the possibility of _any_ of them going down.

Furthermore, it is not evident that distributing writes will be able
to actually improve performance, because it introduces substantial
additional communications overhead.

At some point, there will be parts of the write activities that have
to be handled in a serial manner, in one place. For instance,
updating and reporting account balances must ultimately be thus
handled. If there are 10 customers fighting over who gets to order
your last 20 pallets of "Grapple Grommets," the handling of who
actually gets that inventory has _got_ to be serialized in one place
if you don't want to run the risk of the over-commitment of perhaps
promising 200 pallets worth when you only have 20.

2PC provides a way of managing that serialization across multiple
databases; it has two inherent problems relating to what is already
said above:

1. The communications overhead involved in coordinating requests
across multiple databases means that it's unlikely to be a
"performance win";

2. There are deadlock situations that 2PC is vulnerable to that
make it _less_ reliable than having just one database, when
there is the possibility of hardware or communications
failure.

Note: These aren't PostgreSQL-specific issues; they apply to the
various classes of "clustering solutions." It may be that Oracle or
DB2 or Informix have some particular features that partially relieve
some of the performance problems with "partitioning," but that still
leaves a big bill to pay, and you'd better be sure you're getting
some actual value for the extra coin...
--
let name="cbbrowne" and tld="cbbrowne.com" in String.concat "@" [name;tld];;
http://www3.sympatico.ca/cbbrowne/spiritual.html
Signs of a Klingon Programmer #8: "What is this talk of 'release'?
Klingons do not make software 'releases.' Our software 'escapes'
leaving a bloody trail of designers and quality assurance people in
its wake."

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Thomas Hallgren 2004-08-22 14:14:44 Re: Unsupported 3rd-party solutions (Was: Few questions on postgresql
Previous Message Sim Zacks 2004-08-22 13:01:46 Re: Unsupported 3rd-party solutions (Was: Few questions on postgresql (dblink, 2pc, clustering))