Re: Partitioning / Clustering

From: John A Meinel <john(at)arbash-meinel(dot)com>
To: Alex Stapleton <alexs(at)advfn(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Partitioning / Clustering
Date: 2005-05-10 14:41:05
Message-ID: 4280C801.80603@arbash-meinel.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Alex Stapleton wrote:
> What is the status of Postgres support for any sort of multi-machine
> scaling support? What are you meant to do once you've upgraded your box
> and tuned the conf files as much as you can? But your query load is
> just too high for a single machine?
>
> Upgrading stock Dell boxes (I know we could be using better machines,
> but I am trying to tackle the real issue) is not a hugely price
> efficient way of getting extra performance, nor particularly scalable
> in the long term.

Switch from Dell Xeon boxes, and go to Opterons. :) Seriously, Dell is
far away from Big Iron. I don't know what performance you are looking
for, but you can easily get into inserting 10M rows/day with quality
hardware.

But actually is it your SELECT load that is too high, or your INSERT
load, or something inbetween.

Because Slony is around if it is a SELECT problem.
http://gborg.postgresql.org/project/slony1/projdisplay.php

Basically, Slony is a Master/Slave replication system. So if you have
INSERT going into the Master, you can have as many replicated slaves,
which can handle your SELECT load.
Slony is an asynchronous replicator, so there is a time delay from the
INSERT until it will show up on a slave, but that time could be pretty
small.

This would require some application level support, since an INSERT goes
to a different place than a SELECT. But there has been some discussion
about pg_pool being able to spread the query load, and having it be
aware of the difference between a SELECT and an INSERT and have it route
the query to the correct host. The biggest problem being that functions
could cause a SELECT func() to actually insert a row, which pg_pool
wouldn't know about. There are 2 possible solutions, a) don't do that
when you are using this system, b) add some sort of comment hint so that
pg_pool can understand that the select is actually an INSERT, and needs
to be done on the master.

>
> So, when/is PG meant to be getting a decent partitioning system? MySQL
> is getting one (eventually) which is apparently meant to be similiar to
> Oracle's according to the docs. Clusgres does not appear to be
> widely/or at all used, and info on it seems pretty thin on the ground,
> so I am
> not too keen on going with that. Is the real solution to multi- machine
> partitioning (as in, not like MySQLs MERGE tables) on PostgreSQL
> actually doing it in our application API? This seems like a less than
> perfect solution once we want to add redundancy and things into the mix.

There is also PGCluster
http://pgfoundry.org/projects/pgcluster/

Which is trying to be more of a Synchronous multi-master system. I
haven't heard of Clusgres, so I'm guessing it is an older attempt, which
has been overtaken by pgcluster.

Just realize that clusters don't necessarily scale like you would want
them too. Because at some point you have to insert into the same table,
which means you need to hold a lock which prevents the other machine
from doing anything. And with synchronous replication, you have to wait
for all of the machines to get a copy of the data before you can say it
has been committed, which does *not* scale well with the number of machines.

If you can make it work, I think having a powerful master server, who
can finish an INSERT quickly, and then having a bunch of Slony slaves
with a middleman (like pg_pool) to do load balancing among them, is the
best way to scale up. There are still some requirements, like not having
to see the results of an INSERT instantly (though if you are using
hinting to pg_pool, you could hint that this query must be done on the
master, realizing that the more you do it, the more you slow everything
down).

John
=:->

PS> I don't know what functionality has been actually implemented in
pg_pool, just that it was discussed in the past. Slony-II is also in the
works.

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2005-05-10 14:45:42 Re: full outer performance problem
Previous Message John A Meinel 2005-05-10 14:28:54 Re: full outer performance problem