
Re: hardware performance and some more

From: William Yu <wyu(at)talisys(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: hardware performance and some more
Date: 2003-07-24 16:42:56
Message-ID: 3F200C90.5050001@talisys.com
Lists: pgsql-hackers, pgsql-performance
| first of all I would like to learn that, any of you use the postgresql
| within the clustered environment? Or, let me ask you the question, in
| different manner, can we use postgresql in a cluster environment? If
| we can do what is the support method of the postgresql for clusters?

You could do active-active but it would require work on your end. I did 
a recent check on all the Postgres replication packages and they all 
seem to be single master -> single/many slaves. Updating on more than one 
server looks to be problematic. I run an active-active setup now, but I had 
to develop my own custom replication strategy.

As a background, we develop & host web-based apps that use Postgres as 
the DB engine. Since our clients access our server over the internet, 
uptime is a big issue. Hence, we have two server farms: one colocated in 
San Francisco and the other in Sterling, VA. In addition to redundancy, 
we also wanted to spread the load across the servers. To do this, we 
went with the expedient method of 1-minute DNS zonemaps: if both farms 
are up, 70% of traffic is sent to the faster farm and 30% to the other. 
Both farms are constantly monitored and, if one goes down, a new zonemap 
is pushed out listing only the servers that are up.
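
Roughly, the monitoring and zonemap-push side could be as simple as the 
sketch below (the health-check URLs, zonemap names and the "push-zonemap" 
command are just placeholders for illustration, not our actual setup):

# Illustrative monitor-and-push sketch; all names here are placeholders.
import subprocess
import urllib.request

FARMS = {
    "sf": "http://sf.example.com/health",   # San Francisco farm
    "va": "http://va.example.com/health",   # Sterling, VA farm
}

def farm_is_up(url):
    # A farm counts as up if its health URL answers with HTTP 200.
    try:
        return urllib.request.urlopen(url, timeout=5).getcode() == 200
    except OSError:
        return False

def choose_zonemap():
    up = set(name for name, url in FARMS.items() if farm_is_up(url))
    if up == {"sf", "va"}:
        return "zonemap-70-30"      # both up: 70% to the faster farm
    if up == {"sf"}:
        return "zonemap-sf-only"
    if up == {"va"}:
        return "zonemap-va-only"
    return "zonemap-70-30"          # can't reach either: change nothing

# Push the chosen 1-minute-TTL zonemap out to the nameservers.
subprocess.run(["push-zonemap", choose_zonemap()], check=True)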

The first step in making this work was converting all integer keys to 
character keys. By making keys into characters, we could prepend a 
server location code so ID 100 generated at SF would not conflict with 
ID 100 generated in Sterling. Instead, they would be marked as S00000100 
and V00000100. Another benefit is the increase in possible key 
combinations from being able to use alphanumeric characters (36^(n-1) 
versus 10^n).
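
For illustration, generating one of these keys just means zero-padding the 
sequence value and tacking the site letter on the front (the function below 
is made up for this example, not our actual code):

# Illustrative sketch of the site-prefixed character keys.
SITE_CODE = "S"   # "S" = San Francisco farm, "V" = Sterling, VA

def make_key(seq_value, width=8):
    # 100 -> "S00000100" on the SF farm, "V00000100" in Sterling
    return SITE_CODE + str(seq_value).zfill(width)

print(make_key(100))   # S00000100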

At this time, the method we use is a periodic sweep of all updated 
records. In every table, we add extra fields to mark the date/time the 
record was last inserted/updated/deleted. All records touched since the 
last resync are extracted, zipped up, pgp-encrypted and then posted on 
an ftp server. Files are then transferred between servers, and the records 
unpacked and inserted/updated. Some checks are needed to determine what 
takes precedence if users updated the same record on both servers, but 
otherwise it's a straightforward process.
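
As a rough sketch of the sweep (table and column names are illustrative, 
the driver is whatever you like, and the pgp/ftp legs are left out), it 
boils down to something like:

# Rough sketch of the periodic sweep; names are illustrative only.
import gzip, json
import psycopg2   # assumption: any Postgres driver would do

def sweep_table(conn, table, last_resync):
    cur = conn.cursor()
    # Grab every row touched since the previous resync, using the extra
    # last_modified timestamp column added to each table.
    cur.execute("SELECT * FROM %s WHERE last_modified > %%s" % table,
                (last_resync,))
    cols = [d[0] for d in cur.description]
    rows = [dict(zip(cols, r)) for r in cur.fetchall()]
    # Zip it up; the real process then pgp-encrypts the result and drops
    # it on the ftp server for the other farm to unpack and apply.
    return gzip.compress(json.dumps(rows, default=str).encode())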

As far as I can tell, the performance impact seems to be minimal. 
There's a periodic storm of replication updates in cases where there have 
been mass updates since the last resync. But if you have mostly reads and 
few writes, you shouldn't see this situation. The biggest performance impact 
seems to be the CPU power needed to zip/unzip/encrypt/decrypt files.

I'm thinking over strategies to get more "real-time" replication working. I 
suppose I could just make the resync program run more often, but that's a 
bit inelegant. Perhaps I could capture every update/delete/insert/alter 
statement from the postgres logs, parse them out into commands and then 
zip/encrypt every command as a separate item to be processed. Or add 
triggers to every table so that updated records are pushed to a custom 
"updated log".

The biggest problem is of course locks -- especially at the application 
level. I'm still thinking over what to do here.

