
Re: hardware performance and some more

From: William Yu <wyu(at)talisys(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: hardware performance and some more
Date: 2003-07-24 16:42:56
Message-ID: 3F200C90.5050001@talisys.com
Lists: pgsql-hackers, pgsql-performance
| first of all I would like to learn that, any of you use the postgresql
| within the clustered environment? Or, let me ask you the question, in
| different manner, can we use postgresql in a cluster environment? If
| we can do what is the support method of the postgresql for clusters?

You could do active-active but it would require work on your end. I did 
a recent check on all the Postgres replication packages and they all 
seem to be single master -> single/many slaves. Updating on more than one 
server looks to be problematic. I run an active-active setup now, but I had 
to develop my own custom replication strategy.

As a background, we develop & host web-based apps that use Postgres as 
the DB engine. Since our clients access our server over the internet, 
uptime is a big issue. Hence, we have two server farms: one colocated in 
San Francisco and the other in Sterling, VA. In addition to redundancy, 
we also wanted to spread the load across the servers. To do this, we 
went with the expedient method of 1-minute DNS zonemaps: if both farms 
are up, 70% of traffic is sent to the faster farm and 30% to the other. 
Both farms are constantly monitored and, if one goes down, a new zonemap 
is pushed out listing only the servers that are up.
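
Roughly, the monitoring and zonemap-push side could be as simple as the 
sketch below (the health-check URLs, zonemap names and the "push-zonemap" 
command are just placeholders for illustration, not our actual setup):

# Illustrative monitor-and-push sketch; all names here are placeholders.
import subprocess
import urllib.request

FARMS = {
    "sf": "http://sf.example.com/health",   # San Francisco farm
    "va": "http://va.example.com/health",   # Sterling, VA farm
}

def farm_is_up(url):
    # A farm counts as up if its health URL answers with HTTP 200.
    try:
        return urllib.request.urlopen(url, timeout=5).getcode() == 200
    except OSError:
        return False

def choose_zonemap():
    up = set(name for name, url in FARMS.items() if farm_is_up(url))
    if up == {"sf", "va"}:
        return "zonemap-70-30"      # both up: 70% to the faster farm
    if up == {"sf"}:
        return "zonemap-sf-only"
    if up == {"va"}:
        return "zonemap-va-only"
    return "zonemap-70-30"          # can't reach either: change nothing

# Push the chosen 1-minute-TTL zonemap out to the nameservers.
subprocess.run(["push-zonemap", choose_zonemap()], check=True)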

The first step in making this work was converting all integer keys to 
character keys. By making keys into characters, we could prepend a 
server location code so ID 100 generated at SF would not conflict with 
ID 100 generated in Sterling. Instead, they would be marked as S00000100 
and V00000100. Another benefit is the increase in possible key 
combinations from being able to use alphanumeric characters (36^(n-1) 
versus 10^n).
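
For illustration, generating one of these keys just means zero-padding the 
sequence value and tacking the site letter on the front (the function below 
is made up for this example, not our actual code):

# Illustrative sketch of the site-prefixed character keys.
SITE_CODE = "S"   # "S" = San Francisco farm, "V" = Sterling, VA

def make_key(seq_value, width=8):
    # 100 -> "S00000100" on the SF farm, "V00000100" in Sterling
    return SITE_CODE + str(seq_value).zfill(width)

print(make_key(100))   # S00000100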

At this time, the method we use is a periodic sweep of all updated 
records. In every table, we add extra fields to mark the date/time the 
record was last inserted/updated/deleted. All records touched since the 
last resync are extracted, zipped up, pgp-encrypted and then posted on 
an ftp server. Files are then transferred between servers, and the records 
unpacked and inserted/updated. Some checks are needed to determine what 
takes precedence if users updated the same record on both servers, but 
otherwise it's a straightforward process.
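
As a rough sketch of the sweep (table and column names are illustrative, 
the driver is whatever you like, and the pgp/ftp legs are left out), it 
boils down to something like:

# Rough sketch of the periodic sweep; names are illustrative only.
import gzip, json
import psycopg2   # assumption: any Postgres driver would do

def sweep_table(conn, table, last_resync):
    cur = conn.cursor()
    # Grab every row touched since the previous resync, using the extra
    # last_modified timestamp column added to each table.
    cur.execute("SELECT * FROM %s WHERE last_modified > %%s" % table,
                (last_resync,))
    cols = [d[0] for d in cur.description]
    rows = [dict(zip(cols, r)) for r in cur.fetchall()]
    # Zip it up; the real process then pgp-encrypts the result and drops
    # it on the ftp server for the other farm to unpack and apply.
    return gzip.compress(json.dumps(rows, default=str).encode())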

As far as I can tell, the performance impact seems to be minimal. 
There's a periodic storm of replication updates in cases where there have 
been mass updates since the last resync. But if you have mostly reads and 
few writes, you shouldn't see this situation. The biggest performance impact 
seems to be the CPU power needed to zip/unzip/encrypt/decrypt files.

I'm thinking over strategies to get more "real-time" replication working. I 
suppose I could just make the resync program run more often, but that's a 
bit inelegant. Perhaps I could capture every update/delete/insert/alter 
statement from the postgres logs, parse them out into commands and then 
zip/encrypt every command as a separate item to be processed. Or add 
triggers to every table so that updated records are pushed to a custom 
"updated log".

The biggest problem is of course locks -- especially at the application 
level. I'm still thinking over what to do here.

