Skip site navigation (1) Skip section navigation (2)

Re: [HACKERS] Replication documentation addition

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Chris Browne <cbbrowne(at)acm(dot)org>
Cc: pgsql-docs(at)postgresql(dot)org
Subject: Re: [HACKERS] Replication documentation addition
Date: 2006-11-14 21:42:23
Message-ID: 200611142142.kAELgNO25509@momjian.us (view raw or flat)
Thread:
Lists: pgsql-docspgsql-hackers
Chris Browne wrote:
> bruce(at)momjian(dot)us (Bruce Momjian) writes:
> > With no new additions submitted today, I have moved my text into our
> > SGML documentation:
> >
> > 	http://momjian.us/main/writings/pgsql/sgml/failover.html
> >
> > Please let me know what additional changes are needed.
> 
> It's looking a lot improved to me...
> 
> There are still numerous places where it needs s/Slony/Slony-I/g
> because there is more than one thing out there called "Slony," only
> one of which is the single-master-to-multiple-subscribers-asynchronous
> replication system...

Fixed.

> <http://momjian.us/main/writings/pgsql/sgml/query-broadcast-load-balancing.html>
> 
> "This can be complex to set up because functions like random() and
> CURRENT_TIMESTAMP will have different values on different servers, and
> sequences should be consistent across servers."
> 
> It doesn't make sense to call this "complex to set up."  This problem
> isn't about complexity of setup; it is about whether updates are
> processed identically on different hosts.  
> 
> Perhaps better:
> 
> "Query broadcasting can break down such that servers fall out of sync
> if the queries have nondeterministic behavior.  For instance,
> functions like random(), CURRENT_TIMESTAMP, and
> nextval('some_sequence') will take on different values on different
> servers.  Care must be taken at the application level to make sure
> that queries are all fully deterministic and that they either COMMIT
> or ABORT on all servers."

I redid the section with:

   Because each server operates independently, functions like
   <function>random()</>, <function>CURRENT_TIMESTAMP</>, and
   sequences can have different values on different servers.  If
   this is unacceptable, applications must query such values from
   a single server and then use those values in write queries.
   Also, care must also be taken that all transactions either commit
   or abort on all servers  Pgpool is an example of this type of
   replication.

> <http://momjian.us/main/writings/pgsql/sgml/clustering-for-load-balancing.html>
> "24.6. Clustering For Load Balancing
> 
> In clustering, each server can accept write requests, and these write
> requests are broadcast from the original server to all other servers
> before each transaction commits. Under heavy load, this can cause
> excessive locking and performance degradation. It is implemented by
> Oracle in their RAC product. PostgreSQL does not offer this type of
> load balancing, though PostgreSQL two-phase commit can be used to
> implement this in application code or middleware."
> 
> Something doesn't feel entirely right here...
> 
> How about...
> 
> "24.6. Multimaster Replication For Load Balancing
> 
> In this scenario, each server can accept write requests, which are
> broadcast from the original server to all other servers before each
> transaction commits in order to ensure consistency.  Unfortunately,
> under heavy load, the cost of distributing locks across servers can
> lead to substantial performance degradation. It is implemented by
> Oracle in their RAC product. PostgreSQL does not offer this type of
> load balancing, though PostgreSQL two-phase commit using <xref
> linkend="sql-prepare-transaction-title"> and <xref linkend=
> "sql-commit-prepared-title"> may be used to implement this in
> application code or middleware.
> 
> The communications costs involved in distributing locks and writes
> have the result that write operations are considerably more expensive
> than they would be on a single server.  In general, the cost of
> distributed locking means that this clustering approach is only usable
> across a cluster of servers at a local site.  
> 
> There will only be a performance "win" if the cluster mostly processes
> read-only traffic that the cluster can distribute across a larger
> number of database servers.  Write performance generally degrades a
> fair bit as compared to using a single database server.  Reliability
> should be enhanced since the cluster should be able to continue work
> even if some of the members of the cluster should fail."

Your description was too detailed, but I took some of your concepts:

  <para>
   In clustering, each server can accept write requests, and these
   write requests are broadcast from the original server to all
   other servers before each transaction commits.  Heavy write
   activity can cause excessive locking, leading to poor performance.
   In fact, write performance is often worse than that of a single
   server.  Read requests can be sent to any server.  Clustering
   is best for mostly read workloads, though its big advantage is
   that any server can accept write requests --- there is no need
   to partition workloads between read/write and read-only servers.
  </para>

  <para>
   Clustering is implemented by <productname>Oracle</> in their
   <productname><acronym>RAC</></> product.  <productname>PostgreSQL</>
   does not offer this type of load balancing, though
   <productname>PostgreSQL</> two-phase commit (<xref
   linkend="sql-prepare-transaction-title"> and <xref linkend=
   "sql-commit-prepared-title">) can be used to implement this in
   application code or middleware.
  </para>

> 
> <http://momjian.us/main/writings/pgsql/sgml/clustering-for-parallel-query-execution.html>
> 
> "24.7. Clustering For Parallel Query Execution
> 
> This allows multiple servers to work on a single query. One possible
> way this could work is for the data to be split among servers and for
> each server to execute its part of the query and results sent to a
> central server to be combined and returned to the user. There
> currently is no PostgreSQL open source solution for this."
> 
> This seems a bit thin.
> 
> "24.7. Clustering For Parallel Query Execution
> 
> This allows multiple servers to work concurrently on a single query,
> analagous to the way RAID permits multiple disk drives to respond
> concurrently to disk I/O requests.
> 
> One way this could work is for the data to be partitioned across the
> servers, where each server executes its part of the query, submitting
> results to a central server to be combined and returned to the user.
> There currently is no PostgreSQL open source solution for this."

I took some of your wording:

   This allows multiple servers to work concurrently on a single
   query.  One possible way this could work is for the data to be
   split among servers and for each server to execute its part of
   the query and results sent to a central server to be combined
   and returned to the user.  There currently is no
   <productname>PostgreSQL</> open source solution for this.

Because RAID is often used for high availability, I thought mentioning
it in this context was too complicated.

-- 
  Bruce Momjian   bruce(at)momjian(dot)us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

In response to

Responses

pgsql-docs by date

Next:From: Jeff FrostDate: 2006-11-14 22:02:21
Subject: Re: [HACKERS] Replication documentation addition
Previous:From: Mark KirkwoodDate: 2006-11-11 04:19:23
Subject: Re: Documentation update for PQexecParams

pgsql-hackers by date

Next:From: Simon RiggsDate: 2006-11-14 21:42:33
Subject: Re: [SQL] Case Preservation disregarding case
Previous:From: Bruce MomjianDate: 2006-11-14 18:43:58
Subject: Re: Case Preservation disregarding case sensitivity?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group