Re: [HACKERS] Replication documentation addition

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Chris Browne <cbbrowne(at)acm(dot)org>
Cc: pgsql-docs(at)postgresql(dot)org
Subject: Re: [HACKERS] Replication documentation addition
Date: 2006-11-14 21:42:23
Message-ID: 200611142142.kAELgNO25509@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-docs pgsql-hackers

Chris Browne wrote:
> bruce(at)momjian(dot)us (Bruce Momjian) writes:
> > With no new additions submitted today, I have moved my text into our
> > SGML documentation:
> >
> > http://momjian.us/main/writings/pgsql/sgml/failover.html
> >
> > Please let me know what additional changes are needed.
>
> It's looking a lot improved to me...
>
> There are still numerous places where it needs s/Slony/Slony-I/g
> because there is more than one thing out there called "Slony," only
> one of which is the single-master-to-multiple-subscribers-asynchronous
> replication system...

Fixed.

> <http://momjian.us/main/writings/pgsql/sgml/query-broadcast-load-balancing.html>
>
> "This can be complex to set up because functions like random() and
> CURRENT_TIMESTAMP will have different values on different servers, and
> sequences should be consistent across servers."
>
> It doesn't make sense to call this "complex to set up." This problem
> isn't about complexity of setup; it is about whether updates are
> processed identically on different hosts.
>
> Perhaps better:
>
> "Query broadcasting can break down such that servers fall out of sync
> if the queries have nondeterministic behavior. For instance,
> functions like random(), CURRENT_TIMESTAMP, and
> nextval('some_sequence') will take on different values on different
> servers. Care must be taken at the application level to make sure
> that queries are all fully deterministic and that they either COMMIT
> or ABORT on all servers."

I redid the section with:

Because each server operates independently, functions like
<function>random()</>, <function>CURRENT_TIMESTAMP</>, and
sequences can have different values on different servers. If
this is unacceptable, applications must query such values from
a single server and then use those values in write queries.
Also, care must also be taken that all transactions either commit
or abort on all servers Pgpool is an example of this type of
replication.

> <http://momjian.us/main/writings/pgsql/sgml/clustering-for-load-balancing.html>
> "24.6. Clustering For Load Balancing
>
> In clustering, each server can accept write requests, and these write
> requests are broadcast from the original server to all other servers
> before each transaction commits. Under heavy load, this can cause
> excessive locking and performance degradation. It is implemented by
> Oracle in their RAC product. PostgreSQL does not offer this type of
> load balancing, though PostgreSQL two-phase commit can be used to
> implement this in application code or middleware."
>
> Something doesn't feel entirely right here...
>
> How about...
>
> "24.6. Multimaster Replication For Load Balancing
>
> In this scenario, each server can accept write requests, which are
> broadcast from the original server to all other servers before each
> transaction commits in order to ensure consistency. Unfortunately,
> under heavy load, the cost of distributing locks across servers can
> lead to substantial performance degradation. It is implemented by
> Oracle in their RAC product. PostgreSQL does not offer this type of
> load balancing, though PostgreSQL two-phase commit using <xref
> linkend="sql-prepare-transaction-title"> and <xref linkend=
> "sql-commit-prepared-title"> may be used to implement this in
> application code or middleware.
>
> The communications costs involved in distributing locks and writes
> have the result that write operations are considerably more expensive
> than they would be on a single server. In general, the cost of
> distributed locking means that this clustering approach is only usable
> across a cluster of servers at a local site.
>
> There will only be a performance "win" if the cluster mostly processes
> read-only traffic that the cluster can distribute across a larger
> number of database servers. Write performance generally degrades a
> fair bit as compared to using a single database server. Reliability
> should be enhanced since the cluster should be able to continue work
> even if some of the members of the cluster should fail."

Your description was too detailed, but I took some of your concepts:

<para>
In clustering, each server can accept write requests, and these
write requests are broadcast from the original server to all
other servers before each transaction commits. Heavy write
activity can cause excessive locking, leading to poor performance.
In fact, write performance is often worse than that of a single
server. Read requests can be sent to any server. Clustering
is best for mostly read workloads, though its big advantage is
that any server can accept write requests --- there is no need
to partition workloads between read/write and read-only servers.
</para>

<para>
Clustering is implemented by <productname>Oracle</> in their
<productname><acronym>RAC</></> product. <productname>PostgreSQL</>
does not offer this type of load balancing, though
<productname>PostgreSQL</> two-phase commit (<xref
linkend="sql-prepare-transaction-title"> and <xref linkend=
"sql-commit-prepared-title">) can be used to implement this in
application code or middleware.
</para>

>
> <http://momjian.us/main/writings/pgsql/sgml/clustering-for-parallel-query-execution.html>
>
> "24.7. Clustering For Parallel Query Execution
>
> This allows multiple servers to work on a single query. One possible
> way this could work is for the data to be split among servers and for
> each server to execute its part of the query and results sent to a
> central server to be combined and returned to the user. There
> currently is no PostgreSQL open source solution for this."
>
> This seems a bit thin.
>
> "24.7. Clustering For Parallel Query Execution
>
> This allows multiple servers to work concurrently on a single query,
> analagous to the way RAID permits multiple disk drives to respond
> concurrently to disk I/O requests.
>
> One way this could work is for the data to be partitioned across the
> servers, where each server executes its part of the query, submitting
> results to a central server to be combined and returned to the user.
> There currently is no PostgreSQL open source solution for this."

I took some of your wording:

This allows multiple servers to work concurrently on a single
query. One possible way this could work is for the data to be
split among servers and for each server to execute its part of
the query and results sent to a central server to be combined
and returned to the user. There currently is no
<productname>PostgreSQL</> open source solution for this.

Because RAID is often used for high availability, I thought mentioning
it in this context was too complicated.

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

In response to

Responses

Browse pgsql-docs by date

  From Date Subject
Next Message Jeff Frost 2006-11-14 22:02:21 Re: [HACKERS] Replication documentation addition
Previous Message Mark Kirkwood 2006-11-11 04:19:23 Re: Documentation update for PQexecParams

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2006-11-14 21:42:33 Re: [SQL] Case Preservation disregarding case
Previous Message Bruce Momjian 2006-11-14 18:43:58 Re: Case Preservation disregarding case sensitivity?