Synchronous replication patch built on SR

From: zb(at)cybertec(dot)at
To: pgsql-hackers(at)postgresql(dot)org
Cc: hs(at)cybertec(dot)at
Subject: Synchronous replication patch built on SR
Date: 2010-04-30 08:58:22
Lists: pgsql-hackers

Resending, my ISP lost my mail yesterday. :-(



Attached is a patch that does $SUBJECT; we are submitting it for 9.1.
I have updated it to today's CVS after the "wal_level" GUC went in.

How does it work?

First, the walreceiver and the walsender are now able to communicate
in a duplex way on the same connection, so while COPY OUT is
in progress from the primary server, the standby server is able to
issue PQputCopyData() to pass back the transaction IDs whose commit
records it has seen. I did this by adding a new protocol message type,
with letter 'x', that is only acknowledged by the walsender process.
The regular backend was intentionally left unchanged, so an SQL client
that sends it gets a protocol error. A new libpq call,
PQsetDuplexCopy(), sends this new message before sending
START_REPLICATION. The primary makes a note of it in the walsender
process' entry.

I had to move the TransactionIdLatest(xid, nchildren, children) call
that computes latestXid earlier in RecordTransactionCommit(), so
it's in the critical section now, just before the
call. Otherwise, there was a race condition between the primary
and the standby server: the standby server might have seen
the XLOG_XACT_COMMIT record for some XIDs before the
transaction on the primary server had marked itself as waiting for
this XID, resulting in stuck transactions.

I have added 3 new options, two GUCs in postgresql.conf and one
setting in recovery.conf. These options are:

1. min_sync_replication_clients = N

where N is the number of standby reports required for a given
transaction before it's released as synchronously committed. 0 means
completely asynchronous; the value is capped at max_wal_senders.
Anything between 0 and max_wal_senders gives different levels of
partially synchronous replication.

2. strict_sync_replication = boolean

If this GUC is false, the expected number of synchronous reports from
standby servers is further limited to the number of synchronous standby
servers actually connected. This means that if no standby servers are
connected yet, replication is asynchronous and transactions are allowed
to finish without waiting for synchronous reports. If this GUC is true,
transactions wait until enough synchronous standbys connect and report
back.

3. synchronous_slave = boolean (in recovery.conf)

This instructs the standby server to tell the primary that it is a
synchronous replication server and that it will send the committed
XIDs back to the primary.
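
The interaction of the first two options can be summarised by a small
function. This is a sketch of the rule as described above, not code
from the patch: required_sync_reports() and the
connected_sync_standbys parameter are hypothetical, and the clamp to
max_wal_senders may well be enforced when the setting is read rather
than at commit time.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Number of standby reports a transaction waits for in this model.
 * Parameter names mirror the options in the mail; the function itself
 * is hypothetical.
 */
int
required_sync_reports(int min_sync_replication_clients,
                      bool strict_sync_replication,
                      int max_wal_senders,
                      int connected_sync_standbys)
{
    assert(min_sync_replication_clients >= 0);
    assert(max_wal_senders >= 0);
    assert(connected_sync_standbys >= 0);

    int required = min_sync_replication_clients;

    /* The setting cannot exceed max_wal_senders. */
    if (required > max_wal_senders)
        required = max_wal_senders;

    /* Non-strict mode: never wait for more standbys than are connected. */
    if (!strict_sync_replication && required > connected_sync_standbys)
        required = connected_sync_standbys;

    return required;
}
```

With min_sync_replication_clients = 2 and no standby connected, this
gives 0 reports in non-strict mode (asynchronous commit) and 2 in
strict mode (the transaction waits for standbys to connect).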

I also added a contrib module for monitoring the synchronous replication,
but it abuses the procarray.c code by exposing the procArray pointer,
which is ugly. It either needs to be abandoned or moved to core if or
when this code is discussed enough. :-)

Best regards,
Zoltán Böszörményi

Attachment Content-Type Size
pg91-syncrep-15-ctxdiff.patch text/x-patch 56.0 KB

