Re: [RFC] GSoC Work on readonly queries done so far

From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Florian G(dot) Pflug" <fgp(at)phlo(dot)org>
Cc: "Postgresql-Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC] GSoC Work on readonly queries done so far
Date: 2007-06-06 21:36:22
Views: Raw Message | Whole Thread | Download mbox
Lists: pgsql-hackers

On Wed, 2007-06-06 at 16:11 +0200, Florian G. Pflug wrote:

> .) Added a new GUC operational_mode, which can be set to either
> readwrite or readonly. If it is set to readwrite (the default),
> postgres behaves as usual. All the following changes are only
> in effect if operational_mode is set to readonly.

Do we need this? We are already InArchiveRecovery.

> .) Created a macro ASSUME_OPMODE_READWRITE that does elog(ERROR)
> if postgre is not in readwrite mode. This macro protects the
> following functions to make sure that no writes occur in
> readonly mode.
> SimpleLruWritePage, SLruPhysicalWritePage
> EndPrepare, FinishPreparedTransaction
> XLogInsert, XLogWrite, ShutdownXLog
> CreateCheckpoint
> MarkBufferDirty.

These are Asserts?

> .) All transactions are set to readonly mode (An implicit
> SET TRANSACTION READONLY), and are not allowed to do


> .) Don't start autovacuum and bgwriter. Instead of bgwriter, bgreplay
> is started, and it takes over that role that bgwriter play in the
> shutdown process.

Autovacuum -> understood.

What does bgreplay do? Why not just start bgwriter earlier and disable
some of its other functionality while InRecovery?

> .) Transactions are assigned a dummy xid ReadOnlyTransactionId, that
> is considered to be "later" than any other xid.

So you are bumping FirstNormalTransactionId up by one for this?

You're assuming then that we will "freeze" replay while we run a query?
Otherwise doing this will mean the snapshot changes as a query executes.

> .) A global ReadOnlySnapshot is maintained in shared memory. This is
> copied into backend local memory by GetReadonlySnapshotData (which
> replaces GetSnapshotData in readonly mode).
> .) Crash recovery is not performed in readonly mode - instead, postgres
> PANICs, and tells the DBA to restart in readwrite mode. Archive
> recovery of course *will* be allowed, but I'm not that far yet.

This is the very heart of the matter. This isn't just a technical issue,
it goes to the heart of the use case for this feature. Can we recover
while running queries? If not, how much time will we spend in replay
mode v query mode? Will we be able to run long running queries *and*
maintain a reasonable time to recover? Is this a mechanism for providing
HA and additional query capacity, or is it just a mechanism for
additional query capacity only? Those are open questions to which I
don't have any answers yet myself.

Will we switch back and forth between replay and query mode.
Do we connect to the master, or to the slave?
If we connect to the slave will we accept new queries when in replay
mode and pause them before we switch back to query mode.

> Open Problems:
> --------------
> .) Protecting MarkBufferDirty with ASSUME_OPMODE_READWRITE is troublesome,
> because callers usually call MarkBufferDirty from within a critical
> section, and thus elog(ERRROR) is turned into elog(PANIC). This e.g.
> happens with my patch if you call nextval() in readonly mode.
> Does anyone see a better solution then adding checks into
> all callers that are not otherwise protected from being called
> in readonly mode?

Do we need to do this at all?

> .) Since the slaves needs to track an Snapshot in shared memory, it cannot
> resize that snapshot to accomodate however many concurrent transactions
> might have been running on the master. My current plan is to detect if
> that global snapshot overflows, and to lock out readonly queries on the
> slave (and therefore remove the need of keeping the snapshot current)
> until the number of active xids on the master has dropped below
> max_connections on the slave. A warning will be written to the postgres
> log that suggest that the DBA increases the max_connections value on
> the slave.

Sized according to max_connections on the master?

Simon Riggs

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2007-06-06 21:41:18 Re: [RFC] GSoC Work on readonly queries done so far
Previous Message Alvaro Herrera 2007-06-06 21:14:42 Re: [RFC] GSoC Work on readonly queries done so far