Re: Proposal for Implenting read-only queries during wal replay (SoC 2007)

From: Doug Knight <dknight(at)wsi(dot)com>
To: "Florian G(dot) Pflug" <fgp(at)phlo(dot)org>
Cc: Postgresql-General <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal for Implenting read-only queries during wal replay (SoC 2007)
Date: 2007-02-23 16:56:41
Message-ID: 1172249801.29320.34.camel@arc-dknightlx.wsicorp.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,
Here's some feedback, this is a feature that would be very useful to a
project I am currently working on.

Doug

On Fri, 2007-02-23 at 17:34 +0100, Florian G. Pflug wrote:
> Hi
>
> I plan to submit a proposal for implementing support for
> read-only queries during wal replay as a "Google Summer of Code 2007"
> project.
>
> I've been browsing the postgres source-code for the last few days,
> and came up with the following plan for a implementation.
>
> I'd be very interested in any feedback on the propsoal - especially
> of the "you overlooked this an that, it can never work that way" kind ;-)
>
> greetings, Florian Pflug
>
> Implementing read-only quries during wal archive replay
> -------------------------------------------------------
>
> Submitter: Florian Pflug <fgp(at)phlo(dot)org>
>
> Abstract:
> Implementing full support for read-only queries during
> wal archive replay is splitted into multiple parts, where
> each part offeres additional functionality over what
> postgres provides now. This makes tackling this as a
> "Google Summer of Code 2007" project feasable, and guarantees
> that at least some progress is made, even if solving the
> whole problem turns out to be harder then previously
> thought.
>
> Parts/Milestones of the implementation:
> A) Allow postgres to be started in read-only mode. After
> initial wal recovery, postgres doesn't perform writes
> anymore. All transactions started are implicitly in
> readonly mode. All transactions will be assigned dummy
> transaction ids, which never make it into the clog.
> B) Split StartupXLOG into two steps. The first (Recovery) will process
> only enough wal to bring the system into a consistent state,
> while the second one (Replay) replays the archive until it finds no
> more wal segments. This replay happens in chunks, such that
> after a chunk all *_safe_restartpoint functions return true.
> C) Combine A) and B), in the simplest possible way.
> Introduce a global R/W lock, which is taken by the Replay part
> of B) in write mode before replaying a chunk, then released,
> and immediatly reaquired before replaying the next chunk.
> The startup sequence is modified to do only the Recovery part
> where is is doing StartupXLOG now, and to lauch an extra process
> (similar to bgwriter) to do the second (Replay) part in the background.
> The system is then started up in read-only mode, with the addition
> that the global R/W lock is taken in read mode before starting any
> transaction. Thus, while a transaction is running, no archive replay
> happens.
>
> Benefits:
> *) Part A) alone might be of value for some people in the embedded world,
> or people who want to distribute software the use postgres. You could
> e.g. distribute a CD with a large, read-only database, and your application
> would just need to start postmaster to be able to query it directly from
> the CD.
> *) Read-only hot standby is a rather simple way to do load-balancing, if
> your application doesn't depend on the data being absolutely up-to-date.
> *) Even if this isn't used for load-balancing, it gives the DBA an
> easy way to check how far a PITR slave is lagging behind, therefore
> making PITR replication more user-friendly.
>
> Open Questions/Problems
> *) How do read-only transactions obtain a snapshot? Is it sufficient
> to just create an "empty" snapshot for them, meaning that they'll
> always look at the clog to obtain a transaction's state?
> *) How many places to attempt to issue writes? How hard is it to
> silence them all while in read-only mode.
> *) How does the user interface look like? I'm currently leaning towards
> a postgresql.conf setting read_only=yes. This would put postgres
> into read-only mode, and if a recovery.conf is present, archive
> replay would run as a background process.
>
> Limitations:
> *) The replaying process might be starved, letting the slave fall
> further and further behind the master. Only true if the slave
> executes a lot of queries, though.
> *) Postgres would continue to run in read-only mode, even after finishing
> archive recovery. A restart would be needed to switch it into read-write
> mode again. (I probably wouldn't be too hard to do that switch without
> a restart, but it seems better to tackle this after the basic features
> are working)
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2007-02-23 17:00:51 Re: Proposal for Implenting read-only queries during wal replay (SoC 2007)
Previous Message Andrew Dunstan 2007-02-23 16:55:52 Re: SCMS question