Re: Shared pg_xlog directory/partition and warm standby

From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Florian G(dot) Pflug" <fgp(at)phlo(dot)org>
Cc: "Devrim GUNDUZ" <devrim(at)CommandPrompt(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Shared pg_xlog directory/partition and warm standby
Date: 2006-11-27 16:35:30
Message-ID: 1164645330.3778.200.camel@silverbirch.site
Lists: pgsql-hackers

On Mon, 2006-11-27 at 14:17 +0100, Florian G. Pflug wrote:
> Devrim GUNDUZ wrote:
> > Is there anything that may prevent two PostgreSQL servers from sharing
> > the same pg_xlog directory, with one mounting the partition read-only
> > and the other using it for read and write? The problem is: if we
> > share the same pg_xlog between the production server and the warm
> > standby server, can you see any possibility of data/xlog corruption?
> > Of course, the warm standby server will mount that partition read-only.
>
> What happens if the standby server falls so far behind the master that
> the xlogs it wants to read are already being overwritten?
>
> AFAIK the files in pg_xlog form a circular buffer, and are reused after
> a while...

If the archive_command doesn't actually do anything and just leaves the
files where they are, they will automatically be moved to the .done
state and will then be removed within 2 checkpoints. So it will work as
long as your standby keeps up with the primary. If it falls behind,
you'll lose the file and you'll be out of luck (no file, so start from a
base backup again). A large checkpoint_segments would help, but there is
no way to avoid that situation entirely.

The archiver assumes that you want to archive things oldest first, so if
the archive_command fails it will retry that same file repeatedly. Put
another way, archiving is synchronous: when an archive is requested,
we wait for the answer before attempting the next one.
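
As a rough illustration of that oldest-first, wait-for-the-answer
behaviour, here is a minimal Python sketch. It is not the actual archiver
code. The function name archive_pending, the archive_one callable (a
stand-in for running archive_command on one segment) and the max_attempts
cap are all hypothetical; the cap exists only so the sketch terminates,
whereas the text above describes retrying repeatedly. It also assumes
segment names sort in the order they were written.

```python
import os


def archive_pending(status_dir, archive_one, max_attempts=3):
    """Archive .ready segments oldest first, never skipping a failure.

    archive_one is a hypothetical stand-in for archive_command: it takes
    a segment name and returns True on success.
    """
    archived = []
    # Assumption: a plain sort of the marker names yields oldest first.
    ready = sorted(f for f in os.listdir(status_dir) if f.endswith(".ready"))
    for marker in ready:
        segment = marker[:-len(".ready")]
        succeeded = False
        for _ in range(max_attempts):
            # Synchronous: wait for the answer before doing anything else.
            if archive_one(segment):
                succeeded = True
                break
        if not succeeded:
            # Do not move on to newer segments while this one is failing.
            return archived
        os.rename(os.path.join(status_dir, marker),
                  os.path.join(status_dir, segment + ".done"))
        archived.append(segment)
    return archived
```

Note that a single failing segment blocks everything newer than it,
which is the limitation the rest of this message is about.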

I suppose we might want to have multiple archivals occurring
simultaneously by overlapping their start and stop times. That might be
useful for situations where we have a bank of slow-response tape
drives/autoloaders.

You'd need to have a second archive command to poll for completion.
Currently archive_status has 2 states: .ready and .done. We could have 3
states: .ready, .inprogress and .done. The first command,
archive_command_start, would move the state from .ready to .inprogress
if successful, while the second, archive_command_confirm, would move the
state from .inprogress to .done. (Better names please...)
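
To make the proposed transitions concrete, a minimal Python sketch
follows. Nothing like this exists in PostgreSQL; the .inprogress marker
and both commands are only the proposal above. The start_cmd and
confirm_cmd callables are hypothetical stand-ins for
archive_command_start and archive_command_confirm, each taking a segment
name and returning True on success.

```python
import os


def _move_state(status_dir, segment, old_suffix, new_suffix):
    # A state change is just a rename of the marker file.
    os.rename(os.path.join(status_dir, segment + old_suffix),
              os.path.join(status_dir, segment + new_suffix))


def archive_start(status_dir, segment, start_cmd):
    """Request archival: .ready -> .inprogress only if the request succeeds."""
    if start_cmd(segment):
        _move_state(status_dir, segment, ".ready", ".inprogress")
        return True
    return False


def archive_confirm(status_dir, segment, confirm_cmd):
    """Poll for completion: .inprogress -> .done only once it is confirmed."""
    if confirm_cmd(segment):
        _move_state(status_dir, segment, ".inprogress", ".done")
        return True
    return False
```

A failed start leaves the segment in .ready and an unconfirmed archival
leaves it in .inprogress, so either step can simply be retried later.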

With an asynchronous API, it would then be possible to fire off requests
to archive lots of files, then return later to confirm their completion.
Or in Devrim's case do nothing apart from wait for them to be applied by
the Standby server.
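
A small in-memory Python sketch of how such an asynchronous API might be
driven: start every request first, then come back and poll for
completion. The function run_async_archiver, the start and confirm
callables and the max_polls cap are hypothetical names used for
illustration only, and a real implementation would track state in
archive_status rather than in a dict.

```python
def run_async_archiver(segments, start, confirm, max_polls=10):
    """Start all archivals, then poll until each is confirmed.

    start and confirm are hypothetical callables taking a segment name
    and returning True on success. Returns the final state per segment.
    """
    state = {seg: "ready" for seg in segments}

    # Phase 1: fire off every request without waiting for completion.
    for seg in sorted(state):
        if start(seg):
            state[seg] = "inprogress"

    # Phase 2: return later and confirm whatever has finished.
    for _ in range(max_polls):
        pending = [s for s in sorted(state) if state[s] == "inprogress"]
        if not pending:
            break
        for seg in pending:
            if confirm(seg):
                state[seg] = "done"
    return state
```

For the shared pg_xlog case, start could be a no-op that always returns
True, and confirm could check (by some means not specified here) whether
the standby has already applied the segment.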

Anybody else see the need for this?

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
