Re: Hot Standby (v9d)

From: Hannu Krosing <hannu(at)krosing(dot)net>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, jd(at)commandprompt(dot)com, Gregory Stark <stark(at)enterprisedb(dot)com>, Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot Standby (v9d)
Date: 2009-02-03 16:09:27
Message-ID: 1233677367.7999.33.camel@huvostro
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 2009-02-03 at 14:28 +0000, Simon Riggs wrote:
> On Tue, 2009-02-03 at 08:40 -0500, Andrew Dunstan wrote:
> >
> > Hannu Krosing wrote:
> > > Actually we came up with a solution to this - use filesystem level
> > > snapshots (like LVM2+XFS or ZFS), and redirect backends with
> > > long-running queries to use fs snapshot mounted to a different
> > > mountpoint.
> > >
> > > I don't think Simon has yet put full support for it in code, but it is
> > > clearly _the_ solution for those who want to eat the cake and have it
> > > too.
>
> > How does that work if you're using mutiple file systems via tablespaces
> > (e.g. indexes in a different TS)?
>
> It's a great idea and easy to do, but I can't do everything in one go.
>
> The main reasons not to are multiple file system difficulties and lack
> of a mainstream Linux solution, and more simply lack of time and test
> resources.

More general, but also lot harder, solution would be going back to roots
and implement what original postgres 4.2 and earlier versions were meant
to do - namely VACUUM was not meant to just discard older versions , but
rather move it to WORM storage (write once read many was all the rage
back then :) .

If we did that in a way that each relation, at least on warm standby ,
has its own "archive" fork, possibly in a separate tablespace for
cheaper storage, then we could basically apply WAL's as fast we want and
just move the old versions to "archive". It will be slower(ish),
especially for HOT updates, but may be a good solution for lots of
usecases.

And the decision to do the archiving on master and WAL-copy to slave, or
just do it on slave only could probably be left to user.

Reintroducing keeping old tuples "forever" would also allow us to bring
back time travel feature, that is

SELECT .... AS OF 'yesterday afternoon'::timestamp;

Which was thrown out at the times we got WAL-logging.

To be really useful we should also have some way to know trx timestamps,
but that can be easily done using ticker feature from Slony -
SkyTools/pgQ, which could be run a a separate server thread similar to
what we do with background writer, autovacuum etc. now.

>
> So not now, maybe later.
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2009-02-03 16:21:26 Re: add_path optimization
Previous Message Robert Haas 2009-02-03 15:54:14 Re: add_path optimization