Re: Hot Standby, release candidate?

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hot Standby, release candidate?
Date: 2009-12-13 22:22:06
Message-ID: 1260742926.1955.195.camel@ebony
Lists: pgsql-hackers

On Sun, 2009-12-13 at 15:45 -0500, Tom Lane wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> > * NonTransactionalInvalidation logging has been removed following
> > review, but AFAICS that means VACUUM FULL doesn't work correctly on
> > catalog tables, which regrettably will be the only ones still standing
> > even after we apply VFI patch. Did I misunderstand the original intent?
> > Was it just buggy somehow? Or is this hoping VF goes completely, which
> > seems unlikely in this release.
>
> For my money, the only reason VF is still around is there hasn't been
> an urgent reason to get rid of it. If it doesn't play with HS, I think
> we'd be better served to put work into getting rid of it than to put
> work into fixing it.

I see the logic, though it has many implications. I'll step up, if I can
get some help from you and Itagaki on the VF side.

You have a rough design here:
http://archives.postgresql.org/message-id/19750.1252094460@sss.pgh.pa.us

Some thoughts and some further work on a detailed design:

* Which exact tables are we talking about: just pg_class and the shared
catalogs? Everything else is in pg_class, so if we can find it we're OK?
formrdesc() tells me the list of nailed relations is: pg_database,
pg_class, pg_attribute, pg_proc, and pg_type. Are the nailed relations
the ones we care about, or are they just a subset?

* Restrict set of operations to *only* VACUUM FULL. Is there a need for
anything else to do this, at least in this release?

* Each backend needs to access two map files: shared and local

* Get relcache to read map files at startup in formrdesc(). Rather than
use RelationInitPhysicalAddr(), set relation->rd_node.relNode directly.

* Get VF to write a new type of invalidation message that means re-read
the two map files to overwrite the relation->rd_node.relNode in the
nailed relations

* Map files would have a very structured format, so each table listed
has its exact place. Sounds like best place for shared catalogs is
pg_control. We only need a few additional bytes for that and everything
else to manipulate it already exists.

* Map files for specific databases would be called pg_database_control,
with roughly same concepts as pg_control. It's then an obvious place to
add any further db specific things in future, if we need them.

* Protect all map file reading/writing using ControlFileLock. Sequence
of update is: acquire lock, send invalidation, rewrite file, release
lock, all inside a critical section. Readers would take the lock shared,
writers exclusive.

* Work would be in two tranches: add new way of working then later
remove code we don't need; I would actually rather do the second part at
start of next dev cycle.
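
To make the fixed-slot map idea and the update sequence above more concrete, here is a minimal sketch in C. It is not PostgreSQL code: every name in it (MapSlot, MapFile, BackendCache, map_update, cache_lookup) is hypothetical, the "file" is just an in-memory struct, and the lock is only indicated by comments. It also collapses the shared and per-database maps into one structure for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical slot numbers: one fixed position per nailed relation. */
typedef enum {
    MAP_SLOT_PG_DATABASE = 0,
    MAP_SLOT_PG_CLASS,
    MAP_SLOT_PG_ATTRIBUTE,
    MAP_SLOT_PG_PROC,
    MAP_SLOT_PG_TYPE,
    MAP_NUM_SLOTS
} MapSlot;

#define MAP_MAGIC 0x4D415031u   /* arbitrary marker chosen for this sketch */

/* Hypothetical map image: fixed size, so slot i always sits at the
 * same offset and no searching is needed. */
typedef struct {
    uint32_t magic;
    uint32_t relfilenode[MAP_NUM_SLOTS];
} MapFile;

/* Per-backend copy, standing in for relation->rd_node.relNode of the
 * nailed relations. */
typedef struct {
    uint32_t relnode[MAP_NUM_SLOTS];
    int      stale;             /* nonzero: must re-read the map */
} BackendCache;

void map_init(MapFile *map)
{
    memset(map, 0, sizeof(*map));
    map->magic = MAP_MAGIC;
}

/* A new backend starts with nothing loaded, so it reads the map on
 * first use (the "read map files at startup" step). */
void cache_init(BackendCache *cache)
{
    memset(cache, 0, sizeof(*cache));
    cache->stale = 1;
}

/* Writer side. The caller is assumed to hold the lock exclusively;
 * the order below follows the sequence in the list above. */
void map_update(MapFile *map, BackendCache *backends, int nbackends,
                MapSlot slot, uint32_t new_relfilenode)
{
    int i;

    assert((int) slot >= 0 && (int) slot < MAP_NUM_SLOTS);

    /* 1. send invalidation: every backend must re-read the map */
    for (i = 0; i < nbackends; i++)
        backends[i].stale = 1;

    /* 2. rewrite the map entry in its fixed slot */
    map->relfilenode[slot] = new_relfilenode;
}

/* Reader side. The caller is assumed to hold the lock shared while
 * the map is re-read. */
uint32_t cache_lookup(BackendCache *cache, const MapFile *map, MapSlot slot)
{
    assert((int) slot >= 0 && (int) slot < MAP_NUM_SLOTS);
    assert(map->magic == MAP_MAGIC);

    if (cache->stale) {
        memcpy(cache->relnode, map->relfilenode, sizeof(cache->relnode));
        cache->stale = 0;
    }
    return cache->relnode[slot];
}
```

In this sketch a VACUUM FULL on pg_class would call map_update() with the new relfilenode, and each backend would pick up the change on its next cache_lookup(). Real invalidation delivery, crash safety of the rewrite, and the shared/local split are all left out.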

--
Simon Riggs www.2ndQuadrant.com
