Re: Catalog/Metadata consistency during changeset extraction from wal

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Catalog/Metadata consistency during changeset extraction from wal
Date: 2012-06-24 21:11:07
Message-ID: 201206242311.07948.andres@2ndquadrant.com
Lists: pgsql-hackers

On Thursday, June 21, 2012 01:41:25 PM Andres Freund wrote:
> Below are two possible implementation strategies for that concept
>
> Advantages:
> * Decoding is done on the master in an asynchronous fashion
> * low overhead during normal DML execution, not much additional code in
> that path
> * can be very efficient if architecture/version are the same
> * version/architecture compatibility can be done transparently by falling
> back to textual versions on mismatch
>
> Disadvantages:
> * decoding probably has to happen on the master which might not be what
> people want performancewise

> 3b)
> Ensure that enough information in the catalog remains by fudging the xmin
> horizon. Then reassemble an appropriate snapshot to read the catalog as
> the tuple in question has seen it.
>
> Advantages:
> * should be implementable with low impact to general code
>
> Disadvantages:
> * requires some complex code for assembling snapshots
> * it might be hard to guarantee that we always have enough information to
> reassemble a snapshot (subxid overflows ...)
> * impacts vacuum if replication to some site is slow
There are some interesting problems related to locking and snapshots here. Not
sure if they are resolvable:

We need to restrict SnapshotNow to represent the view it had back when the wal
record we're currently decoding was created. Otherwise we would possibly get
wrong column types and similar. As we're working in the past, locking doesn't
protect us against much here. I have that working (mostly, and inefficiently).

One interesting problem is table rewrites (TRUNCATE, CLUSTER, some ALTER
TABLEs) and dropping tables. Because we nudge SnapshotNow to the past view it
had back when the wal record was created, we get the old relfilenode, which
might have been dropped as part of the transaction cleanup...
With most types that's not a problem. Even things like records and arrays
aren't problematic. More interesting cases include VACUUM FULL $systable (e.g.
pg_enum) and VACUUM FULLing a table which is used in the *_out function of a
type (like a user-level pg_enum implementation).

The only theoretical way I see around that problem would be to postpone all
relation unlinks until everything that could possibly read them has finished.
Doesn't seem too alluring, although it would be needed if we ever move more
things off SnapshotNow.

Input/Ideas/Opinions?

Greetings,

Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
