Re: Catalog/Metadata consistency during changeset extraction from wal

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Catalog/Metadata consistency during changeset extraction from wal
Date: 2012-06-25 13:43:39
Message-ID: 201206251543.40142.andres@2ndquadrant.com
Lists: pgsql-hackers

On Monday, June 25, 2012 03:08:51 AM Robert Haas wrote:
> On Sun, Jun 24, 2012 at 5:11 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > There are some interesting problems related to locking and snapshots
> > here. Not sure if they are resolvable:
> >
> > We need to restrict SnapshotNow to represent the view it had back when
> > the WAL record we're currently decoding was written. Otherwise we would
> > possibly get wrong column types and similar. As we're working in the
> > past, locking doesn't protect us against much here. I have that (mostly
> > and inefficiently).
> >
> > One interesting problem is table rewrites (TRUNCATE, CLUSTER, some ALTER
> > TABLEs) and dropping tables. Because we nudge SnapshotNow to the past
> > view it had back when the WAL record was created, we get the old
> > relfilenode, which might have been dropped as part of the transaction
> > cleanup...
> > With most types that's not a problem. Even things like records and arrays
> > aren't problematic. More interesting cases include VACUUM FULL $systable
> > (e.g. pg_enum) and vacuum full'ing a table which is used in the *_out
> > function of a type (like a user level pg_enum implementation).
> >
> > The only theoretical way I see around that problem would be to postpone
> > all relation unlinks until everything that could possibly read them has
> > finished. Doesn't seem too alluring, although it would be needed if we
> > ever move more things off SnapshotNow.
> >
> > Input/Ideas/Opinions?
>
> Yeah, this is slightly nasty. I'm not sure whether or not there's a
> way to make it work.
Postponing all non-rollback unlinks to the next "logical checkpoint" is the
only thing I can think of...
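
A rough way to picture that idea is the toy Python model below. It is not
PostgreSQL code: the class, its methods, and the reading of "logical
checkpoint" as "the point up to which decoding no longer needs old files" are
all assumptions made for illustration.

```python
# Toy model only; every name here is hypothetical, none is a PostgreSQL API.
class DeferredUnlinker:
    """Queue relation-file unlinks instead of performing them at commit."""

    def __init__(self):
        self.pending = []   # (commit_lsn, relfilenode) waiting to be removed
        self.unlinked = []  # relfilenodes that have actually been removed

    def request_unlink(self, commit_lsn, relfilenode):
        # A committed rewrite or DROP only records its intent to unlink.
        self.pending.append((commit_lsn, relfilenode))

    def logical_checkpoint(self, decoded_upto_lsn):
        # Assumption: decoding has consumed all WAL up to decoded_upto_lsn,
        # so files dropped at or before that point can no longer be read.
        still_pending = []
        for lsn, node in self.pending:
            if lsn <= decoded_upto_lsn:
                self.unlinked.append(node)
            else:
                still_pending.append((lsn, node))
        self.pending = still_pending


unlinker = DeferredUnlinker()
unlinker.request_unlink(commit_lsn=100, relfilenode=16384)
unlinker.request_unlink(commit_lsn=250, relfilenode=16390)
unlinker.logical_checkpoint(decoded_upto_lsn=200)
```

In this sketch the file dropped at LSN 100 is removed, while the one dropped
at LSN 250 stays on disk until decoding has moved past it.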

> I had another idea. Suppose decoding happens directly on the primary,
> because I'm still hoping there's a way to swing that. Suppose further
> that we handle DDL by insisting that (1) any backend which wants to
> add columns or change the types of existing columns must first wait
> for logical replication to catch up and (2) if a backend which has
> added columns or changed the types of existing columns then writes to
> the modified table, decoding of those writes will be postponed until
> transaction commit. I think that's enough to guarantee that the
> decoding process can just use the catalogs as they stand, with plain
> old SnapshotNow.
I don't think it's that easy. If you e.g. have multiple ALTERs in the same
transaction interspersed with inserted rows, they will all have different
TupleDescs.
I don't see how that's resolvable without either replicating DDL to the target
system or changing what SnapshotNow does...
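
The case can be sketched with a toy Python model. This is not PostgreSQL code:
the Catalog class, the column names, and the list standing in for WAL are
hypothetical and only mimic the shape of the problem.

```python
# Toy model only; every name here is hypothetical, none is a PostgreSQL API.
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """History of one table's column layout within a single transaction."""
    versions: list = field(default_factory=lambda: [("id",)])

    def alter_add_column(self, name):
        self.versions.append(self.versions[-1] + (name,))

    def current(self):
        return self.versions[-1]


def run_transaction():
    """Interleave inserts with ALTERs; each row records the layout it used."""
    catalog = Catalog()
    wal = []  # entries are (layout version at write time, raw values)
    wal.append((len(catalog.versions) - 1, (1,)))
    catalog.alter_add_column("name")
    wal.append((len(catalog.versions) - 1, (2, "a")))
    catalog.alter_add_column("age")
    wal.append((len(catalog.versions) - 1, (3, "b", 30)))
    return catalog, wal


def matches_final_layout(catalog, wal):
    """Check each row against the layout visible at commit time only."""
    layout = catalog.current()
    return [len(values) == len(layout) for _, values in wal]


def decode_with_historic_layout(catalog, wal):
    """Decode each row with the layout that was current when it was written."""
    return [dict(zip(catalog.versions[v], values)) for v, values in wal]


catalog, wal = run_transaction()
```

Here only the last row fits the layout seen at commit, while decoding each row
against the layout in effect when it was written recovers all three.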

> The downside of this approach is that it makes certain kinds of DDL
> suck worse if logical replication is in use and behind. But I don't
> necessarily see that as prohibitive because (1) logical replication
> being behind is likely to suck for a lot of other reasons too and (2)
> adding or retyping columns isn't a terribly frequent operation and
> people already expect a hit when they do it. Also, I suspect that we
> could find ways to loosen those restrictions at least in common cases
> in some future version; meanwhile, less work now.
Agreed.

Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
