Re: Catalog/Metadata consistency during changeset extraction from wal

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Catalog/Metadata consistency during changeset extraction from wal
Date: 2012-06-25 18:19:47
Message-ID: 201206252019.48154.andres@2ndquadrant.com
Lists: pgsql-hackers

Hi,

(munching the mail from Robert and Kevin together)

On Monday, June 25, 2012 06:42:41 PM Kevin Grittner wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > I bet for a lot of replication systems, the answer is "do a full
> > resync". In other words, we either forbid the operation outright
> > when the table is enabled for logical replication, or else we emit
> > an LCR that says, in effect, "transaction 12345 monkeyed with the
> > table, please resync". It strikes me that it's really the job of
> > some higher-level control logic to decide what the "correct"
> > behavior is in these cases; the decoding process doesn't really
> > have enough information about what the user is trying to do to
> > make a sensible decision anyway.
>
> This is clearly going to depend on the topology. You would
> definitely want to try to replicate the DDL for the case on which
> Simon is focused (which seems to me to be essentially physical
> replication of catalogs with logical replication of data changes
> from any machine to all others). What you do about transactions in
> flight is the hard part. You could try to suppress concurrent DML
> of the same objects or have some complex matrix of rules for trying
> to resolve the transactions in flight. I don't see how the latter
> could ever be 100% accurate.
Yes. That's why I dislike that proposal. I don't think that's going to be
understandable and robust enough.

If we really look inside transactions (3b) and 1)) that shouldn't be a problem
though. So I think it really has to be one of those.

> In our shop it is much easier. We always have one database which is
> the only valid source for any tuple, although rows from many such
> databases can be in one table, and one row might replicate to many
> databases. Thus, we don't want automatic replication of DDL.
>
> - When a column is going to be added to the source machines, we
> first add it to the targets, with either a default or as
> NULL-capable.
>
> - When a column is going to be deleted from the source machines, we
> make sure it is NULL-capable or has a default on the replicas.
> We drop it from all replicas after it is gone from all sources.
>
> - If a column is changing name or is changing to a fundamentally
> different type we need to give the new column a new name, have
> triggers to convert old to new (and vice versa) on the replicas,
> and drop the old after all sources are updated.
>
> - If a column is changing in a minor way, like its precision, we
> make sure the replicas can accept either format until all sources
> have been converted. We update the replicas to match the sources
> after all sources are converted.
>
> We most particularly *don't* want DDL to replicate automatically,
> because the schema changes are deployed along with related software
> changes, and we like to pilot any changes for at least a few days.
> Depending on the release, the rollout may take a couple months, or
> we may slam it out everywhere a few days after the first pilot
> deployment.
That's sensible for your use-case - but I do not think that's the
appropriate behaviour for anything which is somewhat out-of-the-box...
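
To make the ordering Kevin describes concrete, here is a minimal sketch of the
"replicas first on add, replicas last on drop" rule. It only builds an ordered
list of (host, statement) pairs; the function names, host names and table
names are hypothetical and not taken from any existing tool.

```python
from typing import List, Tuple

Step = Tuple[str, str]  # (host, SQL statement to run there)


def plan_add_column(table: str, column: str, coltype: str,
                    sources: List[str], replicas: List[str]) -> List[Step]:
    """Add the column on all replicas before any source starts sending it."""
    # A column added without NOT NULL is NULL-capable, so replicas accept
    # rows from sources that do not have the column yet.
    ddl = f"ALTER TABLE {table} ADD COLUMN {column} {coltype}"
    return [(h, ddl) for h in replicas] + [(h, ddl) for h in sources]


def plan_drop_column(table: str, column: str,
                     sources: List[str], replicas: List[str]) -> List[Step]:
    """Relax the column on replicas, drop on sources, then drop on replicas."""
    relax = f"ALTER TABLE {table} ALTER COLUMN {column} DROP NOT NULL"
    drop = f"ALTER TABLE {table} DROP COLUMN {column}"
    return ([(h, relax) for h in replicas]
            + [(h, drop) for h in sources]
            + [(h, drop) for h in replicas])
```

The point of the sketch is only the ordering: at every intermediate step the
replicas can accept rows from both converted and unconverted sources.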

> So you could certainly punt all of this for any release as far as
> Wisconsin Courts are concerned. We need to know table and column
> names, before and after images, and some application-supplied
> metadata.
I am not sure we're going to get all that into 9.3. More on that below.

On Monday, June 25, 2012 07:09:38 PM Robert Haas wrote:
> On Mon, Jun 25, 2012 at 12:42 PM, Kevin Grittner wrote:
> > I don't know that what we're looking for is any easier (although I
> > doubt that it's any harder), but I'm starting to wonder how much
> > mechanism they can really share. The 2Q code is geared toward page
> > format OIDs and data values for automated DDL distribution and
> > faster replication, while we're looking for something which works
> > between releases, architectures, and OSes. We keep coming back to
> > the idea of one mechanism because both WAL and a logical transaction
> > stream would have "after" tuples, although they need them in
> > different formats.
> >
> > I think the need for truly logical replication is obvious, since so
> > many different people have developed trigger-based versions of that.
> > And it sure seems like 2Q has clients who are willing to pay for the
> > other.
> >
> > Perhaps the first question is: Is there enough in common between
> > logical replication (and all the topologies that might be created
> > with that) and the proposal on the table (which seems to be based
> > around one particular topology with a vague notion of bolting
> > logical replication on to it after the fact) to try to resolve the
> > differences in one feature? Or should the "identical schema with
> > multiple identical copies" case be allowed to move forward more or
> > less in isolation, with logical replication having its own design if
> > and when someone wants to take it on? Two non-compromised features
> > might be cleaner -- I'm starting to feel like we're trying to design
> > a toaster which can also water your garden.
I think there are some pieces which can be shared without too many problems
(general wal reading, enough information in wal for decoding, new wal level,
transaction reassembly, ...). Other pieces are less clear (wal decoding,
transport format, ddl handling ...) and others clearly won't be shared (low
level apply, conflict resolution hooks, ...).
Whether some of that will be shareable between different interests also
depends on how many people are willing to chime in and participate. While I am
happy to put in some extra time to make stuff fully generic, there
unfortunately (I really mean that!) are limits on how much effort & time I can
pour into it. I find that to be an important point.

Another factor obviously is how hard it is to make something generic ;). The
whole problem is anything but simple, otherwise we would have existing
solutions...

> I think there are a number of shared pieces. Being able to read WAL
> and do something with it is a general need that both solutions share;
> I think actually that might be the piece that we should try to get
> committed first. I suspect that there are a number of applications
> for just that and nothing more - for example, it might allow a contrib
> module that reads WAL as it's generated and prints out a debug trace,
> which I can imagine being useful.
I plan to revise my patch for that now that Heikki's most invasive changes
have been committed. Reworking the innards of XLogInsert shouldn't change the
wal formats anymore...

> Also, I think that even for MMR there will be a need for control
> logic, resynchronization, and similar mechanisms. I mean, suppose you
> have four servers in an MMR configuration. Now, you want to deploy a
> schema change that adds a new column and which, as it so happens,
> requires a table rewrite to add the default. It is very possible that
> you do NOT want that to automatically replicate around the cluster.
> Instead, you likely want to redirect load to the remaining three
> servers, do the change on the fourth, put it back into the ring and
> take out a different one, do the change on that one, and so on.
That's all nice, but I think it's pretty clear we're not getting anything that
sophisticated in the near future ;)
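
For illustration only, the rolling procedure Robert describes (take one server
out of the ring, change it, put it back, move to the next) could be sketched
roughly like this. The function and server names are made up, the callback
stands in for whatever actually runs the DDL, and the order in which servers
are visited is arbitrary.

```python
from typing import Callable, List


def rolling_schema_change(ring: List[str],
                          apply_change: Callable[[str], None]) -> List[List[str]]:
    """Change one node at a time; return who was serving load at each step."""
    serving_log = []
    for node in list(ring):
        # Redirect load to every other node while this one is rewritten.
        serving = [n for n in ring if n != node]
        serving_log.append(serving)
        apply_change(node)  # node rejoins the ring once this returns
    return serving_log


changed: List[str] = []
log = rolling_schema_change(["s1", "s2", "s3", "s4"], changed.append)
```

With four servers, three of them are serving at every step, which is the
property the manual procedure is after.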

Greetings,

Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
