Re: [PATCH 10/16] Introduce the concept that wal has a 'origin' node

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Christopher Browne <cbbrowne(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Simon Riggs <simon(at)2ndquadrant(dot)com>, heikki(dot)linnakangas(at)enterprisedb(dot)com, robertmhaas(at)gmail(dot)com, daniel(at)heroku(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us
Subject: Re: [PATCH 10/16] Introduce the concept that wal has a 'origin' node
Date: 2012-06-20 17:52:59
Message-ID: 201206201952.59865.andres@2ndquadrant.com
Lists: pgsql-hackers

Hi Chris!

On Wednesday, June 20, 2012 07:06:28 PM Christopher Browne wrote:
> On Wed, Jun 20, 2012 at 11:50 AM, Andres Freund <andres(at)2ndquadrant(dot)com>
> wrote:
> > On Wednesday, June 20, 2012 05:34:42 PM Kevin Grittner wrote:
> >> Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote:
> >> > This is not transaction metadata, it is WAL record metadata
> >> > required for multi-master replication, see later point.
> >> >
> >> > We need to add information to every WAL record that is used as the
> >> > source for generating LCRs.
> >>
> >> If the origin ID of a transaction doesn't count as transaction
> >> metadata (i.e., data about the transaction), what does? It may be a
> >> metadata element about which you have special concerns, but it is
> >> transaction metadata. You don't plan on supporting individual WAL
> >> records within a transaction containing different values for origin
> >> ID, do you? If not, why is it something to store in every WAL
> >> record rather than once per transaction? That's not intended to be
> >> a rhetorical question.
> >
> > It's definitely possible to store it per transaction (see the discussion
> > around http://archives.postgresql.org/message-
> > id/201206201605(dot)43634(dot)andres(at)2ndquadrant(dot)com); it just
> > makes filtering via the originating node considerably more complex. With
> > our proposal you can do it without any complexity involved, on a low
> > level. Storing it per transaction means you can only stream out the data
> > to other nodes *after* fully reassembling the transaction. That's a pity,
> > especially if we go for a design where the decoding happens in a proxy
> > instance.
>
> I guess I'm not seeing the purpose of having the origin node id in the
> WAL stream either.
>
> We have it in the Slony sl_log_* stream; however, there is a crucial
> difference, in that sl_log_* is expressly a shared structure. In
> contrast, WAL isn't directly sharable; you don't mix together multiple
> WAL streams.
>
> It seems as though the point in time at which you need to know the
> origin ID is the moment at which you're deciding to read data from the
> WAL files, and knowing which stream you are reading from is an
> assertion that might be satisfied by looking at configuration that
> doesn't need to be in the WAL stream itself. It might be *nice* for
> the WAL stream to be self-identifying, but that doesn't seem to be
> forcibly necessary.
>
> The case where it *would* be needful is if you are in the process of
> assembling together updates coming in from multiple masters, and need
> to know:
> - This INSERT was replicated from node #1, so should be ignored
>   downstream
> - That INSERT was replicated from node #2, so should be ignored
>   downstream
> - This UPDATE came from the local node, so needs to be passed to
>   downstream users
Exactly, that is the point. And you want to do that efficiently, without too
much logic; that's why something simple like the record header is so
appealing.
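
To make that concrete, here is a minimal sketch of the idea in C. The
field names and widths below are my illustration, not the actual patch
layout; the point is only that when each record carries its origin in
the header, a node can filter record by record, with no transaction
reassembly:

#include <stdint.h>
#include <stdbool.h>

/* Hypothetical record header, for illustration only. */
typedef struct
{
    uint32_t tot_len;    /* total record length */
    uint32_t xid;        /* transaction id */
    uint8_t  origin_id;  /* id of the node the change originated on */
    /* ... remaining header fields ... */
} WalRecordSketch;

/*
 * Record-by-record filter: forward a change downstream unless it
 * originated on the target node itself.  Skipping those records is
 * what breaks the loop in a multi-master topology.
 */
static bool
forward_to_node(const WalRecordSketch *rec, uint8_t target_node)
{
    return rec->origin_id != target_node;
}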

> > I also have to admit that I am very hesitant to start developing some
> > generic "transaction metadata" framework atm. That seems to be a good
> > way to spend a lot of time in discussion and disagreement. Imo
> > that's something for later.

> Well, I see a use for there being at least 3 sorts of LCR records:
> a) Capturing literal SQL that is to be replayed downstream
> b) Capturing tuple updates in a binary form that can be turned readily
> into heap updates on a replica.
> c) Capturing tuple data in some reasonably portable and readily
> re-writable form
I think we should provide the utilities to do all of those. a) is a
consequence of being able to do c).
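
As a rough sketch, those three forms could be distinguished by a payload
tag on each LCR; the names here are illustrative, not from the patch:

typedef enum
{
    LCR_SQL_TEXT,       /* a) literal SQL to replay downstream */
    LCR_BINARY_TUPLE,   /* b) binary tuple image, fast but restricted */
    LCR_PORTABLE_TUPLE  /* c) portable, rewritable column values */
} LcrPayloadKind;

Producing a) from c) is then just a matter of rendering the portable
column values back into INSERT/UPDATE/DELETE statements.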

That doesn't really have anything to do with this subthread, though. The part
you quoted above was my response to the suggestion to add some generic
framework for attaching metadata to individual transactions on the generating
side. We quite possibly will end up needing that, but I personally don't think
we should be designing that part atm.

> b) Capturing tuple updates in a binary form that can be turned readily
> into heap updates on a replica.
> Unfortunately, this form is likely not to play well when
> replicating across platforms or Postgres versions, so I suspect that
> this performance optimization should be implemented as a *last*
> resort, rather than first. Michael Jackson had some "rules of
> optimization" that said "don't do it", and, for the expert, "don't do
> it YET..."
Well, apply is a bottleneck. Besides field experience, we have benchmarked it,
and it's rather plausible that it is. I don't think we can magically make
apply faster in pg in general, so my plan is to remove the biggest cost factor
I can see.
And yes, it will have restrictions...
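
Concretely, one way those restrictions could be enforced is a
compatibility check before taking the fast binary path, falling back to
the portable form otherwise. A sketch; the fields checked here are my
assumptions about what binary compatibility would require:

#include <stdint.h>
#include <stdbool.h>

/* Hypothetical per-node format descriptor. */
typedef struct
{
    uint32_t pg_major_version;  /* e.g. 902 for 9.2 */
    uint32_t catalog_version;   /* on-disk/catalog format version */
    bool     big_endian;        /* byte order of the platform */
    uint8_t  maxalign;          /* struct alignment of the platform */
} NodeFormatInfo;

/* Use the fast binary-apply path b) only when source and target
 * match on everything that affects tuple layout; otherwise fall
 * back to the portable form c). */
static bool
binary_apply_ok(const NodeFormatInfo *src, const NodeFormatInfo *dst)
{
    return src->pg_major_version == dst->pg_major_version
        && src->catalog_version  == dst->catalog_version
        && src->big_endian       == dst->big_endian
        && src->maxalign         == dst->maxalign;
}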

Regards,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
