Re: Replication identifiers, take 3

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Steve Singer <steve(at)ssinger(dot)info>, Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Replication identifiers, take 3
Date: 2014-10-02 08:49:31
Message-ID: 542D119B.2070704@vmware.com
Lists: pgsql-hackers

On 09/23/2014 09:24 PM, Andres Freund wrote:
> I've previously started two threads about replication identifiers. Check
> http://archives.postgresql.org/message-id/20131114172632.GE7522%40alap2.anarazel.de
> and
> http://archives.postgresql.org/message-id/20131211153833.GB25227%40awork2.anarazel.de
> .
>
> They've also been discussed in the course of another thread:
> http://archives.postgresql.org/message-id/20140617165011.GA3115%40awork2.anarazel.de

And even earlier here:
http://www.postgresql.org/message-id/flat/1339586927-13156-10-git-send-email-andres(at)2ndquadrant(dot)com#1339586927-13156-10-git-send-email-andres@2ndquadrant.com
The thread branched a lot, the relevant branch is the one with subject
"[PATCH 10/16] Introduce the concept that wal has a 'origin' node"

> == Identify the origin of changes ==
>
> Say you're building a replication solution that allows inserts into
> the same table on two nodes. Ignoring conflict resolution and similar
> fun, one needs to prevent the same change from being replayed over and
> over. In logical replication the changes to the heap have to be WAL
> logged, and thus the *replay* of changes from a remote node produces
> WAL which will then be decoded again.
>
> To avoid that, it's very useful to tag individual changes/transactions
> with their 'origin'. That is, mark changes that were directly
> triggered by a user sending SQL as originating 'locally', and changes
> that result from replaying another node's changes as originating
> somewhere else.
>
> If that origin is exposed to logical decoding output plugins, they can
> easily check whether or not to stream out the changes/transactions.
>
>
> It is possible to do this by adding an extra column to every table and
> storing the origin of each row there, but that a) permanently costs
> storage and b) is much more invasive.

An origin column in the table itself helps tremendously to debug issues
with the replication system. In many if not most scenarios, I think
you'd want to have that extra column, even if it's not strictly required.

> What I've previously suggested (and which works well in BDR) is to add
> the internal id to the XLogRecord struct. There are 2 free bytes of
> padding that can be used for that purpose.

Adding a field to XLogRecord for this feels wrong. This is for *logical*
replication - why do you need to mess with something as physical as the
WAL record format?

And who's to say that a node ID is the most useful piece of information
for a replication system to add to the WAL header? I can easily imagine
that you'd want to put a changeset ID or something else in there,
instead. (I mentioned another example of this in
http://www.postgresql.org/message-id/4FE17043.60403@enterprisedb.com)

If we need additional information added to WAL records, for extensions,
then that should be done in an extensible fashion. IIRC (I couldn't find
a link just now), when we discussed the changes to heap_insert et al
for wal_level=logical, I already argued back then that we should make it
possible for extensions to annotate WAL records, with things like "this
is the primary key", or whatever information is needed for conflict
resolution, or handling loops. I don't like it that we're adding little
pieces of information to the WAL format, bit by bit.

- Heikki
