Re: Future In-Core Replication

From: Christopher Browne <cbbrowne(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Future In-Core Replication
Date: 2012-04-27 22:50:06
Message-ID: CAFNqd5U-6=i7+p_NZOSiXJu39VsA_3JH3+hRvVoaWhCrjfWXyQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Apr 27, 2012 at 4:11 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> What I'm hoping to do is to build a basic prototype of logical
> replication using WAL translation, so we can inspect it to see what
> the downsides are. It's an extremely non-trivial problem and so I
> expect there to be mountains to climb. There are other routes to
> logical replication, with messages marshalled in a similar way to
> Slony/Londiste/Bucardo/Mammoth(?). So there are options, with
> measurements to be made and discussions to be had.

I'll note that the latest version of Slony, expected to be 2.2 (which
generally seems to work, but we're stuck at the moment waiting to get
free cycles to QA it), makes a substantial change to its data
representation.

The triggers used to cook data into a sort of "fractional WHERE
clause," transforming an I/U/D into a string that you'd trivially
combine with the string INSERT INTO/UPDATE/DELETE FROM to get the
logical update. If you needed to do anything fancier, you'd have to
write a "fractional SQL parser" to split the data out by hand.
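
To make the old scheme concrete, here is a rough sketch of the
"combine a verb with a precooked fragment" idea. The function name,
table name, and fragment text are all made up for illustration; this is
not Slony's actual storage format or code, just the shape of the
approach as described above.

```python
# Hypothetical illustration only; names and fragment layout are assumptions.
PREFIX = {"I": "INSERT INTO", "U": "UPDATE", "D": "DELETE FROM"}


def rebuild_statement(cmd_type, table, fragment):
    """Concatenate the SQL verb, the table name, and the precooked fragment."""
    return f"{PREFIX[cmd_type]} {table} {fragment};"


# The trigger has already flattened column names and values into one string.
stmt = rebuild_statement(
    "U", "public.accounts", "SET balance='90' WHERE id='7'"
)
print(stmt)
```

Replaying the change is trivial, but the individual column values are
buried inside the fragment, so anything that wants to look at them has
to pick the string apart again.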

New in 2.2 is that the log data is split out into an array of text
values, which means that if someone wanted to do some transformation,
such as filtering on value or filtering out columns, they could
modify the application-of-updates code to query for the data that they
want to fiddle with. No parser needed.
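
As a sketch of why the array form is easier to work with, suppose each
logged change carries a flat list of alternating column names and text
values. That layout, and every name below, is an assumption made for
the example rather than a description of the real 2.2 log table.
Filtering out columns or filtering on a value then becomes plain list
handling:

```python
# Hypothetical illustration; the [name, value, name, value, ...] layout
# is an assumption, not a statement about Slony's actual schema.

def pairs(args):
    """Group a flat [name, value, ...] list into (name, value) tuples."""
    if len(args) % 2:
        raise ValueError("expected alternating column names and values")
    return list(zip(args[0::2], args[1::2]))


def drop_columns(args, excluded):
    """Return a new flat list with the excluded columns removed."""
    out = []
    for name, value in pairs(args):
        if name not in excluded:
            out.extend([name, value])
    return out


def matches(args, column, predicate):
    """True if the named column is present and its value satisfies predicate."""
    return any(n == column and predicate(v) for n, v in pairs(args))


row = ["id", "7", "balance", "90", "ssn", "123-45-6789"]
filtered = drop_columns(row, {"ssn"})
big = matches(row, "balance", lambda v: int(v) >= 50)
```

Here `filtered` keeps only the id and balance entries, and `big` says
whether the change passes a value filter, all without parsing any SQL.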

It's doubtless worthwhile to take a peek at that to make sure it
informs your data representation appropriately. It's important to
have data represented in a fashion that is amenable to manipulation,
and that decidedly wasn't the case pre-2.2.

I wonder if a meaningful transport mechanism might involve combining:
a) A trigger that indicates that some data needs to be captured in a
"logical" form (rather than the presently pretty purely physical form
of WAL)
b) Perhaps a way of capturing logical updates in WAL
c) One of the old ideas that fell through was to try to capture commit
timestamps via triggers. Doing it directly turned out to be too
controversial to get in. Perhaps that's something that could be
captured via some process that parses WAL.

Something seems wrong about that in that it mixes together updates of
multiple forms into WAL, physical *and* logical, and perhaps that
implies that there should be an altogether separate "logical updates
log." (LUL?) That still involves capturing updates twice, once in WAL
and once in the LUL, which seems somehow wrong. Or perhaps I'm
tilting at a windmill here. With Slony/Londiste/Bucardo, we're
capturing "LUL" in some tables, meaning that it gets written both to
the tables' data files as well as WAL. Adding a binary LUL eliminates
those table files and attendant WAL updates, thus providing some
savings.

[Insert a LULCATS joke here...]

Perhaps I've just had too much coffee...
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"
