Re: pglogical_output - a general purpose logical decoding output plugin

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Tomasz Rybak <tomasz(dot)rybak(at)post(dot)pl>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, konstantin knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Subject: Re: pglogical_output - a general purpose logical decoding output plugin
Date: 2016-01-07 07:50:46
Message-ID: CAMsr+YFRz90Vd3MJ0pJfQb1UJaXDX01cqrWPxjnzc6G0W2mYSg@mail.gmail.com
Lists: pgsql-hackers

On 7 January 2016 at 01:17, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:

> On 12/22/15 4:55 AM, Craig Ringer wrote:
> > I'm a touch frustrated by that, as a large part of the point of
> > submitting the output plugin separately and in advance of the downstream
> > was to get attention for it separately, as its own entity. A lot of
> > effort has been put into making this usable for more than just a data
> > source for pglogical's replication tools.
>
> I can't imagine that there is a lot of interest in a replication tool
> where you only get one side of it, no matter how well-designed or
> general it is.

Well, the other part was posted most of a week ago.

http://www.postgresql.org/message-id/5685BB86.5010901@2ndquadrant.com

... but this isn't just about replication. At least, not just to another
PostgreSQL instance. This plugin is designed to be general enough to use
for replication to other DBMSes (via appropriate receivers), to replace
trigger-based data collection in existing replication systems, for use in
audit data collection, etc.

Want to get a stream of data out of PostgreSQL in a consistent, simple way,
without having to add triggers or otherwise interfere with the origin
database? That's the purpose of this plugin, and it doesn't care in the
slightest what the receiver wants to do with that data. It's been designed
to be usable separately from the pglogical downstream and, before the Python
tests were rejected in discussions on this list, was tested with a
psycopg2-based test suite completely separate from the pglogical downstream,
to make sure no unintended interdependencies got introduced.

You can do way more than that with the output plugin but you have to write
your own downstream/receiver for the desired purpose, since using a
downstream based on bgworkers and SPI won't make any sense outside
PostgreSQL.

If you just want a canned product to use, see the pglogical post above for
the downstream code.

> Ultimately, what people will want to do with this is
> replicate things, not muse about its design aspects. So if we're going
> to ship a replication solution in PostgreSQL core, we should ship all
> the pieces that make the whole system work.
>

I don't buy that argument. Doesn't that mean logical decoding shouldn't
have been accepted? Or the initial patches for parallel query? Or any
number of other things that are being built up incrementally?

(This also seems to contradict what you then argue below, that the proposed
feature is too broad and does too much.)

I'd be happy to see both parts go in, but I'm frustrated that nobody's
willing to see beyond "replicate from one Pg to another Pg" and see all the
other things you can do. Want to replicate to Oracle / MS-SQL / etc? This
will help a lot and solve a significant part of the problem for you. Want
to stream data to append-only audit logs? Ditto. But nope, it's all about
PostgreSQL to PostgreSQL.

Please try to look further into what client applications can do with this
directly. I already know it meets the needs of the pglogical downstream.
What I was hoping to achieve with posting the output plugin earlier was to
get some thought going about what *else* it'd be good for.

Again: pglogical is posted now (it just took longer than expected to get
ready) and I'll be happy to see both it and the output plugin included. I
just urge people to look at the output plugin as more than a tightly
coupled component of pglogical.

Maybe some quality name bikeshedding for the output plugin would help ;)

> Also, I think there are two kinds of general systems: common core, and
> all possible features. A common core approach could probably be made
> acceptable with the argument that anyone will probably want to do things
> this way, so we might as well implement it once and give it to people.
>

That's what we're going for here. Extensible, something people can build on
and use.

> In a way, the logical decoding interface is the common core, as we
> currently understand it. But this submission clearly has a lot of
> features beyond just the basics

Really? What would you cut? What's beyond the basics here? What basics are
you thinking of, i.e. what set of requirements are you working towards, or
what needs are you seeking to meet?

We cut this to the bone to produce a minimum viable logical replication
solution. Especially the output plugin.

Cut the hook interfaces for row and xact filtering? You lose the ability to
use replication origins, crippling functionality, and for no real gain in
simplicity.

Remove JSON support? That's what most people are actually likely to want to
use when using the output plugin directly, and it's important for
debugging/tracing/diagnostics. It's a separate feature, to be sure, but
it's also a pretty trivial addition.

> and we could probably go through them
> one by one and ask, why do we need this bit? So that kind of system
> will be very hard to review as a standalone submission.
>

Again, I disagree. I think you're looking at this way too narrowly.

I find it quite funny, actually. Here we go and produce something that's a
nice re-usable component that other people can use in their products and
solutions ... and all anyone does is complain that the other part required
to use it as a canned product isn't posted yet (though it is now). But with
BDR all anyone ever does is complain that it's too tightly coupled to the
needs of a single product and the features extracted from it, like
replication origins, should be more generic and general purpose so other
people can use them in their products too. Which is it going to be?

It would be helpful if you could take a step back and describe what *you*
think logical replication for PostgreSQL should look like. You clearly have
a picture in mind of what it should be, what requirements it satisfies,
etc. If you're going to argue based on that it'd be very helpful to
describe it. I might've missed some important points you've seen and you
might've overlooked issues I've seen.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
