Changeset Extraction Interfaces

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Changeset Extraction Interfaces
Date: 2013-12-05 00:15:20
Message-ID: 20131205001520.GA8935@awork2.anarazel.de
Lists: pgsql-hackers

Hi,

Short recap:

From the perspective of the user interface, the changeset extraction
feature consists of two abstract interfaces that the "user" has to
deal with:

1) The "slot" or "changestream" management interface which manages
individual streams of changes. The user can create and destroy a
changestream, and most importantly stream the changes.

Simplified, a "logical replication slot" is a position in the WAL and a
bunch of state associated with it. As long as a slot exists, the user
can ask for all changes that happened since the last request to be
streamed out.
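
To make the "position plus state" idea concrete, here is a toy sketch in
C. It is not code from the patch series: ToyLsn, ToyWalChange, ToySlot
and toy_slot_stream() are made-up names, and the WAL is modelled as a
plain array ordered by position.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy types; not PostgreSQL's actual structures. */
typedef uint64_t ToyLsn;

typedef struct ToyWalChange {
    ToyLsn lsn;          /* WAL position at which the change was logged */
    const char *payload; /* stand-in for the decoded change */
} ToyWalChange;

typedef struct ToySlot {
    const char *name;
    ToyLsn last_streamed; /* position up to which changes were handed out */
} ToySlot;

/*
 * Copy every change logged after the slot's position into out (at most
 * max_out entries), advance the slot past the last change copied, and
 * return the number of changes copied.
 * Assumes wal[] is sorted by ascending lsn.
 */
static size_t
toy_slot_stream(ToySlot *slot, const ToyWalChange *wal, size_t n_wal,
                ToyWalChange *out, size_t max_out)
{
    size_t n_out = 0;

    assert(slot != NULL);
    for (size_t i = 0; i < n_wal && n_out < max_out; i++) {
        if (wal[i].lsn > slot->last_streamed)
            out[n_out++] = wal[i];
    }
    if (n_out > 0)
        slot->last_streamed = out[n_out - 1].lsn;
    return n_out;
}
```

Asking twice in a row without new WAL returns nothing the second time,
which is the "since the last time" behaviour described above.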

It is abstract because different use cases require the changes to be
streamed out via different methods. The series contains two
implementations of that interface:
I) One integrated into walsender that allows for efficient streaming,
including support for synchronous replication.
II) Another that is accessible via SQL functions, very useful for
writing pg_regress/isolationtester tests.

It is, with a relatively low amount of code, possible to add other such
interfaces without touching core code. One example, that has been asked
for by a number of people, is consuming the changestream in a background
worker without involving SQL or connecting to a walsender.

There are basically three major 'verbs' that can be performed on a
stream, currently named (walsender names):
* INIT_LOGICAL_REPLICATION "name" "output_plugin"
* START_LOGICAL_REPLICATION "name" last_received ("option_name" value,...)
* FREE_LOGICAL_REPLICATION "name"

The SQL variant currently has:
* init_logical_replication(name, plugin)
* start_logical_replication(name, stream_upto, options[])
* stop_logical_replication(name)

You might have noticed the slight inconsistency...

2) The "output plugin" interface, which transforms a changestream
(begin, change, commit) into the desired target format.

There are 5 callbacks, 3 of them obligatory:
* pg_decode_init(context, is_initial) [optional]
* pg_decode_begin(context, txn)
* pg_decode_change(context, txn, relation, change)
* pg_decode_commit(context, txn)
* pg_decode_cleanup(context) [optional]

Every output plugin can be used from every slot management
interface.
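
As a rough illustration of how the three obligatory callbacks turn a
changestream into a target format, here is a self-contained toy in C.
The Toy* types and the text format are invented for this example; the
real callbacks receive PostgreSQL's own context, transaction, relation
and change structures.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the real context and transaction types. */
typedef struct ToyContext { char out[256]; } ToyContext;
typedef struct ToyTxn { unsigned xid; } ToyTxn;
typedef struct ToyChange { const char *relation; const char *action; } ToyChange;

/* Append s to the context's output buffer, truncating if it is full. */
static void toy_append(ToyContext *ctx, const char *s)
{
    size_t used = strlen(ctx->out);
    snprintf(ctx->out + used, sizeof(ctx->out) - used, "%s", s);
}

static void toy_decode_begin(ToyContext *ctx, ToyTxn *txn)
{
    char buf[64];
    snprintf(buf, sizeof(buf), "BEGIN %u;", txn->xid);
    toy_append(ctx, buf);
}

static void toy_decode_change(ToyContext *ctx, ToyTxn *txn,
                              const char *relation, const char *action)
{
    char buf[64];
    (void) txn; /* unused in this toy format */
    snprintf(buf, sizeof(buf), "%s %s;", action, relation);
    toy_append(ctx, buf);
}

static void toy_decode_commit(ToyContext *ctx, ToyTxn *txn)
{
    char buf[64];
    snprintf(buf, sizeof(buf), "COMMIT %u;", txn->xid);
    toy_append(ctx, buf);
}

/* Feed one transaction through the callbacks: begin, each change, commit. */
static void toy_decode_txn(ToyContext *ctx, ToyTxn *txn,
                           const ToyChange *changes, size_t n_changes)
{
    toy_decode_begin(ctx, txn);
    for (size_t i = 0; i < n_changes; i++)
        toy_decode_change(ctx, txn, changes[i].relation, changes[i].action);
    toy_decode_commit(ctx, txn);
}
```

A different plugin would only swap the three toy_decode_* functions;
the driver loop stays the same regardless of which slot management
interface calls it.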

The current pain points, that I'd like to discuss, are:
a) Better naming for the slot management between walsender, SQL and
possible future interfaces.

b) Decide which of the SQL functions should be in a contrib module, and
which in core. Currently init_logical_replication() and
stop_logical_replication() are in core, whereas
start_logical_replication() is in the 'test_logical_decoding'
extension. The reasoning behind that is that the init/stop ones are
important to the DBA and the start_logical_replication() SRF isn't
all that useful in the real world because our SRFs don't support
streaming changes out.

c) Which data types does start_logical_replication() return? Currently
it's OUT location text, OUT xid bigint, OUT data text. Making the 'data'
column text has some obvious disadvantages though - there are obvious
use cases for output plugins that return binary data. But making it bytea
sucks, because the output is harder to read by default...

d) How does a slot acquire the callbacks of an output plugin.

For a), my current feeling is to name them:
* LOGICAL_DECODING_SLOT_CREATE/pg_logical_decoding_slot_create()
* LOGICAL_DECODING_SLOT_STREAM/pg_logical_decoding_slot_extract()
* LOGICAL_DECODING_SLOT_DESTROY/pg_logical_decoding_slot_destroy()
with an intentional discrepancy between stream and extract, to make the
difference obvious. One day we might have the facility - which would be
rather cool - to do the streaming from sql as well.

Better ideas? Leave out the "logical"?

For b), I am happy with that split, I would just like others to comment.

For c), I have no better idea than two functions.

d) is my main question, and Robert, Peter G. and I previously argued
about it a fair bit. I know of the following alternatives:

I) The output plugin that's specified in INIT_LOGICAL_REPLICATION is
actually a library name, and we simply look up the fixed symbol names in
it. That's what's currently implemented.
The advantage is that it's pretty easy to implement, works on a HS
standby without involving the primary, and doesn't have a problem if the
library is used in shared_preload_libraries.
The disadvantages are: All output plugins need to be shared libraries
and there can only be one output plugin per shared library (although you
could route differently, via options, but ugh).

II) Keep the output plugin a library, but only look up a
_PG_init_output_plugin() which registers/returns the callbacks. Pretty
much the same tradeoffs as I).
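
A minimal sketch of what II) could look like, assuming the init function
fills in a callback table handed to it by the caller. Everything named
Toy*/toy_* is hypothetical, the arguments are reduced to void pointers,
and the step of finding the function in the shared library is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback table mirroring the five callbacks listed above. */
typedef struct ToyCallbacks {
    void (*init)(void *context, bool is_initial);   /* optional */
    void (*begin)(void *context, void *txn);
    void (*change)(void *context, void *txn, void *relation, void *change);
    void (*commit)(void *context, void *txn);
    void (*cleanup)(void *context);                 /* optional */
} ToyCallbacks;

/* Signature assumed for the single symbol looked up in the library. */
typedef void (*ToyPluginInit)(ToyCallbacks *cb);

/*
 * Let the plugin register its callbacks, then check that the three
 * obligatory ones are present. Returns false for an unusable plugin.
 */
static bool toy_load_plugin(ToyPluginInit init_fn, ToyCallbacks *cb)
{
    assert(cb != NULL);
    *cb = (ToyCallbacks){0};
    if (init_fn == NULL)
        return false;
    init_fn(cb);
    return cb->begin != NULL && cb->change != NULL && cb->commit != NULL;
}

/* A sample plugin with no-op callbacks. */
static void noop_begin(void *c, void *t) { (void) c; (void) t; }
static void noop_change(void *c, void *t, void *r, void *ch)
{ (void) c; (void) t; (void) r; (void) ch; }
static void noop_commit(void *c, void *t) { (void) c; (void) t; }

static void toy_plugin_init(ToyCallbacks *cb)
{
    cb->begin = noop_begin;
    cb->change = noop_change;
    cb->commit = noop_commit;
}

/* A broken plugin that forgets two obligatory callbacks. */
static void toy_incomplete_init(ToyCallbacks *cb)
{
    cb->begin = noop_begin;
}
```

Compared with I), only one symbol name is fixed, and the plugin is free
to name its callbacks however it likes.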

III) Keep the output plugin a library, but simply rely on _PG_init()
calling a function to register all callbacks. Imo it's worse than I) and
II) because it basically prohibits using the library in
shared_preload_libraries as well, because then its _PG_init() doesn't
get called when starting to stream, and another library might have
registered other callbacks.

IV) Make output plugins a SQL-level object/catalog table where a plugin
can be registered, and the callbacks are normal pg_proc entries. It's
more in line with other stuff, but has the disadvantage that we need to
register plugins on the primary, even if we only stream from a
standby. But then, we're used to that with CREATE EXTENSION et al.

I personally lean towards I), followed by II) and IV).

Comments?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
