Re: pglogical_output - a general purpose logical decoding output plugin

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pglogical_output - a general purpose logical decoding output plugin
Date: 2015-11-02 13:10:27
Message-ID: CAMsr+YFq=GtG9L-Boz9C07P56TSY3VYABd6Kin1H0RFmnwBx9A@mail.gmail.com
Lists: pgsql-hackers

On 2 November 2015 at 20:17, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
> Hi all
>
> I'd like to submit pglogical_output for inclusion in the 9.6 series as
> a contrib.

A few points are likely to come up in anything but the most cursory
examination of the patch.

The README alludes to protocol docs that aren't in the tree. A
followup will add them shortly; they just need a few tweaks.

There are pg_regress tests, but they're limited. The output plugin
uses the binary output mode, and pg_regress doesn't play well with
that at all. Timestamps, XIDs, LSNs, etc. are embedded in the output.
Also, pglogical itself emits LSNs and timestamps in commit messages.
Some things, like the startup message, are likely to contain variable
data in future too. So we can't easily do a "dumb" comparison of
expected output.
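To illustrate the general technique (a sketch only, not code from the
patch): the volatile fields can be rewritten to stable placeholders
before actual output is compared against expected output.

```python
import re

# Volatile fields that differ between runs: WAL positions (LSNs like
# 0/16B3748), transaction ids, and timestamps. Rewrite them to stable
# placeholders so a dumb textual comparison becomes possible.
PATTERNS = [
    (re.compile(r'\b[0-9A-F]+/[0-9A-F]+\b'), '<LSN>'),
    (re.compile(r'\bxid[ =:]+\d+\b'), 'xid <XID>'),
    (re.compile(r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\S*'), '<TIMESTAMP>'),
]

def normalize(line: str) -> str:
    for pat, repl in PATTERNS:
        line = pat.sub(repl, line)
    return line

actual = "COMMIT lsn 0/16B3748 xid: 742 at 2015-11-02 13:10:27.123+08"
normalized = normalize(actual)
```

Even so, this only helps for the textual parts; the binary protocol
stream itself still needs message-level validation.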

That's why the bulk of the tests in test/ are in Python, using
psycopg2. Python and psycopg2 were used partly because of the
excellent work done by Oleksandr Shulgin at Zalando
(https://github.com/zalando/psycopg2/tree/feature/replication-protocol,
https://github.com/psycopg/psycopg2/pull/322) which means we can
connect to the walsender and consume the replication protocol, rather
than relying only on the SQL interfaces. Both are supported, and only
the SQL interface is used by default. It also means the tests can have
logic to validate the protocol stream, examining it message by message
to ensure it's exactly what's expected. Rather than a diff where two
lines of binary gibberish don't match, you get a specific error. Of
course, I'm aware that the buildfarm animals aren't required to have
Python, let alone a patched psycopg2, so we can't rely on these as
smoketests. That's why the pg_regress tests are there too.
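The shape of that message-by-message validation looks roughly like the
following sketch. The framing here is invented for illustration (a
1-byte type plus a length-prefixed payload); the real pglogical_output
wire format differs, but the point is the same: parse each message and
assert on it, rather than diffing opaque binary output.

```python
import struct

def messages(buf):
    # Walk a buffer of framed messages: 1-byte type, 4-byte big-endian
    # payload length, then the payload. (Hypothetical framing.)
    off = 0
    while off < len(buf):
        mtype = buf[off:off + 1].decode('ascii')
        (length,) = struct.unpack_from('>I', buf, off + 1)
        payload = buf[off + 5:off + 5 + length]
        yield mtype, payload
        off += 5 + length

def frame(mtype, payload=b''):
    return mtype.encode('ascii') + struct.pack('>I', len(payload)) + payload

# A captured stream: begin, one insert, commit.
stream = frame('B') + frame('I', b'row data') + frame('C')
types = [t for t, _ in messages(stream)]
# A test can now fail with "expected commit, got insert" instead of a
# diff of binary gibberish.
```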

There's another extension inside it, in
contrib/pglogical_output/examples/hooks . I'm not sure whether this
should be split out into a separate contrib/ module, since it's very
tightly coupled to pglogical_output. Its purpose is to expose the hooks from
pglogical_output to SQL, so that they can be implemented by plpgsql or
whatever, instead of having to be C functions. It's not integrated
into pglogical_output proper because I see this as mainly a test and
prototyping facility. It's necessary to have this in order for the
unit tests to cover filtering and hooks, but most practical users will
never want or need it. So I'd rather not integrate it into
pglogical_output proper.

pglogical_output has headers, and it installs them into Pg's include/
tree at install time. This is not something that prior contribs have
done, so there's no policy for it as yet. The reason for doing so is
that the output plugin exposes a hooks API so that it can be reused by
different clients with different needs, rather than being tightly
coupled to just one downstream user. For example, it makes no
assumptions about things like what replication origin names mean, in
keeping with the design of replication origins, which just provide
mechanism without policy. That means the client needs to tell the
output plugin how to filter transactions if it wants to do selective
replication on a node-by-node basis. Similarly, there's no built-in
support for selective replication on a per-table basis, just a hook
you can implement. So clients can provide their own policy for how to
decide what tables to replicate. When we're calling hooks for each and
every row we really want a C function pointer so we can avoid the need
to go through the fmgr each time, and so we can pass a `Relation`
and bypass the need for catalog lookups. That sort of thing.
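The control flow the hooks enable looks roughly like this Python
sketch (names invented; the real hooks are C function pointers
supplied by the client precisely so that no fmgr round trip is needed
per row):

```python
# The plugin itself carries no replication policy; the client plugs in
# callbacks that decide, per row, whether a change should be emitted.

class DecodingContext:
    def __init__(self, row_filter=None):
        # Client-supplied policy hook; None means "emit everything".
        self.row_filter = row_filter

    def should_emit(self, table, row):
        if self.row_filter is not None:
            return self.row_filter(table, row)
        return True

# A downstream client provides its own policy: replicate only 'orders'.
ctx = DecodingContext(row_filter=lambda table, row: table == 'orders')
changes = [('orders', {'id': 1}), ('audit_log', {'id': 2})]
emitted = [c for c in changes if ctx.should_emit(*c)]
```

Origin filtering works the same way: another client-supplied hook
decides which upstream origins' transactions to forward.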

Table metadata is sent for each row. It really needs to be sent once
for each consecutive series of rows for the same table. Some care is
required to make sure it's invalidated and re-sent when the table
structure changes mid-series. So that's a pending change. It's
important for efficiency, but pretty isolated and doesn't make the
plugin less useful otherwise, so I thought it could wait.
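The planned behaviour can be sketched as follows (names invented, just
to pin down the intent): metadata goes out only when the table changes
between consecutive rows, or when a schema change invalidates the
cached metadata mid-series.

```python
def encode_stream(changes, invalidated_at=None):
    # changes: list of (table, row). Emit ('METADATA', table) once per
    # consecutive run of rows for the same table; a schema change at
    # position invalidated_at forces the metadata to be re-sent.
    out = []
    last_table = None
    for i, (table, row) in enumerate(changes):
        if invalidated_at is not None and i == invalidated_at:
            last_table = None  # cached metadata is stale: force resend
        if table != last_table:
            out.append(('METADATA', table))
            last_table = table
        out.append(('ROW', row))
    return out

stream = encode_stream([('t1', 'a'), ('t1', 'b'), ('t2', 'c')])
```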

Sending the whole old tuple is not yet supported, per the FIXME in
pglogical_write_update. It should really be a TODO: to support this we
need a way to keep track of the replica identity for a table while
also WAL-logging the whole old tuple. (Ab)using REPLICA IDENTITY FULL
to log the old tuple means we lose information about what the real
identity key is. So this is more of a wanted future feature, and I'll
change it to a TODO.

I'd like to delay some ERROR messages until after the startup
parameters are sent. That way the client can see more info about the
server's configuration, version, capabilities, etc., and possibly
reconnect with acceptable settings. Because a logical decoding plugin
isn't allowed to generate output during its startup callback, though,
this could mean indefinitely delaying an error until the upstream does
some work that results in a decoding callback. So for now errors on
protocol mismatches, etc., are sent immediately.
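The tradeoff, in sketch form (invented names, not patch code): the
immediate path errors out before the client ever sees the startup
message, while the deferred path would let the client inspect the
server's capabilities first, at the cost of the error possibly sitting
in a queue until the next decoding callback fires.

```python
def handshake(client_proto, server_proto, defer_errors=False):
    # Model of the two error-reporting strategies for a protocol
    # version mismatch during output plugin startup.
    out = []
    error = None
    if client_proto != server_proto:
        error = 'protocol mismatch'
        if not defer_errors:
            out.append(('ERROR', error))  # current behaviour: fail fast
            return out
    out.append(('STARTUP', {'server_proto': server_proto}))
    if error is not None:
        # Deferred path: in a real plugin this could be delayed
        # indefinitely, since the startup callback can't emit output
        # and decoding callbacks only fire when the upstream does work.
        out.append(('ERROR', error))
    return out
```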

Text encoding names are compared byte-wise. They should be looked up
in the catalogs and compared properly; that's just not done yet.
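The problem is easy to show outside Postgres too; here Python's codec
registry stands in for the pg_catalog lookup:

```python
import codecs

# 'UTF8' and 'utf-8' name the same encoding, so a byte-wise string
# comparison wrongly reports a mismatch. Resolving each name to its
# canonical form first compares them properly.

def same_encoding(a, b):
    return codecs.lookup(a).name == codecs.lookup(b).name

bytewise_equal = ('UTF8' == 'utf-8')          # False: spurious mismatch
canonical_equal = same_encoding('UTF8', 'utf-8')  # True
```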

I think those are the main points.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
