Re: logical changeset generation v6

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical changeset generation v6
Date: 2013-09-24 15:04:06
Message-ID: CA+TgmoZ2ow-ZDHv4NU5Eu=0fMjYGjtgQfcJ3r-CKdCX4uBwp+w@mail.gmail.com
Lists: pgsql-hackers

On Tue, Sep 24, 2013 at 4:15 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> There needs to be a client acking the reception of the data in some
> form. There are currently two output methods, SQL and walstreamer, but
> there could easily be more; it's basically two functions you have to
> write.
>
> There are several reasons I think the tool is useful, starting with the
> fact that it makes the initial use of the feature easier. Writing a
> client for CopyBoth messages wrapping 'w'-style binary messages, with
> the correct select() loop, isn't exactly trivial. I also think it's
> actually useful in "real" scenarios where you want to ship the data to
> a remote system for auditing purposes.
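
For a sense of what that client-side loop involves, here is a rough
sketch built on libpq's asynchronous CopyBoth handling. The
START_REPLICATION syntax for logical slots, the slot name, and the
25-byte 'w' header layout are assumptions taken from the streaming
replication protocol and the patch series; feedback replies, keepalive
handling, and error recovery are all omitted.

    /*
     * Sketch of a logical-change receiver over the walsender protocol.
     * Connects, starts streaming from a (hypothetical) slot, and waits
     * on the socket with select() when no data is buffered.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/select.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        /* "replication=database" is assumed to pick the logical walsender */
        PGconn     *conn = PQconnectdb("dbname=postgres replication=database");
        PGresult   *res;

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            return 1;
        }

        /* slot name and start position are placeholders */
        res = PQexec(conn, "START_REPLICATION SLOT \"my_slot\" LOGICAL 0/0");
        if (PQresultStatus(res) != PGRES_COPY_BOTH)
        {
            fprintf(stderr, "could not start streaming: %s", PQerrorMessage(conn));
            return 1;
        }
        PQclear(res);

        for (;;)
        {
            char   *copybuf = NULL;
            int     len = PQgetCopyData(conn, &copybuf, 1); /* async mode */

            if (len == 0)
            {
                /* No data buffered: wait on the socket, then consume input. */
                fd_set  input_mask;
                int     sock = PQsocket(conn);

                FD_ZERO(&input_mask);
                FD_SET(sock, &input_mask);
                if (select(sock + 1, &input_mask, NULL, NULL, NULL) < 0)
                    break;
                if (!PQconsumeInput(conn))
                    break;
                continue;
            }
            if (len < 0)
                break;          /* -1: copy done, -2: error */

            /* 'w' messages carry decoded change data; 'k' are keepalives. */
            if (copybuf[0] == 'w' && len > 25)
            {
                /* 25-byte header (msgtype + three int64s) precedes the payload */
                fwrite(copybuf + 25, 1, len - 25, stdout);
            }

            PQfreemem(copybuf);
        }

        PQfinish(conn);
        return 0;
    }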

I have two basic points here:

- Requiring a client is a short-sighted design. There's no reason we
shouldn't *support* having a client, but IMHO it shouldn't be the only
way to use the feature.

- Suppose that you use pg_receivellog (or whatever we decide to call
it) to suck down logical replication messages. What exactly are you
going to do with that data once you've got it? In the case of
pg_receivexlog it's quite obvious what you will do with the received
files: you'll store them in an archive of some kind and maybe eventually
use them for archive recovery, streaming replication, or PITR. But
the answer here is a lot less obvious, at least to me.

>> For example, for replication, I'd think you might want the
>> plugin to connect to a remote database and directly shove the data in;
>
> That sounds like a bad idea to me. If you pull the data from the remote
> side, you get the data in a streaming fashion and the latency-sensitive
> part of issuing statements to your local database is done locally.
> Doing things synchronously like that also makes it way harder to use
> synchronous_commit = off on the remote side, which is a tremendous
> efficiency win.

This sounds like the voice of experience talking, so I won't argue too
much, but I don't think it's central to my point. And anyhow, even if
it is a bad idea, that doesn't mean someone won't want to do it. :-)

> If somebody needs something like this, e.g. because they want to
> replicate into hundreds of shards depending on some key or such, what
> I don't know is how to actually initiate the streaming. Somebody would
> need to start the logical decoding.

Sounds like a job for a background worker. It would be pretty swell
if you could write a background worker that connects to a logical
replication slot and then does whatever.
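
To make that a bit more concrete, a skeleton of such a worker under the
9.3-era bgworker API might look roughly like this. The part that
actually attaches to a logical replication slot is left as a
hypothetical comment, since that in-backend interface is exactly what's
missing today; signal handling and shutdown are also omitted.

    /* Sketch of a background worker intended to host change apply. */
    #include "postgres.h"
    #include "fmgr.h"
    #include "postmaster/bgworker.h"
    #include "storage/latch.h"
    #include "storage/proc.h"

    PG_MODULE_MAGIC;

    void        _PG_init(void);
    static void apply_worker_main(Datum main_arg);

    static void
    apply_worker_main(Datum main_arg)
    {
        /* Signal handler setup omitted; unblock signals and connect. */
        BackgroundWorkerUnblockSignals();
        BackgroundWorkerInitializeConnection("postgres", NULL);

        for (;;)
        {
            /*
             * Hypothetical: pull the next batch of decoded changes from a
             * logical replication slot and do whatever with them -- apply
             * them locally, route them to shards by key, etc.
             */
            WaitLatch(&MyProc->procLatch,
                      WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
                      1000L);
            ResetLatch(&MyProc->procLatch);
        }
    }

    void
    _PG_init(void)
    {
        BackgroundWorker worker;

        memset(&worker, 0, sizeof(worker));
        snprintf(worker.bgw_name, BGW_MAXLEN, "logical apply worker");
        worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
                           BGWORKER_BACKEND_DATABASE_CONNECTION;
        worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
        worker.bgw_restart_time = BGW_NEVER_RESTART;
        worker.bgw_main = apply_worker_main;
        RegisterBackgroundWorker(&worker);
    }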

>> for materialized views, we might like to push the changes into delta
>> relations within the source database.
>
> Yes, that's not a bad use case, and I think the only thing missing to
> use output plugins that way is a convenient function to tell up to
> where data has been received (aka synced to disk, aka applied).

Yes. It feels to me (and I only work here) like the job of the output
plugin ought to be to put the data somewhere, and the replication code
shouldn't make too many assumptions about where it's actually going.
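
As a sketch of that division of labor: an output plugin whose whole job
is to hand each change to whatever consumer happens to be attached (SQL
function, walsender, or something else) might look roughly like this.
The callback and helper names follow the test_decoding example from the
patch series, so treat them as assumptions; formatting of the actual
tuple data is skipped.

    /* Minimal output plugin: emit one text line per change. */
    #include "postgres.h"
    #include "fmgr.h"
    #include "lib/stringinfo.h"
    #include "replication/logical.h"
    #include "replication/output_plugin.h"
    #include "replication/reorderbuffer.h"
    #include "utils/rel.h"

    PG_MODULE_MAGIC;

    extern void _PG_output_plugin_init(OutputPluginCallbacks *cb);

    static void
    audit_begin(LogicalDecodingContext *ctx, ReorderBufferTXN *txn)
    {
    }

    static void
    audit_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
                 Relation relation, ReorderBufferChange *change)
    {
        const char *action = "?";

        switch (change->action)
        {
            case REORDER_BUFFER_CHANGE_INSERT: action = "INSERT"; break;
            case REORDER_BUFFER_CHANGE_UPDATE: action = "UPDATE"; break;
            case REORDER_BUFFER_CHANGE_DELETE: action = "DELETE"; break;
            default: break;
        }

        /* Hand one datum per change to whichever output method is in use. */
        OutputPluginPrepareWrite(ctx, true);
        appendStringInfo(ctx->out, "%s %s",
                         action, RelationGetRelationName(relation));
        OutputPluginWrite(ctx, true);
    }

    static void
    audit_commit(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
                 XLogRecPtr commit_lsn)
    {
    }

    void
    _PG_output_plugin_init(OutputPluginCallbacks *cb)
    {
        cb->begin_cb = audit_begin;
        cb->change_cb = audit_change;
        cb->commit_cb = audit_commit;
    }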

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
