Re: Re: pglogical_output - a general purpose logical decoding output plugin

From: Tomasz Rybak <tomasz(dot)rybak(at)post(dot)pl>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: pglogical_output - a general purpose logical decoding output plugin
Date: 2016-01-24 14:33:22
Message-ID: 1453646002.2166.30.camel@post.pl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I'm merging all your emails for sake of easier discussion.
I also cut all fragments that do not require response.

W dniu 22.01.2016, pią o godzinie 11∶06 +0800, użytkownik Craig Ringer
napisał:
> > We might also think about changing name of plugin to something
> > resembling "logical_streaming_decoder" or even "logical_streamer"
> > 
> I'm open to ideas there but I'd want some degree of consensus before
> undertaking the changes required. 

I know that it'd require much changes (both in this and pglogical 
plugin), and thus don't want to press for name change.
On one hand - changing of name might be good to avoid tight mental
coupling between pglogical_output and pglogica. At the same time - it's
much work, and I cannot think of any short and nice name, so 
pglogical_output might stay IMO.

 
> >  + subset of that database may be selected for replication,
> > currently based on
> > + table and on replication origin. Filtering by a WHERE clause can
> > be supported
> > + easily in future.
> >
> > Is this filtering by table and replication origin implemented? I
> > haven't
> > noticed it in source.
> That's what the hooks are for.
>

Current documentation suggests that replicating only selected is 
already available:
+ A subset of that database may be selected for replication, currently
+ based on table and on replication origin.

"currently based on table and on replication origin" means to me that
current state of plugin allows for just chosing which tables
to replicate. I'd see something like:

"A subset of that database might be selected for replication, e.g.
only chosen tables or changes from particular origin, in custom hook"

to convey that user needs to provide hook for filtering.

>  
> > + [the `CREATE_REPLICATION_SLOT ... LOGICAL ...` or
> > `START_REPLICATION SLOT ... LOGICAL ...` commands](http://www.postg
> > resql.org/docs/current/static/logicaldecoding-walsender.html) to
> > start streaming changes. (It can also be used via
> > + [SQL level functions](http://www.postgresql.org/docs/current/stat
> > ic/logicaldecoding-sql.html)
> > + over a non-replication connection, but this is mainly for
> > debugging purposes)
> >
> > Replication slot can also be configured (causing output plugin to
> > be loaded) via [SQL level functions]...
> Covered in the next section. Or at least it is in the SGML docs
> conversion I'm still trying to finish off..

OK, then I'll wait for the final version to review that.

>  
> > + * Client issues `CREATE_REPLICATION_SLOT slotname LOGICAL
> > 'pglogical'` if it's setting up for the first time
> >
> > * Client issues `CREATE_REPLICATION_SLOT slotname LOGICAL
> > 'pglogical'` to setup replication if it's connecting for the first
> > time
> I disagree. It's entirely possible to do your slot creation/setup
> manually or via something else, re-use a slot first created by
> another node, etc. Slot creation is part of client setup, not so much
> connection.

I'd propose then:
' * Client issues "CREATE_REPLICATION_SLOT ..." if the replication
was not configured earler, e.g. during previous connection, or manually
via [SQL functions | link to documentation]"

> > + If your application creates its own slots on first use and hasn't
> > previously
> > + connected to this database on this system you'll need to create a
> > replication
> > + slot. This keeps track of the client's replay state even while
> > it's disconnected.
> >
> > If your application hasn't previously connected to this database on
> > this system
> > it'll need to create and configure replication slot which keeps
> > track of the
> > client's replay state even while it's disconnected.
> As above, I don't quite agree.
>  

"If your application hasn't previously connedted to this database on
this system, and the replication slot was not configured through other
means (e.g. manually using [SQL functions | URL ] then you'll need
to create and configure replication slot ..."

> > 
> >  DESIGN.md:
> >
> > + attnos don't necessarily correspond. The column names might, and
> > their ordering
> > + might even be the same, but any column drop or column type change
> > will result
> >
> > The column names and their ordering might even be the same...
> I disagree, that has a different meaning. It's also not really user-
> facing docs so I'm not too worried about being quite as readable.
>

I do not try to change meaning but fix grammar. I'd like to have here
either more or less commas. So either:

+ The column names (and their ordering) might ...

or:

+ The column names, and their ordering, might ...

> > Is it true (no way to access change data)? You added passing change
> > to C hooks; from looking at code it looks like it's true, but I
> > want to be sure.
> While the change data is now passed to the C hook, there's no attempt
> to expose it via PL/PgSQL. So yeah, that's still true.
>

Thanks for confirming.

Speaking about flags - in most cases they are 0; only for
> > attributes
> > we might have: 
> >  +                       flags |= IS_REPLICA_IDENTITY;
> > I assume that flags field is put into protocol for future needs?
> Correct. The protocol specifies (in most places; need to double check
> all sites) that the lower 4 bits are reserved and must be treated as
> an ERROR if set. The high 4 bits must be ignored if set and not
> recognised. That gives us some room to wiggle without bumping the
> protocol version incompatibly, and lets us use capabilities (via
> client-supplied parameters) to add extra information on the wire.

Thanks for explanation. You're right - protocol.txt explains that
quite nicely.

+       /* FIXME support whole tuple (O tuple type) */
> > +       if (oldtuple != NULL)
> > +       {
> > +               pq_sendbyte(out, 'K');  /* old key follows */
> > +               pglogical_write_tuple(out, data, rel, oldtuple);
> > +       }
> > I don't fully understand. We are sending whole old tuple here,
> > so this FIXME should be more about supporting sending just keys.
> > But then comment "old key follows" is not true. Or am I missing
> > something here?
> In wal_level=logical the tuple that's written is an abbreviated tuple
> containing data only for the REPLICA IDENTITY fields.

I didn't know that - thanks for explanation.

 
> > +       pq_sendbyte(out, 'S');  /* message type field */
> > +       pq_sendbyte(out, 1);    /* startup message version */
> > For now protocol is 1, but for code readability it might be better
> > to change this line to:
> > +       pq_sendbyte(out, PGLOGICAL_PROTO_VERSION_NUM);  /* startup
> > message version */
> The startup message format isn't the same as the protocol version.
> Hopefully we'll never have to change it. The reason it's specified is
> so that if we ever do bump it a decoding plugin can recognise an old
> client and fall back. Maybe it's BC overkill but I'd kick myself for
> not doing it if we ever decided to (say) add support for structured
> json startup options from the client. Working on BDR has taught me
> that there's no such thing as too much consideration for cross-
> version compat and negotiation in replication.

Then I'd suggest adding named constant for this - so startup message
version is not mistaken for protocol version. Something like:

+ #define PGLOGICA_STARTUP_MESSAGE_VERSION_NUM 1

This way it is obvious that "1" and "1" do not mean that same.

Just for the sake of avoiding code repetition:
> > +       for (i = 0; i < desc->natts; i++)
> > +       {
> > +               if (desc->attrs[i]->attisdropped)
> > +                       continue;
> > +               nliveatts++;
> > +       }
> > +       pq_sendint(out, nliveatts, 2); 
> >  
> >  The exact same code is in write_tuple and write_attrs. I don't
> > know what's
> > policy for refactoring, but this might be extracted into separate
> > function.
> Seems trivial enough not to care, but probably can.

IMO such code (computing number of live attributes in relation) should
be used often enough to deserve own function - isn't such function
available in PostgreSQL?

> > I really don't like this. We have function parameter "data" and
> > now
> > are creating new variable with the same name.
> I agree. Good catch.

> I don't much like the use of 'data' as the param name for the plugin
> private data and am quite inclined to change that instead, to
> plugin_private or something.

Maybe rename function parameter to plugin_data or pluging_private_data?

+       /* Find cached function info, creating if not found */
> > +       hentry = (struct PGLRelMetaCacheEntry*)
> > hash_search(RelMetaCache,
> > +                                                                 
> >               (void *)(&RelationGetRelid(rel)),
> > +                                                                 
> >               HASH_ENTER, &found);
> > +
> > +       if (!found)
> > +       {
> > +               Assert(hentry->relid = RelationGetRelid(rel));
> > +               hentry->is_cached = false;
> > +               hentry->api_private = NULL;
> > +       }
> > +
> > +       Assert(hentry != NULL); 
> >  
> >  Shouldn't Assert be just after calling hash_search? We're (if
> > !found)
> > dereferencing hentry and only after checking whether it's not NULL.
> Yeah. It's more of an exit condition, stating "this function never
> returns non-null hentry" but moving it up is fine. Can do.

Please do so. It's more about conveying intent and increaing code
readability - so anyone knows that hentry should not be NULL
when returned by hash_search, and not some time later.

> Thanks again for this. Sorry it's taken so long to get the SGML docs
> converted. Too many other things on. 

No problem, I know how it is.

Best regards.

-- 
Tomasz Rybak GPG/PGP key ID: 2AD5 9860
Fingerprint A481 824E 7DD3 9C0E C40A 488E C654 FB33 2AD5 9860
http://member.acm.org/~tomaszrybak

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Stark 2016-01-24 14:59:12 Re: Releasing in September
Previous Message Alexander Korotkov 2016-01-24 11:12:26 Re: PoC: Partial sort