Re: Re: pglogical_output - a general purpose logical decoding output plugin

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Tomasz Rybak <tomasz(dot)rybak(at)post(dot)pl>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: pglogical_output - a general purpose logical decoding output plugin
Date: 2016-01-21 01:04:31
Message-ID: CAMsr+YGq7mqy+yxpJ=xQf=iBf2ynQ-kD75--mGBsT6tLD7bgWQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 21 January 2016 at 06:23, Tomasz Rybak <tomasz(dot)rybak(at)post(dot)pl> wrote:

>
> I reviewed more files:

Thanks.

Can you try to put more whitespace between items? It can be hard to follow
at the moment.

> pglogical_proto_native.c
>
> + pq_sendbyte(out, 'N'); /* column name block
> follows */
> + attname = NameStr(att->attname);
> + len = strlen(attname) + 1;
> + pq_sendint(out, len, 2);
> + pq_sendbytes(out, attname, len); /* data */
> Identifier names are limited to 63 - so why we're sending 2 bytes here?
>

Good question. It should be one byte. I'll need to amend the protocol for
that, but I don't think that's a problem this early on.

> + pq_sendbyte(out, 'B'); /* BEGIN */
> +
> + /* send the flags field its self */
> + pq_sendbyte(out, flags);
> Comment: "flags field its self"? Shouldn't be "... itself"?
> Similarly in write_origin; write_insert just says:
>

itself is an abbreviation of its self.

> + /* send the flags field */
> + pq_sendbyte(out, flags);
>
> Speaking about flags - in most cases they are 0; only for attributes
> we might have:

+ flags |= IS_REPLICA_IDENTITY;
> I assume that flags field is put into protocol for future needs?
>

Correct. The protocol specifies (in most places; need to double check all
sites) that the lower 4 bits are reserved and must be treated as an ERROR
if set. The high 4 bits must be ignored if set and not recognised. That
gives us some room to wiggle without bumping the protocol version
incompatibly, and lets us use capabilities (via client-supplied parameters)
to add extra information on the wire.

+ /* FIXME support whole tuple (O tuple type) */
> + if (oldtuple != NULL)
> + {
> + pq_sendbyte(out, 'K'); /* old key follows */
> + pglogical_write_tuple(out, data, rel, oldtuple);
> + }
> I don't fully understand. We are sending whole old tuple here,
> so this FIXME should be more about supporting sending just keys.
> But then comment "old key follows" is not true. Or am I missing
> something here?
>

In wal_level=logical the tuple that's written is an abbreviated tuple
containing data only for the REPLICA IDENTITY fields.

Ideally we'd also be able to support sending the _whole_ old tuple, but
this would require the ability to log the whole old tuple in WAL when
logging a DELETE or UPDATE into WAL. This isn't so much a FIXME as a
logical decoding limitation and wishlist item; I'll amend to that effect.

>
> + pq_sendbyte(out, 'S'); /* message type field */
> + pq_sendbyte(out, 1); /* startup message version */
> For now protocol is 1, but for code readability it might be better
> to change this line to:
> + pq_sendbyte(out, PGLOGICAL_PROTO_VERSION_NUM); /* startup message
> version */
>

The startup message format isn't the same as the protocol version.
Hopefully we'll never have to change it. The reason it's specified is so
that if we ever do bump it a decoding plugin can recognise an old client
and fall back. Maybe it's BC overkill but I'd kick myself for not doing it
if we ever decided to (say) add support for structured json startup options
from the client. Working on BDR has taught me that there's no such thing as
too much consideration for cross-version compat and negotiation in
replication.

I'm happy to create a new define for that an comment to this effect.

> Just for the sake of avoiding code repetition:
> + for (i = 0; i < desc->natts; i++)
> + {
> + if (desc->attrs[i]->attisdropped)
> + continue;
> + nliveatts++;
> + }
> + pq_sendint(out, nliveatts, 2);

The exact same code is in write_tuple and write_attrs. I don't know what's
> policy for refactoring, but this might be extracted into separate function.
>

Seems trivial enough not to care, but probably can.

>
> + else if (att->attlen == -1)
> + {
> + char *data =
> DatumGetPointer(values[i]);
> +
> + /* send indirect datums inline */
> + if
> (VARATT_IS_EXTERNAL_INDIRECT(values[i]))
> + {
> + struct varatt_indirect
> redirect;
> +
> VARATT_EXTERNAL_GET_POINTER(redirect, data);
> + data = (char *)
> redirect.pointer;
> + }

I really don't like this. We have function parameter "data" and now
> are creating new variable with the same name.

I agree. Good catch.

I don't much like the use of 'data' as the param name for the plugin
private data and am quite inclined to change that instead, to
plugin_private or something.

pglogical_proto_json.c
>
> + appendStringInfo(out, ", \"origin_lsn\":\"%X/%X\"",
> + (uint32)(txn->origin_lsn >> 32),
> (uint32)(txn->origin_lsn));
> I remember there was discussion on *-hackers recently about %X/%X; I'll
> try to find it and check whether it's according to final conclusion.
>

Thanks.

> pglogical_relmetacache.c
>
> First. In pglogical_output.c, in pg_decode_startup we are calling
> init_relmetacache. I haven't found call to destroy_relmetacache
> and comment says that it must be called at backend shutdown.
> Is it guaranteed? Or will cache get freed with its context?
>

Hm, ok. It must never be called before shutdown, but it's not necessary to
call at all. I'll amend it. It's present just in case future Valgrind
support wants to call it to explicitly clean up memory.

I'll take a closer look there. I think I changed the lifetime of the
relmetacache after figuring out a better way to handle the cache
invalidations and it's possible I missed an update in the comments.

> + /* Find cached function info, creating if not found */
> + hentry = (struct PGLRelMetaCacheEntry*) hash_search(RelMetaCache,
> +
> (void *)(&RelationGetRelid(rel)),
> +
> HASH_ENTER, &found);
> +
> + if (!found)
> + {
> + Assert(hentry->relid = RelationGetRelid(rel));
> + hentry->is_cached = false;
> + hentry->api_private = NULL;
> + }
> +
> + Assert(hentry != NULL);

Shouldn't Assert be just after calling hash_search? We're (if !found)
> dereferencing hentry and only after checking whether it's not NULL.
>

Yeah. It's more of an exit condition, stating "this function never returns
non-null hentry" but moving it up is fine. Can do.

> I haven't found relevance of relmeta_cache_size attribute.
> It's set to value coming from client - but then the only thing that matters
> is whether it is 0 or not. I'll try to reread DESIGN.md session about
> cache,
> but for now (quite late and I'm quite tired) it is not clear on the first
> reading.

It's the first part of a feature. You can turn the relation metadata cache
off, set it to unlimited, or (in future, not implemented yet) set a bounded
size where a LRU is used to evict the oldest entry.

The LRU approach is more complex since we have to track the LRU and do
evictions as well as the cache map its self. Efficiently. I expect this to
be a 1.1 or later feature. Having the param as an int now means we don't
later land up having to have two params, one to enable the cache and
another to bound its size, so once it's fully implemented it'll (IMO) be
clearer what's going on.

Thanks again for the detailed code examination. I find it hard to read
familiar code closely for review since I see what I expect to see. So it's
really, really useful.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2016-01-21 01:20:24 pgsql: Refactor to create generic WAL page read callback
Previous Message Tom Lane 2016-01-21 00:39:37 Re: Expanded Objects and Order By