Re: pglogical_output - a general purpose logical decoding output plugin

From: Tomasz Rybak <tomasz(dot)rybak(at)post(dot)pl>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pglogical_output - a general purpose logical decoding output plugin
Date: 2016-01-20 22:23:35
Message-ID: 20160120222335.21436.18864.pgcf@coridan.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature: not tested
Spec compliant: not tested
Documentation: not tested

I revied more files:

pglogical_proto_native.c

+ pq_sendbyte(out, 'N'); /* column name block follows */
+ attname = NameStr(att->attname);
+ len = strlen(attname) + 1;
+ pq_sendint(out, len, 2);
+ pq_sendbytes(out, attname, len); /* data */
Identifier names are limited to 63 - so why we're sending 2 bytes here?
I do not have hard feelings about this - just curiousity. For:
+ pq_sendbyte(out, nspnamelen); /* schema name length */
+ pq_sendbytes(out, nspname, nspnamelen);

+ pq_sendbyte(out, relnamelen); /* table name length */
+ pq_sendbytes(out, relname, relnamelen);
schema and relation name we send 1 byte, for attribute 2. Strange, bit inconsistent.

+ pq_sendbyte(out, 'B'); /* BEGIN */
+
+ /* send the flags field its self */
+ pq_sendbyte(out, flags);
Comment: "flags field its self"? Shouldn't be "... itself"?
Similarly in write_origin; write_insert just says:
+ /* send the flags field */
+ pq_sendbyte(out, flags);

Speaking about flags - in most cases they are 0; only for attributes
we might have:
+ flags |= IS_REPLICA_IDENTITY;
I assume that flags field is put into protocol for future needs?

+ /* FIXME support whole tuple (O tuple type) */
+ if (oldtuple != NULL)
+ {
+ pq_sendbyte(out, 'K'); /* old key follows */
+ pglogical_write_tuple(out, data, rel, oldtuple);
+ }
I don't fully understand. We are sending whole old tuple here,
so this FIXME should be more about supporting sending just keys.
But then comment "old key follows" is not true. Or am I missing
something here?
Similarly for write_delete.

+ pq_sendbyte(out, 'S'); /* message type field */
+ pq_sendbyte(out, 1); /* startup message version */
For now protocol is 1, but for code readability it might be better
to change this line to:
+ pq_sendbyte(out, PGLOGICAL_PROTO_VERSION_NUM); /* startup message version */

Just for the sake of avoiding code repetition:
+ for (i = 0; i < desc->natts; i++)
+ {
+ if (desc->attrs[i]->attisdropped)
+ continue;
+ nliveatts++;
+ }
+ pq_sendint(out, nliveatts, 2);
The exact same code is in write_tuple and write_attrs. I don't know what's
policy for refactoring, but this might be extracted into separate function.

+ else if (att->attlen == -1)
+ {
+ char *data = DatumGetPointer(values[i]);
+
+ /* send indirect datums inline */
+ if (VARATT_IS_EXTERNAL_INDIRECT(values[i]))
+ {
+ struct varatt_indirect redirect;
+ VARATT_EXTERNAL_GET_POINTER(redirect, data);
+ data = (char *) redirect.pointer;
+ }
I really don't like this. We have function parameter "data" and now
are creating new variable with the same name. It might lead to confusion
and some long debugging sessions. Please change the name of this variable.
Maybe attr_data would be OK? Or outputbytes, like below, for case 'b' and default?

pglogical_proto_json.c

+ appendStringInfo(out, ", \"origin_lsn\":\"%X/%X\"",
+ (uint32)(txn->origin_lsn >> 32), (uint32)(txn->origin_lsn));
I remember there was discussion on *-hackers recently about %X/%X; I'll
try to find it and check whether it's according to final conclusion.

pglogical_relmetacache.c

First. In pglogical_output.c, in pg_decode_startup we are calling
init_relmetacache. I haven't found call to destroy_relmetacache
and comment says that it must be called at backend shutdown.
Is it guaranteed? Or will cache get freed with its context?

+ /* Find cached function info, creating if not found */
+ hentry = (struct PGLRelMetaCacheEntry*) hash_search(RelMetaCache,
+ (void *)(&RelationGetRelid(rel)),
+ HASH_ENTER, &found);
+
+ if (!found)
+ {
+ Assert(hentry->relid = RelationGetRelid(rel));
+ hentry->is_cached = false;
+ hentry->api_private = NULL;
+ }
+
+ Assert(hentry != NULL);
Shouldn't Assert be just after calling hash_search? We're (if !found)
dereferencing hentry and only after checking whether it's not NULL.

I haven't found relevance of relmeta_cache_size attribute.
It's set to value coming from client - but then the only thing that matters
is whether it is 0 or not. I'll try to reread DESIGN.md session about cache,
but for now (quite late and I'm quite tired) it is not clear on the first reading.

Best regards.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2016-01-20 22:29:23 Re: Releasing in September
Previous Message Tom Lane 2016-01-20 22:09:32 Re: More stable query plans via more predictable column statistics