Re: logical changeset generation v6.7

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: andres(at)2ndquadrant(dot)com
Cc: robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: logical changeset generation v6.7
Date: 2013-12-05 13:03:51
Message-ID: 20131205.220351.189609364.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

> Will send the rebased version as soon as I've addressed your comments.

Thank you.

> > ===== 0001:
> >
> > - You assined HeapTupleGetOid(tuple) into relid to read in
> > several points but no modification. Nevertheless, you left
> > HeapTupleGetOid not replaced there. I think 'relid' just for
> > repeated reading has far small merit compared to demerit of
> > lowering readability. You'd be better to make them uniform in
> > either way.
>
> It's primarily to get the line lengths halfway under control.

Mm. I'm afraid I couldn't catch your words, do you mean that
IsSystemClass() or isTempNamespace() could change the NULL bitmap
in the tuple?

> > ===== 0002:
> >
> > - You are identifying the wal_level with the expr 'wal_level >=
> > WAL_LEVEL_LOGICAL' but it seems somewhat improper.
>
> Hm. Why?

It actually does no harm and somewhat trifling so I don't assert
you should fix it.

The reason for the comment is the greater values for wal_level
are undefined at the moment, so strictly saying, such values
should be handled as invalid ones. Although there is a practice
to avoid loop overruns by comparing counters with the expression
like (i > CEILING).

For instance, I found a macro for which comment reads as follows
and I feel a bit uneasy with it :-) It's nothing more than that.

| /* Do we need to WAL-log information required only for Hot Standby? */
~~~~
| #define XLogStandbyInfoActive() (wal_level >= WAL_LEVEL_HOT_STANDBY)

> > - RelationIsAccessibleInLogicalDecoding and
> > RelationIsLogicallyLogged are identical in code. Together with
> > the name ambiguity, this is quite confising and cause of
> > future misuse between these macros, I suppose. Plus the names
> > seem too long.
>
> Hm, don't think they are equivalent, rather the contrary. Note one
> returns false if IsCatalogRelation(), the other true.

Oops, I'm sorry. I understand they are not same. Then I have
other questions. The name for the first one
'RelationIsAccessibleInLogicalDecoding' doesn't seem representing
what its comment reads.

| /* True if we need to log enough information to have access via
| decoding snapshot. */

Making the macro name for this comment directly, I suppose it
would be something like 'NeedsAdditionalInfoInLogicalDecoding' or
more directly 'LogicalDeodingNeedsCids' or so..

> > - In heap_insert, the information conveyed on rdata[3] seems to
> > be better to be in rdata[2] because of the scarecity of the
> > additional information. XLOG_HEAP_CONTAINS_NEW_TUPLE only
> > seems to be needed. Also is in heap_multi_insert and
> > heap_upate.
>
> Could you explain a bit more what you mean by that? The reason it's a
> separate rdata entry is that otherwise a full page write will remove the
> information.

Sorry, I missed the comment 'so that an eventual FPW doesn't
remove the tuple's data'. Although given the necessity of removal
prevention, rewriting rdata[].buffer which is required by design
(correct?) with InvalidBuffer seems a bit outrageous for me and
obfuscating the objective of it. Other mechanism should be
preferable, I suppose. The most straight way to do that should be
new flag bit for XLogRecData, say, survive_fpw or something.

> > - In heap_multi_insert, need_cids referred only once so might be
> > better removed.
>
> It's accessed in a loop over potentially quite some items, that's why I
> moved it into an extra variable.

Sorry bothering you with comments biside the point.. But the
scope of needs_cids is narrower than it is. I think the
definition should be moved into the block for 'if (needwal)'.

> > - In heap_delete, at the point adding replica identity, same to
> > the aboves, rdata[3] could be moved into rdata[2] making new
> > type like 'xl_heap_replident'.
>
> Hm. I don't think that'd be a good idea, because we'd then need special
> case decoding code for deletes because the wal format would be different
> for inserts/updates and deletes.

Hmm. Although one common xl_heap_replident is impractical,
splitting a logcally single entity into two or more XLogRecDatas
also seems not to be a good idea.

> > - In heapam_xlog.h, the new type xl_heap_header_len is
> > inadequate in both of naming which is confising and
> > construction on which the header in xl_heap_header is no
> > longer be a header and indecisive member name 't_len'..
>
> The "header" bit in the name refers to the fact that it's containing
> information about the a HeapTuple's header, not that it's a header
> itself. Do you have a better suggestion than xl_heap_header_len?

Sorry, I'm confused during writing the comment, The order of
members in xl_heap_header_len doesn't matter. I got the reason
for the xl_header_len and whole xlog record image after
re-reading the relevant code. The update record became to contain
two variable length data by this patch. So the length of the
tuple body cannot be calculated only with whole record length and
header lengths.

Considering above, looking heap_xlog_insert(), you marked on
xlrec.flags with XLOG_HEAP_CONTAINS_NEW_TUPLE to signal decoder
that the record should have tuple data not being removed by fpw.
This is the same for the update record. So the redoer(?) also can
distinguish whether the update record contains extra tuple data
or not.

On the other hand, the update record after patched is longer by
sizeof(uint16) regardless of whether 'tuple data' is attached or
not. I don't know the value of the size in WAL stream, but the
smaller would be better maybe.

As a conclusion, I thing it would be better to decide whether to
insert length SEGMENT before the tuple data segment in
log_heap_update. Like follows,

| rdata[1].next = &(rdata[2]);
|
| xlhdr.t_infomask2 = newtup->t_data->t_infomask2;
| xlhdr.t_infomask = newtup->t_data->t_infomask;
| xlhdr.t_hoff = newtup->t_data->t_hoff;
|
| /*...*/
| rdata[2].data = (char *) &xlhdr;
| ...
| rdata[2].next = &(rdata[3]);
|
| if (need_tuple_data)
| {
| uint16 newtupbodylen =
| newtup->t_len - offsetof(HeapTupleHeaderData, t_bits);
| rdata[3].data = &newtupbodylen;
| ....
| }
| else
| {
| rdata[3].data = NULL;
| rdata[3].len = 0;
| ...
| }
| rdata[3].next = &(rdata[4]);
|
| /* PG73FGORMAT: write bitmap [+ padding] [+ oid] + data */
| rdata[4].data = (char *) newtup->t_data;

> > - In heapam_xlog.h, XLOG_HEAP_CONTAINS_OLD looks incomplete. And
> > it seems to be used in nowhere in this patchset. It should be
> > removed.
>
> Not sure what you mean with incomplete? It contains the both possible
> variants for an old contained tuple. The macro is used in the decoding,
> but I don't think things get clearer if we revise the macros in that
> later patch.

Umm. I don't understand why I missed where it is used, it surely
used in decode.c as you mentioned. Ok, this is required. Then
about 'imcomplete', it means 'CONTAINS OLD what?'...mmm,
logically speaking, lack of an object for the word 'OLD'. The
answer for the question should be 'both KEY and TUPLE'. Comparing
the revised images,

# Of course, it doesn't matter if the object for OLD can
# naturally be missing as English. I'm not a native English
# speaker as you know:-)

defining XLOG_HEAP_CONTAINS_OLD_KEY_AND_TUPLE,
| if (xlrec->flags & XLOG_HEAP_CONTAINS_OLD_KEY_AND_TUPLE)
| {

undefining XLOG_HEAP_CONTAINS_OLD and use separte macros type 1
| if (xlrec->flags & XLOG_HEAP_CONTAINS_OLD_KEY ||
| xlrec->flags & XLOG_HEAP_CONTAINS_OLD_TUPLE)
| {
(I belive this should be optimized by the compiler:-)

and type 2
| if (xlrec->flags &
| (XLOG_HEAP_CONTAINS_OLD_KEY | XLOG_HEAP_CONTAINS_OLD_TUPLE))
| {

I'm ok with any of them or others. In this connection, I found
following phrase in heapam.c which like type2 above.

| if (!(old_infomask & (HEAP_XMAX_INVALID |
| HEAP_XMAX_COMMITTED |
| HEAP_XMAX_IS_MULTI)) &&

> > - log_heap_new_cid() is called at several part just before other
> > xlogs is being inserted. I suppose this should be built in the
> > target xlog structures.
>
> Proportionally it will only be logged in a absolute minority of the
> cases (since normally the catalog will only seldomly be updated in
> comparison to a user's tables), so it doesn't seem like a good idea to
> complicate the already *horribly* complicated wal format for heap_*.

Horribly:-) I agree to you. Hmm I reconsider with new
knowledge(!) about your patch. But what do you think doing this
as follows,

if (RelationIsAccessibleInLogicalDecoding(relation))
+ {
| rdata_heap_new_cid(&rdata[0], relation, heaptup);
+ xlrec.flags |= XLOG_HEAP_CONTAINS_NEW_CID;
+ }
+ else
+ rdata_void(&rdata[0])
+ rdata[0].next = &(rdata[1]);

xlrec.flags = all_visible_cleared ? XLOG_HEAP_ALL_VISIBLE_CLEARED : 0;
xlrec.target.node = relation->rd_node;
xlrec.target.tid = heaptup->t_self;
| rdata[1].data = (char *) &xlrec;

If you don't agree with this, I don't say no more about this.

> > - in RecovoerPreparedTransactions(), any commend needed for the
> > reason calling XLogLogicalInfoActive()..
>
> It's pretty much the "Test here must match one used in
> AssignTransactionId()" comment. We only want to allow overwriting if
> AssignTransactionId() might already have done the SubTransSetParent()
> calls.

Thank you, I found it and it seems to be sufficient.

> > - In xact.c, the comment for the member 'didLogXid' in
> > TransactionStateData seems differ from it's meaning. It
> > becomes true when any WAL record for the current transaction
> > id just has been written to WAL buffer. So the comment,
> >
> > > /* has xid been included in WAL record? */
> >
> > would be better be something like (Should need corrected as
> > I'm not native speaker.)
>
> > /* Any WAL record for this transaction has been emitted ? */
>
> I don't think that'd be an improvement, transaction is a bit ambigiuous
> there because it might be the toplevel or subtransaction.

Hmm. Ok, I agree with it.

> > and also the member name should be something like
> > XidIsLogged. (Not so chaned?)
>
> Hm.
>
> > - The name of the function MarkCurrentTransactionIdLoggedIfAny,
> > although irregular abbreviations are discouraged, seems too
> > long. Isn't MarkCur(r/rent)XidLoggedIfAny sufficient?
>
> If you look at the other names in xact.h that doesn't seem to fit too
> well in the naming pattern.

(Now looking...) Wow. Ok, I agree to you.

> > Anyway,
> > the work involving this function seems would be better to be
> > done in some other way..
>
> Why? How?

How... It is easy to comment but hard to realize. Ok, let's
forget this comment:-) The 'Why' came from some obscure
impression(?), without firm thought.

> > - The comment for RelationGetIndexAttrBitmap() should be edited
> > for attrKind.
>
> Good point.

Thanks.

> > - The macro name INDEX_ATTR_BITMAP_KEY should be
> > INDEX_ATTR_BITMAP_FKEY. And INDEX_ATTR_BITMAP_IDENTITY_KEY
> > should be INDEX_ATTR_BITMAP_REPLID_KEY or something.
>
> But INDEX_ATTR_BITMAP_KEY isn't just about foreign keys... But I agree
> that INDEX_ATTR_BITMAP_IDENTITY_KEY should be renamed.

Thank you, please remember to do that.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2013-12-05 13:29:10 Re: Performance optimization of btree binary search
Previous Message MauMau 2013-12-05 13:00:41 Re: [bug fix] pg_ctl fails with config-only directory