Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Önder Kalacı <onderkalaci(at)gmail(dot)com>
Cc: "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date: 2022-08-30 10:13:41
Message-ID: CAHut+Ps83rtYqowmkGXTt5Khke1eMEyEuejdjFxgHTBfz7wDVQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Onder,

Since you ask me several questions [1], this post is just for answering those.

I have looked again at the latest v9 patch, but I will post my review
comments for that separately.

On Thu, Aug 25, 2022 at 7:09 PM Önder Kalacı <onderkalaci(at)gmail(dot)com> wrote:
>
>> 1d.
>> In above text, what was meant by "catches up around ~5 seconds"?
>> e.g. Did it mean *improves* by ~5 seconds, or *takes* ~5 seconds?
>>
>
> It "takes" 5 seconds to replicate all the changes. To be specific, I execute 'SELECT sum(abalance) FROM pgbench_accounts' on the subscriber, and try to measure the time until when all the changes are replicated. I do use the same query on the publisher to check what the query result should be when replication is done.
>
> I updated the relevant text, does that look better?

Yes.

>> 2. GENERAL
>>
>> 2a.
>> There are lots of single-line comments that start lowercase, but by
>> convention, I think they should start uppercase.
>>
>> e.g. + /* we should always use at least one attribute for the index scan */
>> e.g. + /* we might not need this if the index is unique */
>> e.g. + /* avoid expensive equality check if index is unique */
>> e.g. + /* unrelated Path, skip */
>> e.g. + /* simple case, we already have an identity or pkey */
>> e.g. + /* indexscans are disabled, use seq. scan */
>> e.g. + /* target is a regular table */
>>
>> ~~
>
>
> Thanks for noting this, I didn't realize that there is a strict requirement on this. Updated all of your suggestions, and realized one more such case.
>
> Is there documentation where such conventions are listed? I couldn't find any.

I don’t know of any strict requirements, but I did think it was the
more common practice to make the comments look like proper sentences.
However, when I tried to prove that by counting the single-line
comments in PG code it seems to be split almost 50:50
lowercase/uppercase, so I guess you should just do whatever is most
sensible or is most consistent with the surrounding code ….

Counts for single line /* */ comments:
regex ^\s*\/\*\s[a-z]+.*\*\/$ = 18222 results
regex ^\s*\/\*\s[A-Z]+.*\*\/$ = 20252 results

>> 3. src/backend/executor/execReplication.c - build_replindex_scan_key
>>
>> - int attoff;
>> + int index_attoff;
>> + int scankey_attoff;
>> bool isnull;
>> Datum indclassDatum;
>> oidvector *opclass;
>> int2vector *indkey = &idxrel->rd_index->indkey;
>> - bool hasnulls = false;
>> -
>> - Assert(RelationGetReplicaIndex(rel) == RelationGetRelid(idxrel) ||
>> - RelationGetPrimaryKeyIndex(rel) == RelationGetRelid(idxrel));
>>
>> indclassDatum = SysCacheGetAttr(INDEXRELID, idxrel->rd_indextuple,
>> Anum_pg_index_indclass, &isnull);
>> Assert(!isnull);
>> opclass = (oidvector *) DatumGetPointer(indclassDatum);
>> + scankey_attoff = 0;
>>
>> Maybe just assign scankey_attoff = 0 at the declaration?
>>
>
> Again, lack of coding convention knowledge :/ My observation is that it is often not assigned during the declaration. But, changed this one.
>

I don’t know of any convention. Probably this is just my own
preference to keep the simple default assignments with the declaration
to reduce the LOC. YMMV.

>>
>> 6.
>>
>> - int pkattno = attoff + 1;
>> ...
>> /* Initialize the scankey. */
>> - ScanKeyInit(&skey[attoff],
>> - pkattno,
>> + ScanKeyInit(&skey[scankey_attoff],
>> + index_attoff + 1,
>> BTEqualStrategyNumber,
>> Wondering if it would have been simpler if you just did:
>> int pkattno = index_attoff + 1;
>
>
>
> The index is not necessarily the primary key at this point, that's why I removed it.
>
> There are already 3 variables in the same function index_attoff, scankey_attoff and table_attno, which are hard to avoid. But, this one seemed ok to avoid, mostly to simplify the readability. Do you think it is better with the additional variable? Still, I think we need a better name as "pk" is not relevant anymore.
>

Your code is fine. Leave it as-is.

>> 8. src/backend/executor/execReplication.c - RelationFindReplTupleByIndex
>>
>> @@ -128,28 +171,44 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
>> TransactionId xwait;
>> Relation idxrel;
>> bool found;
>> + TypeCacheEntry **eq;
>> + bool indisunique;
>> + int scankey_attoff;
>>
>> /* Open the index. */
>> idxrel = index_open(idxoid, RowExclusiveLock);
>> + indisunique = idxrel->rd_index->indisunique;
>> +
>> + /* we might not need this if the index is unique */
>> + eq = NULL;
>>
>> Maybe just default assign eq = NULL in the declaration?
>>
>
> Again, I wasn't sure if it is OK regarding the coding convention to assign during the declaration. Changed now.
>

Same as #3.

>> 10.
>>
>> + /* we only need to allocate once */
>> + if (eq == NULL)
>> + eq = palloc0(sizeof(*eq) * outslot->tts_tupleDescriptor->natts);
>>
>> But shouldn't you also free this 'eq' before the function returns, to
>> prevent leaking memory?
>>
>
> Two notes here. First, this is allocated inside ApplyMessageContext, which seems to be reset per tuple change. So, that seems like a good boundary to keep this allocation in memory.
>

OK, fair enough. Is it worth adding a comment to say that or not?

> Second, RelationFindReplTupleSeq() doesn't free the same allocation roughly at a very similar call stack. That's why I decided not to pfree. Do you see strong reason to pfree at this point? Then we should probably change that for RelationFindReplTupleSeq() as well.
>
>>
>> 23. src/backend/replication/logical/relation.c - LogicalRepUsableIndex
>>
>> +static Oid
>> +LogicalRepUsableIndex(Relation localrel, LogicalRepRelation *remoterel)
>> +{
>> + Oid idxoid;
>> +
>> + /*
>> + * We never need index oid for partitioned tables, always rely on leaf
>> + * partition's index.
>> + */
>> + if (localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
>> + return InvalidOid;
>> +
>> + /* simple case, we already have an identity or pkey */
>> + idxoid = GetRelationIdentityOrPK(localrel);
>> + if (OidIsValid(idxoid))
>> + return idxoid;
>> +
>> + /* indexscans are disabled, use seq. scan */
>> + if (!enable_indexscan)
>> + return InvalidOid;
>>
>> I thought the (!enable_indexscan) fast exit perhaps should be done
>> first, or at least before calling GetRelationIdentityOrPK.
>
>
> This is actually a point where I need some more feedback. On HEAD, even if the index scan is disabled, we use the index. For this one, (a) I didn't want to change the behavior for existing users (b) want to have a way to disable this feature, and enable_indexscan seems like a good one.
>
> Do you think I should dare to move it above GetRelationIdentityOrPK()? Or, maybe I just need more comments? I improved the comment, and it would be nice to hear your thoughts on this.

I agree with you it is maybe best not to cause any changes in
behaviour. If the behaviour is unwanted then it should be changed
independently of this patch anyhow.

>> 24b.
>> This code is almost same in function handle_update_internal(), except
>> the wrapping of the params is different. Better to keep everything
>> consistent looking.
>>
>
> Hmm, I have not changed how they look because they have one variable difference (&relmapentry->remoterel vs remoterel), which requires the indentation to be slightly difference. So, I either need a new variable or keep them as-is?

OK. Keep code as-is.

>>
>> 28. src/backend/replication/logical/worker.c - FindReplTupleInLocalRel
>>
>> @@ -2093,12 +2125,11 @@ FindReplTupleInLocalRel(EState *estate,
>> Relation localrel,
>>
>> *localslot = table_slot_create(localrel, &estate->es_tupleTable);
>>
>> I think this might have been existing functionality...
>>
>> The comment says "* Local tuple, if found, is returned in
>> '*localslot'." But the code is unconditionally doing
>> table_slot_create() before it even knows if a tuple was found or not.
>> So what about when it is NOT found - in that case shouldn't there be
>> some cleaning up that (unused?) table slot that got unconditionally
>> created?
>>
>
> This sounds accurate. But I guess it may not have been considered critical as we are operating in the ApplyMessageContext? Tha is going to be freed once a single tuple is dispatched.
>
> I have a slight preference not to do it in this patch, but if you think otherwise let me know.

I agree. Maybe this is not even a leak worth bothering about if it is
only in the short-lived ApplyMessageContext like you say. Anyway,
AFAIK this was already in existing code, so a fix (if any) would
belong in a different patch to this one.

>> 31.
>>
>> + slot_modify_data(remoteslot_part, localslot, entry,
>> newtup);
>>
>> Unnecessary wrapping.
>>
>> ======
>
>
> I think I have not changed this, but fixed anyway

Hmm - I don't see that you changed this, but anyway I guess you
shouldn't be fixing wrapping problems unless this patch caused them.

------
[1] https://www.postgresql.org/message-id/CACawEhXbw%3D%3DK02v3%3DnHFEAFJqegx0b4r2J%2BFtXtKFkJeE6R95Q%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2022-08-30 10:29:26 Re: SQL/JSON features for v15
Previous Message Alvaro Herrera 2022-08-30 09:20:08 Re: SQL/JSON features for v15