Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From: Önder Kalacı <onderkalaci(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, Marco Slot <marco(dot)slot(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date: 2023-03-03 09:32:39
Message-ID: CACawEhWkzEwzLo3Pmb72Xz+y=EfGhV-JsA97uxTx0z-QihR9rQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Amit, all

> Few comments on 0001
> ====================
> 1.
> + such suitable indexes, the search on the subscriber s ide can be
> very inefficient,
>
> unnecessary space in 'side'
>

Fixed in v28

>
> 2.
> - identity. If the table does not have any suitable key, then it can be
> set
> - to replica identity <quote>full</quote>, which means the entire row
> becomes
> - the key. This, however, is very inefficient and should only be used
> as a
> - fallback if no other solution is possible. If a replica identity other
> + identity. When replica identity <literal>FULL</literal> is specified,
> + indexes can be used on the subscriber side for searching the rows.
>
> I think it is better to retain the first sentence (If the table does
> not ... entire row becomes the key.) as that says what will be part of
> the key.
>
>
right, that sentence looks useful, added back.

> 3.
> - comprising the same or fewer columns must also be set on the subscriber
> - side. See <xref linkend="sql-altertable-replica-identity"/> for
> details on
> - how to set the replica identity. If a table without a replica
> identity is
> - added to a publication that replicates <command>UPDATE</command>
> + comprising the same or fewer columns must also be set on the
> subscriber side.
> + See <xref linkend="sql-altertable-replica-identity"/> for
> + details on how to set the replica identity. If a table without a
> replica
> + identity is added to a publication that replicates
> <command>UPDATE</command>
>
> I don't see any change in this except line length. If so, I don't
> think we should change it as part of this patch.
>
>
Yes, fixed. But the first line (starting with See <xref ..) still shows as
if it is changed,
probably because its line has changed. I couldn't make that line show as
it had not
changed.

> 4.
> /*
> * Setup a ScanKey for a search in the relation 'rel' for a tuple 'key'
> that
> * is setup to match 'rel' (*NOT* idxrel!).
> *
> - * Returns whether any column contains NULLs.
> + * Returns how many columns should be used for the index scan.
> + *
> + * This is not generic routine, it expects the idxrel to be
> + * a btree, non-partial and have at least one column
> + * reference (e.g., should not consist of only expressions).
> *
> - * This is not generic routine, it expects the idxrel to be replication
> - * identity of a rel and meet all limitations associated with that.
> + * By definition, replication identity of a rel meets all
> + * limitations associated with that. Note that any other
> + * index could also meet these limitations.
>
> The comment changes look quite asymmetric to me. Normally, we break
> the line if the line length goes beyond 80 cols. Please check and
> change other places in the patch if they have a similar symptom.
>

Went over the patch, and expanded each line ~80 chars.

I'm guessing 80 is not the hard limit, in some cases I went over 81-82.

>
> 5.
> + * There are no fundamental problems for supporting non-btree
> + * and/or partial indexes.
>
> Can we mention partial indexes in the above comment? It seems to me
> that because the required tuple may not satisfy the expression (in the
> case of partial indexes) it may not be easy to support it.
>

Expanded the comment and explained the differences a little further.

>
> 6.
> build_replindex_scan_key()
> {
> ...
> + for (index_attoff = 0; index_attoff <
> IndexRelationGetNumberOfKeyAttributes(idxrel);
> + index_attoff++)
> ...
> ...
> +#ifdef USE_ASSERT_CHECKING
> + IndexInfo *indexInfo = BuildIndexInfo(idxrel);
> +
> + Assert(!IsIndexOnlyOnExpression(indexInfo));
> +#endif
> ...
> }
>
> We can avoid building index info multiple times. This can be either
> checked at the beginning of the function outside attribute offset loop
> or we can probably cache it. I understand this is for assert builds
> but seems easy to avoid it doing multiple times and it also looks odd
> to do it multiple times for the same index.
>

Applied your suggestions. Although I do not have strong opinions, I think
that
it was easier to follow with building the indexInfo for each iteration.

>
> 7.
> - /* Build scankey for every attribute in the index. */
> - for (attoff = 0; attoff <
> IndexRelationGetNumberOfKeyAttributes(idxrel); attoff++)
> + /* Build scankey for every non-expression attribute in the index. */
> + for (index_attoff = 0; index_attoff <
> IndexRelationGetNumberOfKeyAttributes(idxrel);
> + index_attoff++)
> {
> Oid operator;
> Oid opfamily;
> + Oid optype = get_opclass_input_type(opclass->values[index_attoff]);
> RegProcedure regop;
> - int pkattno = attoff + 1;
> - int mainattno = indkey->values[attoff];
> - Oid optype = get_opclass_input_type(opclass->values[attoff]);
> + int table_attno = indkey->values[index_attoff];
>
> I don't think here we need to change variable names if we retain
> mainattno as it is instead of changing it to table_attno. The current
> naming doesn't seem bad for the current usage of the patch.
>

Hmm, I'm actually not convinced that the variable naming on HEAD is good for
the current patch. The main difference is that now we allow indexes like:
* CREATE INDEX idx ON table(foo(col), col_2)*

(See # Testcase start: SUBSCRIPTION CAN USE INDEXES WITH
EXPRESSIONS AND COLUMNS)

In such indexes, we could skip the attributes of the index. So, skey_attoff
is not
equal to index_attoff anymore. So, calling out this explicitly via the
variable names
seems more robust to me. Plus, mainattno sounded vague to me when I first
read
this function.

So, unless you have strong objections, I'm leaning towards having variable
names more explicit. I'm also open for suggestions if you think the names
I picked is not clear enough.

>
> 8.
> + TypeCacheEntry **eq = NULL; /* only used when the index is not RI or PK
> */
>
> Normally, we don't add such comments as the usage is quite obvious by
> looking at the code.
>
>
Sure, I also don't see much value for it, removed.

Attached v29 for this review. Note that I'll be working on the disable
index scan changes after
I reply to some of the other pending reviews.

Thanks,
Onder

Attachment Content-Type Size
v29_0001_use_index_on_subs_when_pub_rep_ident_full.patch application/x-patch 67.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nazir Bilal Yavuz 2023-03-03 10:01:00 Re: meson: Non-feature feature options
Previous Message Peter Eisentraut 2023-03-03 09:16:09 Re: meson: Non-feature feature options