Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From: Önder Kalacı <onderkalaci(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date: 2022-08-20 11:02:03
Message-ID: CACawEhXavM7kjzVgMfRSN+YyB_Lxqjoi8PJ2Gh7MdL-eScb9vw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I'm a little late to catch up with your comments, but here are my replies:

> My answer for the above assumes that your question is regarding what
> happens if you ANALYZE on a partitioned table. If your question is
> something different, please let me know.
> >
>
> I was talking about inheritance cases, something like:
> create table tbl1 (a int);
> create table tbl1_part1 (b int) inherits (tbl1);
> create table tbl1_part2 (c int) inherits (tbl1);
>
> What we do in such cases is documented as: "if the table being
> analyzed has inheritance children, ANALYZE gathers two sets of
> statistics: one on the rows of the parent table only, and a second
> including rows of both the parent table and all of its children. This
> second set of statistics is needed when planning queries that process
> the inheritance tree as a whole. The child tables themselves are not
> individually analyzed in this case."

Oh, I haven't considered inherited tables. That seems right, the
statistics of the children are not updated when the parent is analyzed.

> Now, the point I was worried about was what if the changes in child
> tables (*_part1, *_part2) are much more than in tbl1? In such cases,
> we may not invalidate child rel entries, so how will logical
> replication behave for updates/deletes on child tables? There may not
> be any problem here but it is better to do some analysis of such cases
> to see how it behaves.
>

I also haven't observed any specific issues. In the end, when the user (or
autovacuum) does ANALYZE on the child, it is when the statistics are
updated for the child. Although I do not have much experience with
inherited tables, this sounds like the expected behavior?

I also pushed a test covering inherited tables. First, a basic test on the
parent. Then, show that updates on the parent can also use indexes of the
children. Also, after an ANALYZE on the child, we can re-calculate the
index and use the index with a higher cardinality column.

> > Also, for the majority of the use-cases, I think we'd probably expect an
> index on a column with high cardinality -- hence use index scan. So, bitmap
> index scans are probably not going to be that much common.
> >
>
> You are probably right here but I don't think we can make such
> assumptions. I think the safest way to avoid any regression here is to
> choose an index when the planner selects an index scan. We can always
> extend it later to bitmap scans if required. We can add a comment
> indicating the same.
>
>
Alright, I got rid of the bitmap scans.

Though, it caused few of the new tests to fail. I think because of the data
size/distribution, the planner picks bitmap scans. To make the tests
consistent and small, I added `enable_bitmapscan to off` for this new test
file. Does that sound ok to you? Or, should we change the tests to make
sure they genuinely use index scans?

*
> + /*
> + * For insert-only workloads, calculating the index is not necessary.
> + * As the calculation is not expensive, we are fine to do here (instead
> + * of during first update/delete processing).
> + */
>
> I think here instead of talking about cost, we should mention that it
> is quite an infrequent operation i.e performed only when we first time
> performs an operation on the relation or after invalidation. This is
> because I think the cost is relative.
>

Changed, does that look better?

+
> + /*
> + * Although currently it is not possible for planner to pick a
> + * partial index or indexes only on expressions,
>
> It may be better to expand this comment by describing a bit why it is
> not possible in our case. You might want to give the function
> reference where it is decided.
>
> Make sense, added some more information.

Thanks,
Onder

Attachment Content-Type Size
v7_0001_use_index_on_subs_when_pub_rep_ident_full.patch application/octet-stream 65.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Erik Rijkers 2022-08-20 13:32:20 Re: Schema variables - new implementation for Postgres 15
Previous Message Daniel Gustafsson 2022-08-20 11:02:01 Re: sslinfo extension - add notbefore and notafter timestamps