Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Önder Kalacı <onderkalaci(at)gmail(dot)com>
Cc: "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Marco Slot <marco(dot)slot(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date: 2023-03-03 02:46:10
Message-ID: CAA4eK1KP-sV4aER51J-2mELjNzq_zVSLf1+W90Vu0feo-thVNA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Mar 2, 2023 at 6:50 PM Önder Kalacı <onderkalaci(at)gmail(dot)com> wrote:
>
>>
>>
>> In the above profile, the number of calls to index_fetch_heap() and
>> heapam_index_fetch_tuple() explains the regression you are seeing
>> with the index scan. Because the update will generate dead tuples in
>> the same transaction and those dead tuples won't be removed, we get
>> those from the index and then need to perform index_fetch_heap() to
>> find out whether each tuple is dead or not. Now, for a sequential
>> scan we also need to scan those dead tuples, but there we don't need
>> to go back and forth between the index and the heap.
>
>
> Thanks for the insights, I think what you describe makes a lot of sense.
>
...
...
>
> I think we figured out the cause of the performance regression. I think it is not
> negligible in some scenarios like the above. But those scenarios seem like synthetic
> test cases, without much user-impacting implication. Still, I think you are better suited
> to comment on this.
>
> If you consider that this is a significant issue, we could consider the second patch as well
> such that for this unlikely scenario users could disable index scans.
>

I think we can't completely ignore this regression because the key
point of this patch is to pick one of the non-unique indexes to
perform the scan, and it will be difficult to predict how many
duplicates (and/or dead rows) an index has without more planner
support. Personally, I feel it is better to have a table-level option
for this so that users have a knob to avoid regressions in
particular cases. In general, I agree that it will be a win in more
cases than it regresses.
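
To make the dead-tuple effect quoted above concrete, here is a toy
model in Python. It is only an illustrative sketch, not PostgreSQL
code: the heap is a list of (key, is_dead) pairs, the "index" is a
dict from key to heap positions, and the function names are made up
for this example. It assumes the worst case discussed here, where many
dead versions share the key of the one live row.

```python
from collections import defaultdict


def build_index(heap):
    """Map each key to the heap positions holding it, dead versions included."""
    index = defaultdict(list)
    for pos, (key, _is_dead) in enumerate(heap):
        index[key].append(pos)
    return index


def index_scan(heap, index, key):
    """Return (position, index_entries_read, heap_fetches) for the first live match."""
    index_entries = 0
    heap_fetches = 0
    for pos in index.get(key, []):
        index_entries += 1
        # The index entry alone does not say whether the tuple is dead,
        # so every entry costs a visit to the heap.
        heap_fetches += 1
        _key, is_dead = heap[pos]
        if not is_dead:
            return pos, index_entries, heap_fetches
    return None, index_entries, heap_fetches


def seq_scan(heap, key):
    """Return (position, tuples_visited) for the first live match."""
    visited = 0
    for pos, (k, is_dead) in enumerate(heap):
        visited += 1
        if k == key and not is_dead:
            return pos, visited
    return None, visited


# Hypothetical workload: 1000 dead versions of key 1, then one live version.
heap = [(1, True)] * 1000 + [(1, False)]
index = build_index(heap)

print(index_scan(heap, index, 1))
print(seq_scan(heap, 1))
```

In this model both scans visit the same heap tuples before reaching
the live one, but the index scan additionally reads one index entry
per dead version. That extra work is the back-and-forth between index
and heap described above. The real cost depends on page layout,
caching, and how many duplicates the chosen index has, none of which
this sketch tries to capture.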

--
With Regards,
Amit Kapila.
