From: | Önder Kalacı <onderkalaci(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Marco Slot <marco(dot)slot(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher |
Date: | 2023-03-02 13:20:09 |
Message-ID: | CACawEhUDwAqFx4T6XhxBuKJKqWTFfy_iUhf22JVcnK9m6mHCVA@mail.gmail.com |
Lists: | pgsql-hackers |
Hi Amit, Shi Yu
> > b. Executed SQL.
> > I executed TRUNCATE and INSERT before each UPDATE. I am not sure if you did the
> > same, or just executed 50 consecutive UPDATEs. If the latter one, there would be
> > lots of old tuples and this might have a bigger impact on sequential scan. I
> > tried this case (which executes 50 consecutive UPDATEs) and also saw that the
> > overhead is smaller than before.
>
Alright, I'll do the same: execute TRUNCATE and INSERT before each UPDATE.
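For reference, a minimal sketch of the workload I have in mind — table and column names here are my own assumptions, not Shi Yu's actual script:

```sql
-- Hypothetical per-iteration setup (assumed names, not the real script):
-- resetting the table before each UPDATE means the scan on the apply side
-- only sees live tuples, not the dead versions left by earlier UPDATEs.
TRUNCATE tbl;
INSERT INTO tbl SELECT i % 10, i FROM generate_series(1, 50000) i;
UPDATE tbl SET b = b + 1 WHERE a = 1;
```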
> In the above profile number of calls to index_fetch_heap(),
> heapam_index_fetch_tuple() explains the reason for the regression you
> are seeing with the index scan. Because the update will generate dead
> tuples in the same transaction and those dead tuples won't be removed,
> we get those from the index and then need to perform
> index_fetch_heap() to find out whether the tuple is dead or not. Now,
> for sequence scan also we need to scan those dead tuples but there we
> don't need to do back-and-forth between index and heap.
Thanks for the insights, I think what you describe makes a lot of sense.
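To illustrate the effect Amit describes with a toy example of my own (assumed table and names, not the actual benchmark): consecutive UPDATEs inside one transaction leave dead tuple versions behind that cannot be removed, and with an index scan each dead index entry costs an extra heap visit just to discover the tuple is dead.

```sql
-- Toy illustration (assumed names): all 50 UPDATEs run in one
-- transaction (a DO block), so the dead versions they create cannot
-- be pruned; an index scan must revisit the heap for each dead entry.
CREATE TABLE tbl (a int, b int);
INSERT INTO tbl SELECT i % 10, i FROM generate_series(1, 50000) i;
DO $$
BEGIN
  FOR n IN 1..50 LOOP
    UPDATE tbl SET b = b + 1 WHERE a = 1;  -- each pass adds ~5000 dead versions
  END LOOP;
END $$;
```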
> I think we can
> once check with more number of tuples (say with 20000, 50000, etc.)
> for case-1.
>
>
As we'd expect, this test made the performance regression more visible.
I quickly ran case-1 for 50 times with 50000 rows, as Shi Yu did, and got
the following results, measuring end-to-end time for running the
whole set of commands:

seq_scan: 24 minutes 42 seconds
index_scan: 1 hour 4 minutes 54 seconds
But, I'm still not sure whether we should focus on this regression too
much. In the end, what we are talking about is a case (e.g., all or many
rows are duplicated) where using an index is not a good idea anyway. So,
I doubt users would have such indexes.
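As a side note, such low-cardinality indexes are usually easy to spot from the statistics (table and column names below are assumptions for illustration):

```sql
-- pg_stats shows column selectivity; a very small n_distinct (or a
-- negative fraction close to zero) means an index on that column is
-- rarely a good choice for row lookups.
SELECT attname, n_distinct
FROM pg_stats
WHERE tablename = 'tbl' AND attname = 'a';
```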
> The quadratic apply performance the sequential scans cause is a much bigger
> hazard for users than some apply performance regression.
Quoting Andres' note, I personally think that the regression for this case
is not a big concern.
> I'd prefer not having an option, because we figure out the cause of the
> performance regression (reducing it to be small enough to not care). After
> that an option defaulting to using indexes. I don't think an option
> defaulting to false makes sense.
I think we figured out the cause of the performance regression, and it is not
small enough to ignore for some scenarios like the one above. But those
scenarios seem like synthetic test cases, with few user-impacting
implications. Still, I think you are better suited to comment on this.

If you consider this a significant issue, we could also consider the second
patch, so that users could disable index scans for this unlikely scenario.
Thanks,
Onder