Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From: Önder Kalacı <onderkalaci(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>
Cc: Marco Slot <marco(dot)slot(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date: 2023-03-01 13:21:52
Message-ID: CACawEhWbq207=dDvRzteiXoAtGUS=eTcdkBEJ+Lh-bKkohFqYg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Andres, Amit, Shi Yu, all

Andres Freund <andres(at)anarazel(dot)de>, 28 Şub 2023 Sal, 21:39 tarihinde şunu
yazdı:

> Hi,
>
> On 2023-02-25 16:00:05 +0530, Amit Kapila wrote:
> > On Tue, Feb 21, 2023 at 7:55 PM Önder Kalacı <onderkalaci(at)gmail(dot)com>
> wrote:
> > >> I think this overhead seems to be mostly due to the need to perform
> > >> tuples_equal multiple times for duplicate values.
>
> I think more work needs to be done to determine the source of the
> overhead. It's not clear to me why there'd be an increase in tuples_equal()
> calls in the tests upthread.
>
>
You are right, looking closely, in fact, we most of the time do much less
tuples_equal() with index scan.

I've done some profiling with perf, and created flame graphs for the apply
worker, with the
test described above: *-- case 1 (All values are duplicated). *I used the
following commands:
- perf record -F 99 -p 122555 -g -- sleep 60
- perf script | ./stackcollapse-perf.pl > out.perf-folded
- ./flamegraph.pl out.perf-folded > perf_[index|seq]_scan.svg

I attached both flame graphs. I do not see anything specific regarding what
the patch does, but
instead the difference mostly seems to come down to index scan vs
sequential scan related
functions. As I continue to investigate, I thought it might be useful to
share the flame graphs
so that more experienced hackers could comment on the difference.

Regarding my own end-to-end tests: In some runs, the sequential scan is
indeed faster for case-1. But,
when I execute *update tbl set a=a+1; *for 50 consecutive times, and
measure end to end performance, I see
much better results for index scan, only case-1 is on-par as mostly I'd
expect.

Case-1, running the update 50 times and waiting all changes applied

- index scan: 2minutes 36 seconds
- sequential scan: 2minutes 30 seconds

Case-2, running the update 50 times and waiting all changes applied

- index scan: 1 minutes, 2 seconds
- sequential scan: 2minutes 30 seconds

Case-7, running the update 50 times and waiting all changes applied

- index scan: 6 seconds
- sequential scan: 2minutes 26seconds

> # Result
The time executing update (the average of 3 runs is taken, the unit is
milliseconds):

Shi Yu, could it be possible for you to re-run the tests with some more
runs, and share the average?
I suspect maybe your test results have a very small pool size, and some
runs are making
the average slightly problematic.

In my tests, I shared the total time, which is probably also fine.

Thanks,
Onder

Attachment Content-Type Size
perf_index_scan.svg image/svg+xml 90.5 KB
perf_seq_scan.svg image/svg+xml 59.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message gkokolatos 2023-03-01 13:39:14 Re: Add LZ4 compression in pg_dump
Previous Message Mikhail Gribkov 2023-03-01 13:12:14 Re: On login trigger: take three