RE: logical decoding and replication of sequences, take 2

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Tomas Vondra' <tomas(dot)vondra(at)enterprisedb(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: RE: logical decoding and replication of sequences, take 2
Date: 2023-12-01 11:08:16
Message-ID: TY3PR01MB9889D457278B254CA87D1325F581A@TY3PR01MB9889.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Dear Tomas,

> I did some micro-benchmarking today, trying to identify cases where this
> would cause unexpected problems, either due to having to maintain all
> the relfilenodes, or due to having to do hash lookups for every sequence
> change. But I think it's fine, mostly ...
>

I also ran performance tests (especially for case 3). First of all, my setup
differs from yours in a few ways.

1. Patch 0002 was reverted because it has an issue. So this test checks whether
the refactoring around ReorderBufferSequenceIsTransactional() is really needed.
2. Per comments from Amit, I also measured the abort case. In this case,
alter_sequence() is called but the transaction is aborted.
3. I varied the number of clients over {8, 16, 32, 64, 128}. In all cases, the
clients executed 1000 transactions. The test machine has 128 cores, so the
result for 128 clients might be saturated.
4. A short sleep (0.1s) was added in alter_sequence(), between
"alter sequence" and nextval(), because while testing I found that the
transactions were too short to run in parallel. I think this is reasonable
because ReorderBufferSequenceIsTransactional() might get worse as the
parallelism increases.

I attached perf to one backend process and executed pg_logical_slot_get_changes().
The attached txt file shows which functions consumed CPU time, especially under
pg_logical_slot_get_changes_guts() and ReorderBufferSequenceIsTransactional().
Here are my observations about them.

* In the commit case, as you said, SnapBuildCommitTxn() seems dominant for the
8-64 client cases.
* For the (commit, 128 clients) case, however, ReorderBufferRestoreChanges()
consumes a lot of time. I think this is because the changes exceed
logical_decoding_work_mem, so we do not have to analyze this case further.
* In the abort case, the CPU time used by ReorderBufferSequenceIsTransactional()
grows roughly linearly with the number of clients. This means that we need to
think of some solution to avoid the overhead of
ReorderBufferSequenceIsTransactional().

```
clients  occupied CPU time
      8   3.73%
     16   7.26%
     32  15.82%
     64  29.14%
    128  46.27%
```

* In the abort case, I also checked the CPU time used by
ReorderBufferAddRelFileLocator(), but it does not seem to depend much on the
number of clients.

```
clients  occupied CPU time
      8   3.66%
     16   6.94%
     32   4.65%
     64   5.39%
    128   3.06%
```
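
As a rough cross-check of the two tables above, here is a small Python sketch
that computes how much each share changes when the client count doubles. The
percentages are copied from the tables; the variable and function names are
only illustrative and do not come from the patch or from the perf output.

```python
# Percent of CPU time in the abort case, copied from the two tables above.
clients = [8, 16, 32, 64, 128]
is_transactional = [3.73, 7.26, 15.82, 29.14, 46.27]    # ReorderBufferSequenceIsTransactional()
add_relfilelocator = [3.66, 6.94, 4.65, 5.39, 3.06]     # ReorderBufferAddRelFileLocator()


def doubling_ratios(values):
    """Ratio of each value to the previous one (the client count doubles each step)."""
    return [round(b / a, 2) for a, b in zip(values, values[1:])]


def spread(values):
    """Return (min, max, mean) of the values, with the mean rounded to 2 digits."""
    return min(values), max(values), round(sum(values) / len(values), 2)


ratios_is_transactional = doubling_ratios(is_transactional)
ratios_add_relfilelocator = doubling_ratios(add_relfilelocator)

for n, r in zip(clients[1:], ratios_is_transactional):
    print(f"{n:>3} clients: x{r} compared with the previous step")
print("ReorderBufferAddRelFileLocator (min, max, mean):", spread(add_relfilelocator))
```

For ReorderBufferSequenceIsTransactional() the ratios stay close to 2 up to 64
clients and drop to about 1.6 at 128 clients, which may be related to the
saturation mentioned above. For ReorderBufferAddRelFileLocator() the values stay
between 3.06% and 6.94% with no clear trend.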

As a next step, I plan to run a case that uses the setval() function, because it
generates more WAL than a normal nextval().
What do you think?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachment Content-Type Size
perf_results.txt text/plain 9.8 KB
