Re: logical decoding and replication of sequences, take 2

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: Re: logical decoding and replication of sequences, take 2
Date: 2023-12-02 00:23:08
Message-ID: c6a08e9d-7b08-76d4-197d-fab54039ed91@enterprisedb.com
Lists: pgsql-hackers

On 12/1/23 12:08, Hayato Kuroda (Fujitsu) wrote:
> Dear Tomas,
>
>> I did some micro-benchmarking today, trying to identify cases where this
>> would cause unexpected problems, either due to having to maintain all
>> the relfilenodes, or due to having to do hash lookups for every sequence
>> change. But I think it's fine, mostly ...
>>
>
> I also ran performance tests (especially case 3). First of all, my setup differs
> from yours in a few ways.
>
> 1. patch 0002 was reverted because it has an issue. So this test checks whether
> the refactoring around ReorderBufferSequenceIsTransactional is really needed.

FWIW I also did the benchmarks without the 0002 patch, for the same
reason. I forgot to mention that.

> 2. per comments from Amit, I also measured the abort case. In this case,
> alter_sequence() is called but the transaction is aborted.
> 3. I varied the number of clients {8, 16, 32, 64, 128}. In all cases,
> clients executed 1000 transactions. The test machine has 128 cores, so the
> result for 128 clients might be saturated.
> 4. a short sleep (0.1s) was added in alter_sequence(), especially between
> "alter sequence" and nextval(). Because while testing, I found that the
> transaction is too short to execute in parallel. I think it is reasonable
> because ReorderBufferSequenceIsTransactional() might be worse when the parallelism
> is increased.
>
> I attached perf to one backend process and executed pg_logical_slot_get_changes().
> The attached txt file shows which functions consumed CPU time, especially under
> pg_logical_slot_get_changes_guts() and ReorderBufferSequenceIsTransactional().
> Here are my observations about them.
>
> * In case of commit, as you said, SnapBuildCommitTxn() seems dominant for the
> 8-64 client cases.
> * For the (commit, 128 clients) case, however, ReorderBufferRestoreChanges() takes
> a lot of time. I think this is because the changes exceed logical_decoding_work_mem,
> so we do not have to analyze it further.
> * In case of abort, the CPU time used by ReorderBufferSequenceIsTransactional() grows
> linearly with the number of clients. This means that we need to think of some
> solution to avoid the overhead of ReorderBufferSequenceIsTransactional().
>
> ```
> 8 clients 3.73% occupied time
> 16 7.26%
> 32 15.82%
> 64 29.14%
> 128 46.27%
> ```
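
For reference, the growth per doubling of clients can be checked with a few lines of Python. This is only arithmetic on the percentages quoted above. The helper name growth_per_doubling is made up for illustration and is not part of any patch or tool in this thread.

```python
# Share of CPU time in ReorderBufferSequenceIsTransactional() (abort case),
# copied from the table quoted above: {clients: percent}.
share = {8: 3.73, 16: 7.26, 32: 15.82, 64: 29.14, 128: 46.27}


def growth_per_doubling(samples):
    """Return {clients: ratio of this share to the previous step's share}."""
    clients = sorted(samples)
    return {
        cur: samples[cur] / samples[prev]
        for prev, cur in zip(clients, clients[1:])
    }


ratios = growth_per_doubling(share)
for clients, ratio in ratios.items():
    print(f"{clients:>3} clients: x{ratio:.2f} vs. previous step")
```

A ratio close to 2 means the share roughly doubles when the client count doubles. Since a share of CPU time cannot exceed 100%, the ratio is expected to shrink at the highest client counts even if the underlying cost keeps growing.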

Interesting, so what exactly does the transaction do? Anyway, I don't
think this is very surprising - I believe it behaves like this because
of having to search in many hash tables (one in each toplevel xact). And
I think the solution I explained before (maintaining a single toplevel
hash, instead of many per-top-level hashes) would address this.
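
To illustrate the difference, here is a toy model in Python. It is not PostgreSQL code: the function names, the dict/set layout and the xid/relfilenode numbers are all made up for the example, and it only counts hash probes rather than measuring time. It assumes the worst case, where the relfilenode is not found in any in-progress transaction.

```python
# Toy model only; every name below is hypothetical, not a reorderbuffer API.

def lookup_per_xact(toplevel_xacts, relfilenode):
    """Probe one hash table per in-progress top-level transaction."""
    probes = 0
    for xact in toplevel_xacts:
        probes += 1
        if relfilenode in xact["sequences"]:
            return True, probes
    return False, probes


def lookup_single(global_map, relfilenode):
    """Probe a single shared hash table keyed by relfilenode."""
    return relfilenode in global_map, 1


def build(n_xacts):
    """Create n_xacts transactions, each owning one new sequence relfilenode."""
    xacts = [{"xid": 1000 + i, "sequences": {5000 + i}} for i in range(n_xacts)]
    global_map = {5000 + i: 1000 + i for i in range(n_xacts)}
    return xacts, global_map


def worst_case_probes(n_xacts):
    """Probes needed for a relfilenode that no in-progress xact created."""
    xacts, global_map = build(n_xacts)
    missing = 1  # not present in any table
    found_a, probes_a = lookup_per_xact(xacts, missing)
    found_b, probes_b = lookup_single(global_map, missing)
    assert found_a is False and found_b is False
    return probes_a, probes_b


for n in (8, 16, 32, 64, 128):
    print(n, worst_case_probes(n))
```

In this model the per-transaction layout needs one probe per concurrent top-level transaction, while the single hash needs one probe regardless of concurrency. The cost of keeping the single hash up to date at commit/abort is not modeled here.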

FWIW I find this case interesting, but not very practical, because no
practical workload has that many aborts.

>
> * In case of abort, I also checked the CPU time used by ReorderBufferAddRelFileLocator(), but
> it does not seem to depend much on the number of clients.
>
> ```
> 8 clients 3.66% occupied time
> 16 6.94%
> 32 4.65%
> 64 5.39%
> 128 3.06%
> ```
>
> As a next step, I plan to run the case which uses the setval() function, because it
> generates more WAL than normal nextval().
> What do you think?
>

Sure, although I don't think it's much different from the test selecting
40 values from the sequence (in each transaction). That generates about
the same amount of WAL.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
