Re: logical decoding and replication of sequences, take 2

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: Re: logical decoding and replication of sequences, take 2
Date: 2023-11-29 23:58:45
Message-ID: e66ff0de-6f2d-baf9-aee9-ba412868df75@enterprisedb.com
Lists: pgsql-hackers

On 11/29/23 15:41, Tomas Vondra wrote:
> ...
>>
>> One thing that worries me about that approach is that it can suck with
>> the workload that has a lot of DDLs that create XLOG_SMGR_CREATE
>> records. We have previously fixed some such workloads in logical
>> decoding where decoding a transaction containing truncation of a table
>> with a lot of partitions (1000 or more) used to take a very long time.
>> Don't we face performance issues in such scenarios?
>>
>
> I don't think we do, really. We will have to decode the SMGR records and
> add the relfilenodes to the hash table(s), but I don't think that affects
> the lookup performance too much. What I think might be a problem is if we
> have many top-level transactions, especially if those transactions do
> something that creates a relfilenode. Because then we'll have to do a
> hash_search for each of them, and that might be measurable even if each
> lookup is O(1). And we do the lookup for every sequence change ...
>

I did some micro-benchmarking today, trying to identify cases where this
would cause unexpected problems, either due to having to maintain all
the relfilenodes, or due to having to do hash lookups for every sequence
change. But I think it's fine, mostly ...

I did all the following tests with 64 clients. I may try more, but even
with this there should be a fair number of concurrent transactions, which
determines the number of top-level transactions in the reorderbuffer. I'll
try with more clients tomorrow, but I don't expect it to change the results
much.

The test is fairly simple - run a certain number of transactions (might be
1000 * 64, or more), and then measure how long it takes to decode the
changes using test_decoding.
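
For completeness, the decoding step is just test_decoding consumed through
the SQL interface - a minimal sketch, assuming a slot named 'seq_slot' and
the workload being run by the 64 clients in between:

-- create the slot before running the workload
SELECT pg_create_logical_replication_slot('seq_slot', 'test_decoding');

-- ... run the workload with 64 clients ...

-- then time how long it takes to decode all the changes
\timing on
SELECT count(*) FROM pg_logical_slot_get_changes('seq_slot', NULL, NULL);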

Now, the various workloads I tried:

1) "good case" - small OLTP transactions with a couple of nextval('s') calls

begin;
insert into t values (1);
select nextval('s');
insert into t values (1);
commit;

This is pretty fine - the sequence part of reorderbuffer is barely
measurable, maybe 1% of the total CPU time. Which is expected, because we
only WAL-log every 32nd increment or so.
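
Just to illustrate that cadence (a rough illustration, nothing specific to
this patch): the sequence relation exposes log_cnt, which counts down the
number of fetches remaining before the next WAL record, so only about one
nextval() call in 32 actually writes WAL:

SELECT nextval('s');
SELECT last_value, log_cnt FROM s;  -- a new WAL record is emitted only
                                    -- when log_cnt reaches 0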

2) "good case" - same as (1), but with enough nextval calls to always write WAL

begin;
insert into t values (1);
select nextval('s') from generate_series(1,40);
insert into t values (1);
commit;

Here sequences are more measurable - about 15% of CPU time - but most of
that comes from AbortCurrentTransaction() in the non-transactional branch
of ReorderBufferQueueSequence. I don't think there's a way around that, and
it's entirely unrelated to relfilenodes. The function checking whether the
change is transactional (ReorderBufferSequenceIsTransactional) is less than
1% of the profile - and this is the version that always walks all top-level
transactions.
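
(For context, the transactional vs. non-transactional distinction is
roughly this: increments of a sequence whose relfilenode was created by a
still-uncommitted transaction get queued with that transaction, while
everything else is applied immediately. A trivial illustration:)

BEGIN;
CREATE SEQUENCE s_new;    -- new relfilenode, created by this transaction
SELECT nextval('s_new');  -- transactional: decoded only when/if we commit
SELECT nextval('s');      -- non-transactional: decoded right away
COMMIT;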

3) "bad case" - small transactions that generate a lot of relfilenodes

select alter_sequence();

where the function is defined like this (I created the 1000 sequences
before the test - see the setup sketch after the function):

CREATE OR REPLACE FUNCTION alter_sequence() RETURNS void AS $$
DECLARE
  v INT;
BEGIN
  v := 1 + (random() * 999)::int;
  execute format('alter sequence s%s restart with 1000', v);
  perform nextval('s');
END;
$$ LANGUAGE plpgsql;
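
(For completeness, the 1000 sequences can be created with a simple loop
like this:)

DO $$
BEGIN
  FOR i IN 1..1000 LOOP
    EXECUTE format('CREATE SEQUENCE s%s', i);
  END LOOP;
END;
$$;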

This workload performs terribly, but the overhead is entirely unrelated to
sequences. Current master has exactly the same problem whenever
transactions do DDL, like this, for example:

CREATE OR REPLACE FUNCTION create_table() RETURNS void AS $$
DECLARE
  v INT;
BEGIN
  v := 1 + (random() * 999)::int;
  execute format('create table t%s (a int)', v);
  execute format('drop table t%s', v);
  insert into t values (1);
END;
$$ LANGUAGE plpgsql;

This has the same impact on master. The perf report shows this:

--98.06%--pg_logical_slot_get_changes_guts
          |
           --97.88%--LogicalDecodingProcessRecord
                     |
                      --97.56%--xact_decode
                                |
                                 --97.51%--DecodeCommit
                                           |
                                           |--91.92%--SnapBuildCommitTxn
                                           |          |
                                           |           --91.65%--SnapBuildBuildSnapshot
                                           |                     |
                                           |                      --91.14%--pg_qsort

The sequence decoding is maybe ~1%. The reason SnapBuildBuildSnapshot takes
so long is that:

-----------------
Breakpoint 1, SnapBuildBuildSnapshot (builder=0x21f60f8)
at snapbuild.c:498
498 + sizeof(TransactionId) * builder->committed.xcnt
(gdb) p builder->committed.xcnt
$4 = 11532
-----------------

And with each iteration committed.xcnt grows by 1, so the array of
committed XIDs that SnapBuildBuildSnapshot has to copy and sort keeps
getting bigger. That looks quite weird, possibly a bug worth fixing, but
it's unrelated to this patch. I can't investigate this more at the moment,
not sure when/if I'll get to that.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
