Re: logical decoding and replication of sequences, take 2

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: Re: logical decoding and replication of sequences, take 2
Date: 2023-12-05 12:17:57
Message-ID: CAA4eK1+2UEcVHDqT2a7SKLyvoLG7Nz=OaJFiDOg3tbr6T5G4qA@mail.gmail.com
Lists: pgsql-hackers

On Sun, Dec 3, 2023 at 11:22 PM Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> Thanks for the script. Are you also measuring the time it takes to
> decode this using test_decoding?
>
> FWIW I did a more comprehensive suite of tests over the weekend, with a
> couple more variations. I'm attaching the updated scripts; running it
> should be as simple as
>
> ./run.sh BRANCH TRANSACTIONS RUNS
>
> so perhaps
>
> ./run.sh master 1000 3
>
> to do 3 runs with 1000 transactions per client. And it'll run a bunch of
> combinations hard-coded in the script, and write the timings into a CSV
> file (with "master" in each row).
>
> I did this on two machines (i5 with 4 cores, xeon with 16/32 cores). I
> did this with current master, the basic patch (without the 0002 part),
> and then with the optimized approach (single global hash table, see the
> 0004 part). That's what master / patched / optimized in the results is.
>
> Interestingly enough, the i5 handled this much faster; it seems to be
> better at single-core tasks. The xeon is still running, so the results
> for "optimized" only have one run (out of 3), but they shouldn't change much.
>
> Attached is also a table summarizing this, and visualizing the timing
> change (vs. master) in the last couple columns. Green is "faster" than
> master (but we don't really expect that), and "red" means slower than
> master (the more red, the slower).
>
> The results are grouped by script (see the attached .tgz), with either
> 32 or 96 clients (which does affect the timing, but not the difference
> between master and patch). Some executions have no pg_sleep() calls, some
> have a 0.001s wait (but that doesn't seem to make much difference).
>
> Overall, I'd group the results into about three groups:
>
> 1) good cases [nextval, nextval-40, nextval-abort]
>
> These are cases that slow down a bit, but the slowdown is mostly within
> reasonable bounds (we're making the decoding do more stuff, so it'd
> be a bit silly to require that extra work to have no impact). And I do
> think this is reasonable, because this is pretty much an extreme / worst
> case behavior. People don't really do just nextval() calls, without
> doing anything else. Not to mention doing aborts for 100% of transactions.
>
> So in practice this is going to be within noise (and in some cases the
> results even show a speedup, which seems a bit surprising). It's somewhat
> dependent on CPU too - on xeon there's hardly any regression.
>
>
> 2) nextval-40-abort
>
> Here the slowdown is clear, but I'd argue it generally falls in the same
> group as (1). Yes, I'd be happier if it didn't behave like this, but if
> someone can show me a practical workload affected by this ...
>
>
> 3) irrelevant cases [all the alters taking insane amounts of time]
>
> I absolutely refuse to care about these extreme cases where decoding
> 100k transactions takes 5-10 minutes (on i5), or up to 30 minutes (on
> xeon). If this was a problem for some practical workload, we'd have
> already heard about it I guess. And even if there was such workload, it
> wouldn't be up to this patch to fix that. There's clearly something
> misbehaving in the snapshot builder.
>
>
> I was hopeful the global hash table would be an improvement, but that
> doesn't seem to be the case. I haven't done much profiling yet, but I'd
> guess most of the overhead is due to ReorderBufferQueueSequence()
> starting and aborting a transaction in the non-transactional case. Which
> is unfortunate, but I don't know if there's a way to optimize that.
>

Before discussing the alternative ideas you shared, let me try to
clarify my understanding so that we are on the same page. I see two
observations based on the testing and discussion we had: (a) for
non-transactional cases, the overhead observed is mainly due to
starting/aborting a transaction for each change; (b) for transactional
cases, we see overhead due to traversing all the top-level txns and
checking the hash table for each one to find whether the change is
transactional.

Am I missing something?

--
With Regards,
Amit Kapila.
