Re: old_snapshot_threshold bottleneck on replica

From: Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>
To: Maxim Orlov <orlovmg(at)gmail(dot)com>
Cc: Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: old_snapshot_threshold bottleneck on replica
Date: 2023-01-24 10:35:21
Message-ID: CALT9ZEEwfZkBC8LHddkigJC_MUQtKsxUNN-veVDVWmmBvKn_oQ@mail.gmail.com
Lists: pgsql-hackers

Hi, Maxim!

On Mon, 23 Jan 2023 at 18:40, Maxim Orlov <orlovmg(at)gmail(dot)com> wrote:
>
> Hi!
>
> One of our customers stumbled onto a significant performance degradation while running multiple OLAP-like queries on a replica.
> After some investigation, it became clear that the problem is in accessing the old_snapshot_threshold parameter.
>
> Access to the old_snapshot_threshold parameter is guarded by mutex_threshold. This is not a problem on the primary
> server, since we rarely call GetOldSnapshotThresholdTimestamp:
>
> void
> TestForOldSnapshot_impl(Snapshot snapshot, Relation relation)
> {
>     if (RelationAllowsEarlyPruning(relation)
>         && (snapshot)->whenTaken < GetOldSnapshotThresholdTimestamp())
>         ereport(ERROR,
>                 (errcode(ERRCODE_SNAPSHOT_TOO_OLD),
>                  errmsg("snapshot too old")));
> }
>
> But in the case of a replica, we have to call GetOldSnapshotThresholdTimestamp much more often, so this becomes a
> bottleneck. The customer worked around the issue by setting old_snapshot_threshold to 0, but I think we can
> do something about it.
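
For anyone reading along: the spinlock in question is mutex_threshold in
oldSnapshotControl, and the read path looks roughly like the following
(paraphrased from memory from src/backend/utils/time/snapmgr.c, so treat
it as a sketch rather than an exact quote). Every caller takes and
releases the spinlock even though it only reads a single value, which is
why s_lock shows up so prominently in the perf profile below:

TimestampTz
GetOldSnapshotThresholdTimestamp(void)
{
    TimestampTz threshold_timestamp;

    /* every reader serializes on this spinlock */
    SpinLockAcquire(&oldSnapshotControl->mutex_threshold);
    threshold_timestamp = oldSnapshotControl->threshold_timestamp;
    SpinLockRelease(&oldSnapshotControl->mutex_threshold);

    return threshold_timestamp;
}
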
>
> Some more investigation:
>
> -- On primary --
> $ ./bin/psql postgres -c "create database benchmark"
> CREATE DATABASE
> $ ./bin/pgbench -i -Uorlov -s300 benchmark
> dropping old tables...
> NOTICE: table "pgbench_accounts" does not exist, skipping
> ...
> creating tables...
> generating data (client-side)...
> 30000000 of 30000000 tuples (100%) done (elapsed 142.37 s, remaining 0.00 s)
> vacuuming...
> creating primary keys...
> done in 177.67 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 144.45 s, vacuum 0.59 s, primary keys 32.61 s).
>
> -- On secondary --
> $ touch 1.sql
> $ vim 1.sql
> $ cat 1.sql
> \set bid random(1, 300)
> BEGIN;
> SELECT sum(aid) FROM pgbench_accounts where bid = :bid GROUP BY bid;
> END;
> $ ./bin/pgbench -f 1.sql -p5433 -Uorlov -j10 -c100 -T720 -P1 -n benchmark
> pgbench (16devel)
> progress: 1.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
> ...
> progress: 20.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
>
> $ perf record -F 99 -a -g --call-graph=dwarf sleep 5
> $ perf script --header --fields comm,pid,tid,time,event,ip,sym,dso > file
> $ grep s_lock file | wc -l
>
> 3486
>
>
> My proposal is to use atomics for threshold_timestamp and threshold_xid. PFA the 0001 patch.
> With patch 0001 we got:
>
> $ grep s_lock file2 | wc -l
> 8
>
>
> Maybe we should go further and remove mutex_threshold here entirely? This would allow threshold_timestamp and
> threshold_xid to become inconsistent with each other, but is that really a problem?
>
> Thoughts?

I think optimizing locking and switching to atomics wherever it improves
performance is a good direction. If the performance improvement could be
demonstrated in a more direct way, that would be a good argument for
committing the change. Personally, I like TPS plots like the ones in [1].
I've also put a rough sketch of the atomics-based access I understand you
to be proposing below.

[1] https://www.postgresql.org/message-id/CALT9ZEHSX1Hpz5xjDA62yHAHtpinkA6hg8Zt-odyxqppmKbQFA%40mail.gmail.com
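
To make sure we are talking about the same thing, here is roughly the
shape I would expect the lock-free read and write to take, assuming
threshold_timestamp becomes a pg_atomic_uint64 and threshold_xid a
pg_atomic_uint32. This is a sketch based on the description above, not
code taken from the 0001 patch:

#include "port/atomics.h"       /* pg_atomic_read_u64() and friends */

TimestampTz
GetOldSnapshotThresholdTimestamp(void)
{
    /* lock-free read: no mutex_threshold acquisition on this hot path */
    return (TimestampTz)
        pg_atomic_read_u64(&oldSnapshotControl->threshold_timestamp);
}

void
SetOldSnapshotThresholdTimestamp(TimestampTz ts, TransactionId xlimit)
{
    /*
     * Without the spinlock the two stores are no longer atomic as a pair,
     * so a reader can see a timestamp from one update together with an
     * xid from another -- the inconsistency you are asking about.
     */
    pg_atomic_write_u64(&oldSnapshotControl->threshold_timestamp, (uint64) ts);
    pg_atomic_write_u32(&oldSnapshotControl->threshold_xid, (uint32) xlimit);
}

Whether that torn timestamp/xid pair matters depends on how the two
values are used together at the call sites, so it seems worth auditing
them before dropping the spinlock completely.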

Kind regards,
Pavel Borisov,
Supabase
