Re: suboverflowed subtransactions concurrency performance optimize

From: Zhihong Yu <zyu(at)yugabyte(dot)com>
To: Pengchengliu <pengchengliu(at)tju(dot)edu(dot)cn>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: suboverflowed subtransactions concurrency performance optimize
Date: 2021-08-31 17:20:56
Message-ID: CALNJ-vQE93GQeKmTiGaFKxk2kGNW4WvJvO4kJwUSuKemw_156A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Aug 30, 2021 at 1:43 AM Pengchengliu <pengchengliu(at)tju(dot)edu(dot)cn>
wrote:

> Hi hackers,
> I wrote a patch to resolve the subtransactions concurrency performance
> problems when suboverflowed.
>
> When we use more than PGPROC_MAX_CACHED_SUBXIDS(64) subtransactions per
> transaction concurrency, it will lead to subtransactions performance
> problems.
> All backend will be stuck at acquiring lock SubtransSLRULock.
>
> The reproduce steps in PG master branch:
>
> 1, init a cluster, append postgresql.conf as below:
>
> max_connections = '2500'
> max_files_per_process = '2000'
> max_locks_per_transaction = '64'
> max_parallel_maintenance_workers = '8'
> max_parallel_workers = '60'
> max_parallel_workers_per_gather = '6'
> max_prepared_transactions = '15000'
> max_replication_slots = '10'
> max_wal_senders = '64'
> max_worker_processes = '250'
> shared_buffers = 8GB
>
> 2, create table and insert some records as below:
>
> CREATE UNLOGGED TABLE contend (
> id integer,
> val integer NOT NULL
> )
> WITH (fillfactor='50');
>
> INSERT INTO contend (id, val)
> SELECT i, 0
> FROM generate_series(1, 10000) AS i;
>
> VACUUM (ANALYZE) contend;
>
> 3, The script subtrans_128.sql in attachment. use pgbench with
> subtrans_128.sql as below.
> pgbench -d postgres -p 33800 -n -r -f subtrans_128.sql -c 500 -j 500 -T
> 3600
>
>
> 4, After for a while, we can get the stuck result. We can query
> pg_stat_activity. All backends wait event is SubtransSLRULock.
> We can use pert top and try find the root cause. The result of perf top
> as below:
> 66.20% postgres [.] pg_atomic_compare_exchange_u32_impl
> 29.30% postgres [.] pg_atomic_fetch_sub_u32_impl
> 1.46% postgres [.] pg_atomic_read_u32
> 1.34% postgres [.] TransactionIdIsCurrentTransactionId
> 0.75% postgres [.] SimpleLruReadPage_ReadOnly
> 0.14% postgres [.] LWLockAttemptLock
> 0.14% postgres [.] LWLockAcquire
> 0.12% postgres [.] pg_atomic_compare_exchange_u32
> 0.09% postgres [.] HeapTupleSatisfiesMVCC
> 0.06% postgres [.] heapgetpage
> 0.03% postgres [.] sentinel_ok
> 0.03% postgres [.] XidInMVCCSnapshot
> 0.03% postgres [.] slot_deform_heap_tuple
> 0.03% postgres [.] ExecInterpExpr
> 0.02% postgres [.] AllocSetCheck
> 0.02% postgres [.] HeapTupleSatisfiesVisibility
> 0.02% postgres [.] LWLockRelease
> 0.02% postgres [.] TransactionIdPrecedes
> 0.02% postgres [.] SubTransGetParent
> 0.01% postgres [.] heapgettup_pagemode
> 0.01% postgres [.] CheckForSerializableConflictOutNeeded
>
>
> After view the subtrans codes, it is easy to find that the global LWLock
> SubtransSLRULock is the bottleneck of subtrans concurrently.
>
> When a bakcend session assign more than PGPROC_MAX_CACHED_SUBXIDS(64)
> subtransactions, we will get a snapshot with suboverflowed.
> A suboverflowed snapshot does not contain all data required to determine
> visibility, so PostgreSQL will occasionally have to resort to pg_subtrans.
> These pages are cached in shared buffers, but you can see the overhead of
> looking them up in the high rank of SimpleLruReadPage_ReadOnly in the perf
> output.
>
> To resolve this performance problem, we think about a solution which cache
> SubtransSLRU to local cache.
> First we can query parent transaction id from SubtransSLRU, and copy the
> SLRU page to local cache page.
> After that if we need query parent transaction id again, we can query it
> from local cache directly.
> It will reduce the number of acquire and release LWLock SubtransSLRULock
> observably.
>
> From all snapshots, we can get the latest xmin. All transaction id which
> precedes this xmin, it muse has been commited/abortd.
> Their parent/top transaction has been written subtrans SLRU. Then we can
> cache the subtrans SLRU and copy it to local cache.
>
> Use the same produce steps above, with our patch we cannot get the stuck
> result.
> Note that append our GUC parameter in postgresql.conf. This optimize is off
> in default.
> local_cache_subtrans_pages=128
>
> The patch is base on PG master branch
> 0d906b2c0b1f0d625ff63d9ace906556b1c66a68
>
>
> Our project in https://github.com/ADBSQL/AntDB, Welcome to follow us,
> AntDB, AsiaInfo's PG-based distributed database product
>
> Thanks
> Pengcheng
>
> Hi,

+ uint16 valid_offset; /* how many entry is valid */

how many entry is -> how many entries are

+int slru_subtrans_page_num = 32;

Looks like the variable represents the number of subtrans pages. Maybe name
the variable slru_subtrans_page_count ?

+ if (lbuffer->in_htab == false)

The condition can be written as 'if (!lbuffer->in_htab)'

For SubtransAllocLocalBuffer(), you can enclose the body of method in while
loop so that you don't use goto statement.

Cheers

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bossart, Nathan 2021-08-31 17:24:49 Re: Pre-allocating WAL files
Previous Message Andres Freund 2021-08-31 17:18:36 Re: archive status ".ready" files may be created too early