Re: many backends hang on MultiXactOffsetSLRU

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: James Pang <jamespang886(at)gmail(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: many backends hang on MultiXactOffsetSLRU
Date: 2024-09-10 08:13:51
Message-ID: 202409100813.tq5c4aqhac7o@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 2024-Sep-10, James Pang wrote:

> Hi experts,
> we have a Postgresql v14.8 database, almost thousands of backends hang
> on MultiXactOffsetSLRU at the same time, all of these sessions running same
> query "SELECT ....", from OS and postgresql slow log, we found all of these
> query on "BIND" stage.
> LOG: duration: 36631.688 ms bind S_813: SELECT
> LOG: duration: 36859.786 ms bind S_1111: SELECT
> LOG: duration: 35868.148 ms bind <unnamed>: SELECT
> LOG: duration: 36906.471 ms bind <unnamed>: SELECT
> LOG: duration: 35955.489 ms bind <unnamed>: SELECT
> LOG: duration: 36833.510 ms bind <unnamed>: SELECT
> LOG: duration: 36839.535 ms bind S_1219: SELECT
> ...
>
> this database hang on MultiXactOffsetSLRU and MultiXactOffsetBuffer long
> time.
>
> could you direct me why they are hanging on 'BIND‘ stage with
> MultiXactOffsetSLRU ?

Very likely, it's related to this problem
[1] https://thebuild.com/blog/2023/01/18/a-foreign-key-pathology-to-avoid/

This is caused by a suboptimal implementation of what we call SLRU,
which multixact uses underneath. For years, many people dodged this
problem by recompiling with a changed value for
NUM_MULTIXACTOFFSET_BUFFERS in src/include/access/multixact.h (it was
originally 8 buffers, which is very small); you'll need to do that in
all releases up to pg16. In pg17 this was improved[2] and you'll be
able to change the value in postgresql.conf, though the default already
being larger than the original (16 instead of 8), you may not need to.

[2] https://pgconf.in/files/presentations/2023/Dilip_Kumar_RareExtremelyChallengingPostgresPerformanceProblems.pdf
[3] https://www.pgevents.ca/events/pgconfdev2024/schedule/session/53-problem-in-postgresql-slru-and-how-we-are-optimizing-it/

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"La victoria es para quien se atreve a estar solo"

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Alvaro Herrera 2024-09-10 08:15:42 Re: many backends hang on MultiXactOffsetSLRU
Previous Message Amine Tengilimoglu 2024-09-10 08:12:00 Re: many backends hang on MultiXactOffsetSLRU