Re: Postgres is not able to handle more than 4k tables!?

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Postgres is not able to handle more than 4k tables!?
Date: 2020-07-09 16:49:26
Message-ID: f8cdadbc-6c6b-b4ea-6896-5f1f4dd81778@postgrespro.ru
Lists: pgsql-hackers

On 09.07.2020 19:19, Nikolay Samokhvalov wrote:
> Hi Konstantin, a silly question: do you consider the workload you have
> as well-optimized? Can it be optimized further? Reading this thread I
> have a strong feeling that a very basic set of regular optimization
> actions is missing here (or not explained): query analysis and
> optimization based on pg_stat_statements (and, maybe pg_stat_kcache),
> some method to analyze the state of the server in general, resource
> consumption, etc.
>
> Do you have some monitoring that covers pg_stat_statements?
>
> Before looking under the hood, I would use multiple pg_stat_statements
> snapshots (can be analyzed using, say, postgres-checkup or pgCenter)
> to understand the workload and identify the heaviest queries -- first
> of all, in terms of total_time, calls, shared buffers reads/hits,
> temporary files generation. Which query groups are Top-N in each
> category, have you looked at it?
>
> You mentioned some crazy numbers for the planning time, but why not
> analyze the picture holistically and look at the overall numbers? For
> those queries that have increased planning time, what is their share of
> total_time in the overall picture, in %? (Unfortunately, we cannot
> see Top-N by planning time in pg_stat_statements until PG13, but that
> doesn't mean we cannot get a good understanding of the overall
> picture today, it just requires more work.)
>
> If workload analysis & optimization was done holistically already, or
> not possible due to some reason — pardon me. But if not and if your
> primary goal is to improve this particular setup ASAP, then the topic
> could be started in the -performance mailing list first, discussing
> the workload and its aspects, and only after it's done, raised in
> -hackers. No?

Certainly, both we and the customer have done workload analysis &
optimization. It is not a problem of particular queries, bad plans,
resource exhaustion, ...

Unfortunately, there are many scenarios in which Postgres shows not a
gradual degradation of performance with increasing workload, but a "snow
avalanche", where negative feedback causes very fast paralysis of the
system.

This case is just one of these scenarios. It is hard to say for sure what
triggers the avalanche... A long-lived transaction, a huge number of
tables, aggressive autovacuum settings... But there is a cascade of
negative events which causes a system that normally functions for months
to stop working at all.

In this particular case we have the following chain:

- a long-lived transaction causes autovacuum to send a lot of invalidation
messages;
- these messages overflow the invalidation message queue, forcing backends
to invalidate their caches and reload them from the system catalog;
- the too-small fast-path lock cache causes many concurrent accesses to
the shared lock hash table;
- contention on the LWLocks protecting the small number of lock partitions
causes starvation (a toy model of the last two steps is sketched below).
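
To make the last two steps more concrete, here is a toy, self-contained
model (not PostgreSQL source code) of how the per-backend fast-path lock
slots interact with the partitioned shared lock table. The constants
mirror the defaults as far as I remember them (FP_LOCK_SLOTS_PER_BACKEND
= 16 in proc.h, NUM_LOCK_PARTITIONS = 1 << 4 = 16 in lwlock.h); the hash
function, relation OIDs and workload numbers are made up for illustration.
The first two steps are a similar story: the sinval ring buffer holds only
a few thousand messages (MAXNUMMESSAGES in sinvaladt.c, 4096 if I remember
correctly), so a burst of invalidations quickly forces full cache resets
in all backends.

/*
 * Toy model (not PostgreSQL source) of per-backend fast-path lock slots
 * spilling into the partitioned shared lock table.  Constants mirror the
 * PostgreSQL defaults as I remember them:
 *   FP_LOCK_SLOTS_PER_BACKEND = 16   (src/include/storage/proc.h)
 *   NUM_LOCK_PARTITIONS       = 16   (1 << LOG2_NUM_LOCK_PARTITIONS,
 *                                     src/include/storage/lwlock.h)
 * Everything else is a hypothetical simplification.
 */
#include <stdio.h>
#include <stdint.h>

#define FP_LOCK_SLOTS_PER_BACKEND 16
#define NUM_LOCK_PARTITIONS       16

/* Toy hash standing in for LockTagHashCode(); the real function differs. */
static uint32_t
locktag_hash(uint32_t relation_oid)
{
    return relation_oid * 2654435761u;  /* Knuth multiplicative hash */
}

int
main(void)
{
    const int relations_per_query = 100; /* e.g. a table with many partitions */
    const int backends = 50;             /* concurrently executing sessions */

    int  fastpath = 0;
    int  shared_hash = 0;
    long partition_hits[NUM_LOCK_PARTITIONS] = {0};

    for (int b = 0; b < backends; b++)
    {
        int used_slots = 0;

        for (int r = 0; r < relations_per_query; r++)
        {
            uint32_t oid = 16384 + r;    /* pretend relation OIDs */

            if (used_slots < FP_LOCK_SLOTS_PER_BACKEND)
            {
                /* A weak relation lock fits in a per-backend fast-path slot:
                 * no shared lock table, no partition LWLock.  (The real
                 * eligibility test in lock.c is more involved.) */
                used_slots++;
                fastpath++;
            }
            else
            {
                /* Slot cache is full: the lock goes through the shared hash,
                 * taking the LWLock of one of only 16 partitions. */
                shared_hash++;
                partition_hits[locktag_hash(oid) % NUM_LOCK_PARTITIONS]++;
            }
        }
    }

    printf("fast-path acquisitions : %d\n", fastpath);
    printf("shared-hash acquisitions: %d\n", shared_hash);
    for (int p = 0; p < NUM_LOCK_PARTITIONS; p++)
        printf("partition %2d: %ld acquisitions\n", p, partition_hits[p]);
    return 0;
}

With these made-up numbers, 50 backends * 100 relations give 5000 lock
acquisitions, of which only 800 fit in fast-path slots; the remaining
4200 all have to take one of the same 16 partition LWLocks, which matches
the kind of contention described above.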
