Re: Postgres is not able to handle more than 4k tables!?

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Postgres is not able to handle more than 4k tables!?
Date: 2020-07-09 16:47:35
Message-ID: 20200709164735.GC12375@tamriel.snowman.net
Lists: pgsql-hackers

Greetings,

* Konstantin Knizhnik (k(dot)knizhnik(at)postgrespro(dot)ru) wrote:
> It makes me think about two possible optimizations:
>
> 1. Provide separate invalidation messages for relation metadata and its
> statistics, so that an update of statistics does not invalidate the
> relation cache.
> The main problem with this proposal is that pg_class contains the relpages
> and reltuples columns, which conceptually are part of the relation
> statistics but are stored in the relation cache. If the relation statistics
> are updated, then most likely these fields are also changed, so we have to
> remove the relation from the relation cache in any case.

I realize this is likely to go over like a lead balloon, but the churn
in pg_class from updating reltuples/relpages has never seemed all that
great to me when just about everything else in that catalog is so rarely
changed, and then only through some user DDL action. I agree that those
particular columns are more 'statistics' type of info and less info
about the definition of the relation. Other columns that do get changed
regularly are relfrozenxid and relminmxid. I wonder if it's possible to
move all of those elsewhere: perhaps some to the statistics tables, as
you seem to be alluding to, and the others to $somewhereelse that is
dedicated to tracking the information which VACUUM is primarily
concerned with.
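
To illustrate why the churn matters: as far as I can tell, the stats
refresh goes through the relation's pg_class heap tuple, and the in-place
overwrite also queues a relcache invalidation, so today every
reltuples/relpages/relfrozenxid update doubles as a relcache flush (the
real code does at least skip the rewrite when nothing actually changed).
A heavily simplified, untested sketch of that path, loosely modeled on
vac_update_relstats(), not the actual code:

#include "postgres.h"

#include "access/heapam.h"
#include "access/htup_details.h"
#include "access/table.h"
#include "catalog/pg_class.h"
#include "utils/syscache.h"

/*
 * Simplified sketch only: roughly how VACUUM refreshes the
 * "statistics-ish" pg_class columns.  Because the update rewrites the
 * pg_class heap tuple, heap_inplace_update() ends up queuing a relcache
 * invalidation for the relation even though only
 * relpages/reltuples/relfrozenxid/relminmxid changed.
 */
static void
sketch_update_relstats(Oid relid, BlockNumber num_pages, double num_tuples,
					   TransactionId frozenxid, MultiXactId minmulti)
{
	Relation	rd = table_open(RelationRelationId, RowExclusiveLock);
	HeapTuple	ctup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid));
	Form_pg_class pgcform;

	if (!HeapTupleIsValid(ctup))
		elog(ERROR, "pg_class entry for relation %u vanished", relid);
	pgcform = (Form_pg_class) GETSTRUCT(ctup);

	pgcform->relpages = (int32) num_pages;
	pgcform->reltuples = (float4) num_tuples;
	pgcform->relfrozenxid = frozenxid;
	pgcform->relminmxid = minmulti;

	/* non-transactional overwrite; this also sends the relcache inval */
	heap_inplace_update(rd, ctup);

	table_close(rd, RowExclusiveLock);
}

If those columns lived somewhere that isn't tied to the relcache, a pure
stats refresh wouldn't need to invalidate anything.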

> 2. Remember in the relation info the XID of the oldest active transaction
> at the moment of the last autovacuum.
> At the next autovacuum iteration we first compare this stored XID with the
> current oldest active transaction XID and bypass vacuuming this relation
> if the XID has not changed.
>
> Thoughts?

That sounds like an interesting optimization and I agree it'd be nice to
avoid the re-run of autovacuum when we can tell that there's not going
to be anything more we can do. As noted above, for my part, I think
it'd be nice to move that kind of ongoing maintenance/updates out of
pg_class, but just in general I agree with the idea to store that info
somewhere and wait until there's actually been progress in the global
xmin before re-running a vacuum on a table. If we can do that somewhere
outside of pg_class, I think that'd be better, but if no one is up for
that kind of a shake-up, then maybe we just put it in pg_class and deal
with the churn there.
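
Just to make sure I'm reading the proposal correctly, the test would be
something along these lines (a sketch only: last_oldest_xmin is a field
I've made up for the example, and where it would be stored is exactly
the open question above; GetOldestXmin() is the current API):

#include "postgres.h"

#include "access/transam.h"
#include "storage/procarray.h"
#include "utils/rel.h"

/*
 * Sketch: skip (auto)vacuum of a relation if the oldest-xmin horizon has
 * not advanced since the previous vacuum of that relation, since a new
 * pass could not remove anything the previous one couldn't.
 * Anti-wraparound vacuums would still need to be forced regardless.
 */
static bool
sketch_vacuum_can_be_skipped(Relation rel, TransactionId last_oldest_xmin)
{
	TransactionId cur_oldest_xmin = GetOldestXmin(rel, PROCARRAY_FLAGS_VACUUM);

	return TransactionIdIsValid(last_oldest_xmin) &&
		TransactionIdEquals(cur_oldest_xmin, last_oldest_xmin);
}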

> >>So, that's really the core of your problem. We don't promise that
> >>you can run several thousand backends at once. Usually it's recommended
> >>that you stick a connection pooler in front of a server with (at most)
> >>a few hundred backends.
> >Sure, but that doesn't mean things should completely fall over when we
> >do get up to larger numbers of backends, which is definitely pretty
> >common in larger systems. I'm pretty sure we all agree that using a
> >connection pooler is recommended, but if there's things we can do to
> >make the system work at least a bit better when folks do use lots of
> >connections, provided we don't materially damage other cases, that's
> >probably worthwhile.
>
> I also think that Postgres performance should degrade gradually with an
> increasing number of active backends. Actually, further investigation of
> this particular case shows that such a large number of database
> connections was caused by ... Postgres slowdown.
> During normal operation the number of active backends is a few hundred.
> But the "invalidation storm" caused queries to hang, so the user
> application had to initiate more and more new connections to perform the
> required actions.
> Yes, this may not be the best behavior of the application in this case.
> At the least it should first terminate the current connection using
> pg_terminate_backend. I just want to note that the large number of
> backends was not the core of the problem.

Yeah, this is all getting back to the fact that we don't have acceptance
criteria or anything like that, where we'd actually hold off on letting
new connections/queries in while other things are happening. Of course,
a connection pooler would address this (and you could use one and have
it still look exactly like PG if you use, say, pgbouncer in
session-pooling mode, but then you need to have the application
drop/reconnect and not do its own connection pooling...), but it'd be
nice to have something in core for this.
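
For completeness, the session-pooling setup I'm referring to is just
something like this (illustrative pgbouncer.ini values; the database
name and the limits are made up for the example):

[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = *
listen_port = 6432
; one server connection per client session, so it behaves like plain PG
pool_mode = session
; what the application tier is allowed to open
max_client_conn = 5000
; what actually reaches the Postgres server concurrently
default_pool_size = 200
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt

The application still has to connect through the pooler rather than
holding thousands of its own connections, which is the caveat above.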

> >Making them GUCs does seem like it's a few steps too far... but it'd be
> >nice if we could arrange to have values that don't result in the system
> >falling over with large numbers of backends and large numbers of tables.
> >To get a lot of backends, you'd have to set max_connections up pretty
> >high to begin with- perhaps we should contemplate allowing these values
> >to vary based on what max_connections is set to?
>
> I think that the optimal number of lock partitions should depend not on
> the number of connections but on the number of available CPU cores, and
> so on the expected level of concurrency.
> It is hard to propose a portable way to obtain this number.
> This is why I think that a GUC is the better solution.

A GUC for 'number of CPUs' doesn't seem like a bad option to have. How
to make that work well may be challenging though.
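
Right now the partition count is baked in at compile time; from
src/include/storage/lwlock.h (quoting from memory, so the exact spelling
may be slightly off):

/* Number of partitions of the shared lock tables */
#define LOG2_NUM_LOCK_PARTITIONS  4
#define NUM_LOCK_PARTITIONS  (1 << LOG2_NUM_LOCK_PARTITIONS)

And the CPU detection itself is the non-portable part; a POSIX-only
sketch of the sort of thing we'd otherwise be reaching for (hypothetical
helper, not something in the tree):

#include <unistd.h>

/*
 * POSIX-only guess at the number of online CPUs; not available on every
 * platform we support, which is the argument for just exposing a GUC.
 */
static int
sketch_guess_num_cpus(void)
{
#ifdef _SC_NPROCESSORS_ONLN
	long		n = sysconf(_SC_NPROCESSORS_ONLN);

	if (n > 0)
		return (int) n;
#endif
	return 1;
}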

> Certainly I realize that it is a very dangerous parameter which should be
> changed with special care, and not only because of MAX_SIMUL_LWLOCKS.

Sure.

> There are a few places in Postgres where it tries to lock all partitions
> (deadlock detector, logical replication, ...).
> If there are thousands of partitions, then such a lock will be too
> expensive and we get yet another popular Postgres problem: a "deadlock
> detection storm", when, due to high contention between backends, a lock
> cannot be obtained within deadlock_timeout and so initiates deadlock
> detection. Simultaneous deadlock detection performed by all backends
> (each of which tries to take ALL partition locks) paralyzes the system
> (TPS falls to 0).
> The proposed patch for this problem was also rejected (once again, the
> problem can be reproduced only on a powerful server with a large number
> of cores).

That does sound like something that would be good to improve on, though
I haven't looked at the proposed patch or read the associated thread, so
I'm not sure I can really comment on its rejection specifically.
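
For the archives, the expensive part being described is (as I read it)
the loop in CheckDeadLock() in src/backend/storage/lmgr/proc.c, which
every backend runs once its lock wait exceeds deadlock_timeout.
Paraphrased from memory, so treat it as a sketch rather than an exact
quote:

/*
 * Paraphrased sketch of CheckDeadLock(): before running the deadlock
 * search, the backend exclusively acquires ALL lock-manager partition
 * locks to get a consistent view of the lock tables.  With many
 * partitions and many backends timing out at once, this is where a
 * "deadlock detection storm" spends its time.
 */
static void
sketch_check_deadlock(void)
{
	int			i;

	for (i = 0; i < NUM_LOCK_PARTITIONS; i++)
		LWLockAcquire(LockHashPartitionLockByIndex(i), LW_EXCLUSIVE);

	/* ... the actual deadlock search over the lock tables goes here ... */

	for (i = NUM_LOCK_PARTITIONS; --i >= 0;)
		LWLockRelease(LockHashPartitionLockByIndex(i));
}

So raising the partition count helps the normal fast path but makes each
of these all-partition sweeps proportionally more expensive, which is
presumably part of why the parameter needs the "special care" mentioned
above.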

Thanks,

Stephen
