Re: Performance of information_schema with many schemata and tables

From: Ulf Lohbrügge <ulf(dot)lohbruegge(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance of information_schema with many schemata and tables
Date: 2017-06-28 09:38:14
Message-ID: CABZYQRK6i+9jedfoQ+7aia+0c_ox0d=qV9_2p+PWv99Tegt95Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Nope, I didn't try that yet. But I don't have the impression that
reindexing the indexes in information_schema will help. The table
information_schema.tables consists of the following indexes:

"pg_class_oid_index" UNIQUE, btree (oid)
"pg_class_relname_nsp_index" UNIQUE, btree (relname, relnamespace)
"pg_class_tblspc_relfilenode_index" btree (reltablespace, relfilenode)

The costly sequence scan in question on pg_class happens with the following
WHERE clause:

WHERE (c.relkind = ANY (ARRAY['r'::"char", 'v'::"char", 'f'::"char"])) AND
NOT pg_is_other_temp_schema(nc.oid) AND (pg_has_role(c.relowner,
'USAGE'::text) OR has_table_privilege(c.oid, 'SELECT, INSERT, UPDATE,
DELETE, TRUNCATE, REFERENCES, TRIGGER'::text) OR
has_any_column_privilege(c.oid,
'SELECT, INSERT, UPDATE, REFERENCES'::text));

Besides pg_class_oid_index none of the referenced columns is indexed. I
tried to add an index on relowner but didn't succeed because the column is
used in the function call pg_has_role and the query is still forced to do a
sequence scan.

Regards,
Ulf

2017-06-28 3:31 GMT+02:00 Pritam Baral <pritam(at)pritambaral(dot)com>:

> On Wednesday 28 June 2017 05:27 AM, Ulf Lohbrügge wrote:
> > Hi all,
> >
> > we use schemata to separate our customers in a multi-tenant setup
> (9.5.7, Debian stable). Each tenant is managed in his own schema with all
> the tables that only he can access. All tables in all schemata are the same
> in terms of their DDL: Every tenant uses e.g. his own table 'address'. We
> currently manage around 1200 schemata (i.e. tenants) on one cluster. Every
> schema consists currently of ~200 tables - so we end up with ~240000 tables
> plus constraints, indexes, sequences et al.
> >
> > Our current approach is quite nice in terms of data privacy because
> every tenant is isolated from all other tenants. A tenant uses his own user
> that gives him only access to the corresponding schema. Performance is
> great for us - we didn't expect Postgres to scale so well!
> >
> > But performance is pretty bad when we query things in the
> information_schema:
> >
> > SELECT
> > *
> > FROM information_schema.tables
> > WHERE table_schema = 'foo'
> > AND table_name = 'bar';``
> >
> > Above query results in a large sequence scan with a filter that removes
> 1305161 rows:
> >
> >
>
> QUERY PLAN
> > ------------------------------------------------------------
> ------------------------------------------------------------
> ------------------------------------------------------------
> ------------------------------------------------------------
> ------------------------------------------------------------
> -------------------------------------------------------
> > Nested Loop Left Join (cost=0.70..101170.18 rows=3 width=265) (actual
> time=383.505..383.505 rows=0 loops=1)
> > -> Nested Loop (cost=0.00..101144.65 rows=3 width=141) (actual
> time=383.504..383.504 rows=0 loops=1)
> > Join Filter: (nc.oid = c.relnamespace)
> > -> Seq Scan on pg_class c (cost=0.00..101023.01 rows=867
> width=77) (actual time=383.502..383.502 rows=0 loops=1)
> > Filter: ((relkind = ANY ('{r,v,f}'::"char"[])) AND
> (((relname)::information_schema.sql_identifier)::text = 'bar'::text) AND
> (pg_has_role(relowner, 'USAGE'::text) OR has_table_privilege(oid, 'SELECT,
> INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER'::text) OR
> has_any_column_privilege(oid, 'SELECT, INSERT, UPDATE, REFERENCES'::text)))
> > Rows Removed by Filter: 1305161
> > -> Materialize (cost=0.00..56.62 rows=5 width=68) (never
> executed)
> > -> Seq Scan on pg_namespace nc (cost=0.00..56.60 rows=5
> width=68) (never executed)
> > Filter: ((NOT pg_is_other_temp_schema(oid)) AND
> (((nspname)::information_schema.sql_identifier)::text = 'foo'::text))
> > -> Nested Loop (cost=0.70..8.43 rows=1 width=132) (never executed)
> > -> Index Scan using pg_type_oid_index on pg_type t
> (cost=0.42..8.12 rows=1 width=72) (never executed)
> > Index Cond: (c.reloftype = oid)
> > -> Index Scan using pg_namespace_oid_index on pg_namespace nt
> (cost=0.28..0.30 rows=1 width=68) (never executed)
> > Index Cond: (oid = t.typnamespace)
> > Planning time: 0.624 ms
> > Execution time: 383.784 ms
> > (16 rows)
> >
> > We noticed the degraded performance first when using the psql cli.
> Pressing tab after beginning a WHERE clause results in a query against the
> information_schema which is pretty slow and ends in "lag" when trying to
> enter queries.
> >
> > We also use Flyway (https://flywaydb.org/) to handle our database
> migrations. Unfortunately Flyway is querying the information_schema to
> check if specific tables exist (I guess this is one of the reasons
> information_schema exists) and therefore vastly slows down the migration of
> our tenants. Our last migration run on all tenants (schemata) almost took
> 2h because the above query is executed multiple times per tenant. The
> migration run consisted of multiple sql files to be executed and triggered
> more than 10 queries on information_schema per tenant.
> >
> > I don't think that Flyway is to blame because querying the
> information_schema should be a fast operation (and was fast for us when we
> had less schemata). I tried to speedup querying pg_class by adding indexes
> (after enabling allow_system_table_mods) but didn't succeed. The function
> call 'pg_has_role' is probably not easy to optimize.
> >
> > Postgres is really doing a great job to handle those many schemata and
> tables but doesn't scale well when querying information_schema. I actually
> don't want to change my current multi-tenant setup (one schema per tenant)
> as it is working great but the slow information_schema is killing our
> deployments.
> >
> > Are there any other options besides switching from one-schema-per-tenant-approach?
> Any help is greatly appreciated!
>
> Have you tried a `REINDEX SYSTEM <dbname>`?
>
> >
> > Regards,
> > Ulf
>
> --
> #!/usr/bin/env regards
> Chhatoi Pritam Baral
>
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Gerardo Herzig 2017-06-28 13:08:54 Re:
Previous Message Pavel Stehule 2017-06-28 07:43:25 Re: