Discussion: Fast DB/Schema/Table disk size check in Postgresql

From: Hubert Zhang <hzhang(at)pivotal(dot)io>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Discussion: Fast DB/Schema/Table disk size check in Postgresql
Date: 2018-12-18 07:44:51
Message-ID: CAB0yremwWyu+pWN8G3VEe6OpNHWYtrc9DkAgz=vzCR_9g9veXQ@mail.gmail.com
Lists: pgsql-hackers

Hi all,

For very large databases, the dbsize functions such as `pg_database_size()`
can be quite slow, since they have to issue a `stat()` system call for
every file in the target database.
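
For reference, the slow path today is the built-in function, which walks
the database directory and stat()s every file it finds:

    SELECT pg_size_pretty(pg_database_size('postgres'));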

We propose a new approach to accelerate these dbsize functions, which
check the disk usage of a database/schema/table. The new functions avoid
calling `stat()` altogether and instead read the disk usage of database
objects from user tables (named diskquota.xxx_size, where xxx is
db/schema/table). Checking the size of database 'postgres' then becomes
the SQL query `select size from diskquota.db_size where name = 'postgres'`.
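
As a minimal sketch of the idea (the table layout below is an assumption;
only the table names follow the naming above):

    -- Sketch only: an assumed layout of the size tables the worker
    -- maintains; the real diskquota schema may differ.
    CREATE SCHEMA IF NOT EXISTS diskquota;
    CREATE TABLE IF NOT EXISTS diskquota.db_size (
        name text PRIMARY KEY,  -- database name
        size bigint NOT NULL    -- current disk usage in bytes
    );

    -- The fast check is then a single index lookup instead of one
    -- stat() call per data file:
    SELECT size FROM diskquota.db_size WHERE name = 'postgres';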

To keep diskquota.table_size up to date, we follow a passive approach: a
separate background worker process refreshes the model periodically with a
user-defined delay. Compared with an active approach, which updates the
table size on the fly in the executor, our method does not hurt OLTP
performance, since all table size updates are done asynchronously. The
background worker process needs to detect table size changes, maintain the
table size model, and flush the table sizes into user tables. Luckily, the
first two parts are already solved by the diskquota extension.
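
For example, assuming the refresh delay is exposed as the GUC
diskquota.naptime (the name used in the diskquota README; an assumption
here), the DBA could tune it like this:

    -- Let the background worker refresh the size model every 10 seconds.
    ALTER SYSTEM SET diskquota.naptime = 10;
    SELECT pg_reload_conf();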

We have introduced the diskquota extension
<https://github.com/greenplum-db/diskquota> in the past (see the diskquota
wiki
<https://github.com/greenplum-db/diskquota/wiki/Postgresql-Diskquota-Design>
). Diskquota is an extension that controls the disk usage of database
objects such as schemas and roles. A DBA can set a quota limit for a
schema or a role in a database. The diskquota worker process then
maintains the diskquota model, which includes the current disk usage of
each schema and role in the database, plus a blacklist of the schemas and
roles whose quota limit has been reached. Loading data into a table whose
schema or role is on the diskquota blacklist will be cancelled.
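
A typical usage sketch; the function names below come from the diskquota
README and may differ between versions:

    CREATE EXTENSION diskquota;

    -- Limit schema s1 to 1 MB and role u1 to 1 GB of disk usage.
    SELECT diskquota.set_schema_quota('s1', '1 MB');
    SELECT diskquota.set_role_quota('u1', '1 GB');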

To support the fast disk size check functions, the diskquota worker
process only needs some additional logic to flush the diskquota model into
the user table diskquota.table_size, so the code change is small. Here is
the algorithm change (for the original diskquota algorithm that refreshes
the diskquota model, refer to the diskquota design
<https://github.com/greenplum-db/diskquota/wiki/Postgresql-Diskquota-Design#design-of-diskquota>
).

The changes to the original diskquota algorithm are described below
(detail link
<https://github.com/greenplum-db/diskquota/wiki/Postgresql-Diskquota-Design#lack-of-fast-database-usage-check-function>
):

1. Fetch the latest user-defined quota limits by reading table
'diskquota.quota_config'.
2. Get the active table list (table_oid and size) from the Active Tables
shared memory. Table size is calculated by
pg_total_relation_size(table_oid). At diskquota model initialization (e.g.
after a database restart), the worker first reads table sizes from table
diskquota.table_size to rebuild the table_size_map, schema_size_map and
role_size_map directly. If table diskquota.table_size is empty, all tables
are treated as active tables and the quota size checker fetches the sizes
of all tables directly.
3. Traverse user tables in pg_class:
   1. If the table is in the active table list, calculate the delta of its
   size change, update the corresponding table_size_map, namespace_size_map
   and role_size_map, and update tables diskquota.table_size,
   diskquota.schema_size and diskquota.role_size in the current database
   (see the SQL sketch after this list).
   2. If the table's schema has changed, move the quota from the old schema
   to the new schema in namespace_size_map, and update table
   diskquota.schema_size in the current database.
   3. If the table's owner has changed, move the quota from the old owner
   to the new owner in role_size_map, and update table diskquota.role_size
   in the current database.
   4. Mark the table as existing (not dropped) in table_size_map.
4. Traverse table_size_map and detect tables that were not marked in step
3.4; these are the dropped tables. Subtract their sizes from the
corresponding namespace_size_map and role_size_map, and update tables
diskquota.table_size, diskquota.schema_size and diskquota.role_size in the
current database (see the SQL sketch after this list).
5. Traverse namespaces in pg_namespace:
   1. Remove dropped namespaces from namespace_size_map.
   2. Compare the quota usage with the quota limit for each namespace, and
   put out-of-quota namespaces into the blacklist.
   3. Update table diskquota.schema_size in the current database.
6. Traverse roles in pg_roles:
   1. Remove dropped roles from role_size_map.
   2. Compare the quota usage with the quota limit for each role, and put
   out-of-quota roles into the blacklist.
   3. Update table diskquota.role_size in the current database.
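
To make the flush concrete, here is a minimal SQL sketch of what the
worker could issue for steps 3.1 and 4. The statements and the column name
tableid are assumptions for illustration, not the actual diskquota code:

    -- Step 3.1 (sketch): upsert the latest size of an active table.
    -- Assumes diskquota.table_size(tableid oid PRIMARY KEY, size bigint).
    INSERT INTO diskquota.table_size (tableid, size)
    VALUES (16384, 8192000)
    ON CONFLICT (tableid) DO UPDATE SET size = EXCLUDED.size;

    -- Step 4 (sketch): remove entries whose table no longer exists.
    DELETE FROM diskquota.table_size ts
    WHERE NOT EXISTS (SELECT 1 FROM pg_class c WHERE c.oid = ts.tableid);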

Any comments on the fast disk size check functions are appreciated.
BTW, the diskquota extension needs some hook functions to be added to
Postgres. We have updated our patch at commitfest/21/1883
<https://commitfest.postgresql.org/21/1883/>. There is no reviewer yet.
Please help review this patch if you are interested in the diskquota
extension. Thanks in advance!

--
Thanks

Hubert Zhang
