Re: Discussion: Fast DB/Schema/Table disk size check in Postgresql

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Hubert Zhang <hzhang(at)pivotal(dot)io>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Discussion: Fast DB/Schema/Table disk size check in Postgresql
Date: 2019-01-05 19:52:54
Message-ID: 20190105195254.GU2528@tamriel.snowman.net
Lists: pgsql-hackers

Greetings,

* Hubert Zhang (hzhang(at)pivotal(dot)io) wrote:
> For very large databases, the dbsize functions such as
> `pg_database_size_name()` can be quite slow, since they need to call the
> `stat()` system call on every file in the target database.

I agree, it'd be nice to improve this.

> We proposed a new solution to accelerate these dbsize functions, which check
> the disk usage of a database/schema/table. The new functions avoid calling
> the `stat()` system call, and instead store the disk usage of database
> objects in user tables (called diskquota.xxx_size, where xxx can be
> db/schema/table). Checking the size of database 'postgres' could then be
> converted to the SQL query
> `select size from diskquota.db_size where name = 'postgres'`.

This seems like an awful lot of work though.

I'd ask a different question: why do we need to stat() every file?

For the bulk of the system, meaning tables, indexes, and such, if
there's a file 'X.1', you can be pretty darn sure that 'X' is 1GB in
size. If there's a file 'X.245' then you know that there are 245 files
(X, X.1, X.2, X.3 ... X.244) that are each 1GB in size, so only the
last segment, 'X.245', actually needs a stat().

Maybe we should teach pg_database_size_name() about that?
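
To make the idea concrete, here is a rough Python sketch, not what the
dbsize code does today. It groups the files in a directory by relation,
finds the highest-numbered segment of each, and calls stat only on that
one. The names `estimate_dir_size`, `estimate_database_dir`, `stat_size`
and `SEGMENT_BYTES` are made up for illustration. It assumes the default
1GB segment size, that every segment below the highest one is full, and
that there are no gaps in the segment numbering.

```python
import os
import re
from collections import defaultdict

# Assumption: default segment size of 1GB; a custom build may differ.
SEGMENT_BYTES = 1 << 30

# Matches names such as "16384.3" or "16384_fsm.1" (base name + segment number).
_SEGMENT_RE = re.compile(r"^(?P<base>.+?)\.(?P<seg>\d+)$")


def estimate_dir_size(filenames, stat_size):
    """Estimate total bytes, calling stat_size() once per group of segments.

    filenames: iterable of file names in one directory.
    stat_size: callable taking a file name and returning its size in bytes.
    """
    highest = defaultdict(int)  # base name -> highest segment number seen
    for name in filenames:
        m = _SEGMENT_RE.match(name)
        if m:
            base, seg = m.group("base"), int(m.group("seg"))
        else:
            base, seg = name, 0  # first segment, or a non-segmented file
        highest[base] = max(highest[base], seg)

    total = 0
    for base, seg in highest.items():
        last = base if seg == 0 else f"{base}.{seg}"
        # Every segment before the last is assumed to be exactly full.
        total += seg * SEGMENT_BYTES + stat_size(last)
    return total


def estimate_database_dir(path):
    """Apply the estimate to a real directory using os.stat()."""
    return estimate_dir_size(
        os.listdir(path),
        lambda name: os.stat(os.path.join(path, name)).st_size,
    )
```

For a relation split into X, X.1 and X.2 this issues one stat (on X.2)
instead of three. The directory listing is still needed to discover the
highest segment number, so the saving is in stat calls, not in readdir.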

Thanks!

Stephen
