Re: Discussion: Fast DB/Schema/Table disk size check in Postgresql

From: Hubert Zhang <hzhang(at)pivotal(dot)io>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Discussion: Fast DB/Schema/Table disk size check in Postgresql
Date: 2019-01-11 11:14:50
Message-ID: CAB0yre=bMiqFCWPpaNQE7TDS9ubhTTbp1hKN_u8f6cXVjHi6mA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks Stephen.
>
> In the largest portion of the system, when it comes to tables, indexes,
> and such, if there's a file 'X.1', you can be pretty darn sure that 'X'
> is 1GB in size. If there's a file 'X.245' then you can know that
> there's 245 files (X, X.1, X.2, X.3 ... X.244) that are 1GB in size.

Good idea. We could just add your logic in db_dir_size(). to teach both
pg_database_size_name() and pg_database_size_oid().
'X' maybe larger than 1GB with custom configuration.
So the size of the table could be calculated with two stats: size(X.245) +
245*size(X)

On Sun, Jan 6, 2019 at 3:52 AM Stephen Frost <sfrost(at)snowman(dot)net> wrote:

> Greetings,
>
> * Hubert Zhang (hzhang(at)pivotal(dot)io) wrote:
> > For very large databases, the dbsize function `pg_database_size_name()`
> > etc. could be quite slow, since it needs to call `stat()` system call on
> > every file in the target database.
>
> I agree, it'd be nice to improve this.
>
> > We proposed a new solution to accelerate these dbsize functions which
> check
> > the disk usage of database/schema/table. The new functions avoid to call
> > `stat()` system call, and instead store the disk usage of database
> objects
> > in user tables(called diskquota.xxx_size, xxx could be db/schema/table).
> > Checking the size of database 'postgres' could be converted to the SQL
> > query `select size from diskquota.db_size where name = `postgres``.
>
> This seems like an awful lot of work though.
>
> I'd ask a different question- why do we need to stat() every file?
>
> In the largest portion of the system, when it comes to tables, indexes,
> and such, if there's a file 'X.1', you can be pretty darn sure that 'X'
> is 1GB in size. If there's a file 'X.245' then you can know that
> there's 245 files (X, X.1, X.2, X.3 ... X.244) that are 1GB in size.
>
> Maybe we should teach pg_database_size_name() about that?
>
> Thanks!
>
> Stephen
>

--
Thanks

Hubert Zhang

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Khandekar 2019-01-11 11:23:47 Re: Displaying and dumping of table access methods
Previous Message Etsuro Fujita 2019-01-11 11:10:44 Re: Query with high planning time at version 11.1 compared versions 10.5 and 11.0