Re: ToDo: show size of partitioned table

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, mathias(at)brossard(dot)org, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: ToDo: show size of partitioned table
Date: 2018-11-22 11:42:14
Message-ID: CAFj8pRBJetR9tk5cNGt5mLewxwB4gXK0dm7L41fxUqBJ58h2wg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

čt 22. 11. 2018 v 1:51 odesílatel Michael Paquier <michael(at)paquier(dot)xyz>
napsal:

> On Wed, Nov 21, 2018 at 05:37:33PM +0100, Pavel Stehule wrote:
> > st 21. 11. 2018 v 17:21 odesílatel Alvaro Herrera <
> alvherre(at)2ndquadrant(dot)com>
> > napsal:
> >> Hmm, these tests are not going to work, because they have "pavel" in the
> >> expected output.
> >
> > I was blind, thank you for check
>
> +create table testtable_apple(logdate date);
> +create table testtable_orange(logdate date);
> +create index testtable_apple_index on testtable_apple(logdate);
> +create index testtable_orange_index on testtable_orange(logdate);
> There are already a bunch of partition relations with multiple levels
> created as part of the regression tests, so instead of creating more of
> those, I would suggest to test \dP and \dPt in create_table.sql, and
> \dPi in indexing.sql (please make sure to add tests for \dP with
> partitioned indexes as well).
>
> I think that you should really add the direct parent of a partition in
> at least the verbose output, now for multiple partition levels things
> are confusing in my opinion. For example with such a schema:
> CREATE TABLE parent_tab (id int) PARTITION BY RANGE (id);
> CREATE INDEX parent_index ON parent_tab (id);
> CREATE TABLE child_0_10 PARTITION OF parent_tab
> FOR VALUES FROM (0) TO (10);
> CREATE TABLE child_10_20 PARTITION OF parent_tab
> FOR VALUES FROM (10) TO (20);
> CREATE TABLE child_20_30 PARTITION OF parent_tab
> FOR VALUES FROM (20) TO (30);
> INSERT INTO parent_tab VALUES (generate_series(0,29));
> CREATE TABLE child_30_40 PARTITION OF parent_tab
> FOR VALUES FROM (30) TO (40)
> PARTITION BY RANGE(id);
> CREATE TABLE child_30_35 PARTITION OF child_30_40
> FOR VALUES FROM (30) TO (35);
> CREATE TABLE child_35_40 PARTITION OF child_30_40
> FOR VALUES FROM (35) TO (40);
> INSERT INTO parent_tab VALUES (generate_series(30,39));
>
> Then with \dP+ I got that:
> =# \dP+
> List of partitioned relations
> Schema | Name | Owner | Size | Description
> --------+-------------+--------+--------+-------------
> public | child_30_40 | ioltas | 48 kB |
> public | parent_tab | ioltas | 120 kB |
> (2 rows)
> Showing the parent partition looks like a pretty important to me as I
> would expect multi-level partitions to be a frequent case (perhaps it
> should show up as well in the non-verbose output?). The field should be
> NULL if the relation is the top of the tree.
>
>
it looks like bug for me much more.

your example - on my comp

List of relations
+--------+-------------+-------------------+-------+------------+-------------+
| Schema | Name | Type | Owner | Size |
Description |
+--------+-------------+-------------------+-------+------------+-------------+
| public | child_0_10 | table | pavel | 8192 bytes
| |
| public | child_10_20 | table | pavel | 8192 bytes
| |
| public | child_20_30 | table | pavel | 8192 bytes
| |
| public | child_30_35 | table | pavel | 8192 bytes
| |
| public | child_30_40 | partitioned table | pavel | 0 bytes
| |
| public | child_35_40 | table | pavel | 8192 bytes
| |
| public | parent_tab | partitioned table | pavel | 0 bytes
| |
+--------+-------------+-------------------+-------+------------+-------------+
(7 rows)

there is about 5x 8KB data .. 40KB

But in views I got

List of partitioned tables
+--------+-------------+-------+-------+-------------+
| Schema | Name | Owner | Size | Description |
+--------+-------------+-------+-------+-------------+
| public | child_30_40 | pavel | 16 kB | |
| public | parent_tab | pavel | 40 kB | |
+--------+-------------+-------+-------+-------------+
(2 rows)

there is 16KB more, what is really messy.

I think so most correct is removing child_30_40 from the report.

test=# SELECT n.nspname as "Schema",
c.relname as "Name",
pg_catalog.pg_get_userbyid(c.relowner) as "Owner",
(SELECT pg_catalog.pg_size_pretty(sum(pg_catalog.pg_table_size(relid)))
FROM pg_catalog.pg_partition_tree(c.oid)) AS "Size",
pg_catalog.obj_description(c.oid, 'pg_class') as "Description"
FROM pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind IN ('p') and not c.relispartition
AND n.nspname <> 'pg_catalog'
AND n.nspname <> 'information_schema'
AND n.nspname !~ '^pg_toast'
AND pg_catalog.pg_table_is_visible(c.oid)
ORDER BY 1,2;
+--------+------------+-------+-------+-------------+
| Schema | Name | Owner | Size | Description |
+--------+------------+-------+-------+-------------+
| public | parent_tab | pavel | 40 kB | |
+--------+------------+-------+-------+-------------+
(1 row)

I afraid of unreadable result if we allow overlap in report. I think so can
be strange if some disk space will be reported 2x or more times in one
report. Unfortunately It means so some information will be hidden. In this
moment I prefer readability and simple meaning.

I am not strong in this topics. Another possibility is show parent (this
should be displayed every time, without it it is messy).

This query is much more complex, but the result is more informative

SELECT n.nspname as "Schema",
c.relname as "Name",
n2.nspname as "Parent schema",
c2.relname as "Parent name",
pg_catalog.pg_get_userbyid(c.relowner) as "Owner",
s.max as "Hiearchy deep",
s.size as "Size",
pg_catalog.obj_description(c.oid, 'pg_class') as "Description"
FROM pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
LEFT JOIN pg_catalog.pg_inherits i ON c.oid = i.inhrelid
LEFT JOIN pg_catalog.pg_class c2 ON c2.oid = i.inhparent
LEFT JOIN pg_catalog.pg_namespace n2 ON n2.oid = c2.relnamespace,
LATERAL (SELECT max(level),
pg_catalog.pg_size_pretty(sum(pg_catalog.pg_table_size(relid))) as size
FROM pg_catalog.pg_partition_tree(c.oid)) s
WHERE c.relkind IN ('p')
AND n.nspname <> 'pg_catalog'
AND n.nspname <> 'information_schema'
AND n.nspname !~ '^pg_toast'
AND pg_catalog.pg_table_is_visible(c.oid)
ORDER BY 1,2;

+--------+-------------+---------------+-------------+-------+---------------+-------+-------------+
| Schema | Name | Parent schema | Parent name | Owner | Hiearchy
deep | Size | Description |
+--------+-------------+---------------+-------------+-------+---------------+-------+-------------+
| public | child_30_40 | public | parent_tab | pavel |
1 | 16 kB | |
| public | parent_tab | | | pavel |
2 | 40 kB | |
+--------+-------------+---------------+-------------+-------+---------------+-------+-------------+
(2 rows)

Still I prefer to not show nested partitioned tables for simplicity,
readability reasons. Displaying nested objects in one table doesn't look
like good idea for me. But I am ready to accept different common opinion.

Still do you think so variant with parent should be preferred?

> Again, with the previous schema:
> =# \dPi *idx
> List of partitioned indexes
> Schema | Name | Owner | Table
> --------+--------------------+--------+-------------
> public | child_30_40_id_idx | ioltas | child_30_40
> (1 row)
> =# \dP *idx
> Did not find any partitioned relations named "*idx"
> I would have expected in the second case to have the partitioned
> *relations* showing up in the output, and a relation can be an index as
> well if the pattern matches.
>

I think so it is correct - I don't would to see the index here, because
index size is calculated by total_relation_size already.

Here my position is strong. \dP for me doesn't mean "tables or indexes" -
it means "partition tables with total relation size". I don't see any sense
to show tables and indexes in one report.

Regards

Pavel

> Could you please address those problems first? The basic shape of the
> patch with the three new sub-commands is fine I think, so we can go
> ahead with that, but the two problems reported are blockers in my
> opinion.
>
--
> Michael
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2018-11-22 11:44:31 Re: New function pg_stat_statements_reset_query() to reset statistics of a specific query
Previous Message Haozhou Wang 2018-11-22 11:25:37 Re: Control your disk usage in PG: Introduction to Disk Quota Extension