Re: PATCH: Attempt to make dbsize a bit more consistent

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: gkokolatos(at)pm(dot)me
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Soumyadeep Chakraborty <soumyadeep2007(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, David Zhang <david(dot)zhang(at)highgo(dot)ca>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: PATCH: Attempt to make dbsize a bit more consistent
Date: 2021-03-15 06:10:59
Message-ID: YE76cyoTVZxaJiP9@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Feb 24, 2021 at 02:35:51PM +0000, gkokolatos(at)pm(dot)me wrote:
> Now with attachment. Apologies for the chatter.

The patch has no documentation for the two new functions, so it is a
bit hard to understand what is the value brought here, and what is the
goal wanted just by reading the patch, except that this switches the
size reported for views to NULL instead of zero bytes by reading the
regression tests.

Please note that reporting zero is not wrong for views IMO as these
have no physical storage, so, while it is possible to argue that a
size of zero could imply that this relation size could not be zero, it
will never change, so I'd rather let this behavior as it is. I
would bet, additionally, that this could break existing use cases.
Reporting NULL, on the contrary, as your patch does, makes things
worse in some ways because that's what pg_relation_size would report
when a relation does not exist anymore. Imagine for example the case
of a DROP TABLE run in parallel of a scan of pg_class using
pg_relation_size(). So it becomes impossible to know if a relation
has been removed or not. This joins some points raised by Soumyadeep
upthread.

Anyway, as mentioned by other people upthread, I am not really
convinced either that we should have more flavors of size functions,
particularly depending on the relkind as this would be confusing for
the end-user. pg_relation_size() can already do this job for all
relkinds that use segment files, so it should still be able to hold
its ground and maintain a consistent behavior with what it does
currently.

+static int64
+calculate_table_fork_size(Relation rel, ForkNumber forkNum)
+{
+ return (int64)table_relation_size(rel, forkNum);
+}
Why isn't that just unsigned, like table_relation_size()? This is an
internal function so it does not matter to changes its signature, but
I think that you had better do a cast at a higher level instead.

for (int i = 0; i < MAX_FORKNUM; i++)
- nblocks += smgrnblocks(rel->rd_smgr, i);
+ nblocks += smgrexists(rel->rd_smgr, i)
+ ? smgrnblocks(rel->rd_smgr, i)
+ : 0;
Is that actually the part for views? Why is that done this way?

+ if (rel->rd_rel->relkind == RELKIND_RELATION ||
+ rel->rd_rel->relkind == RELKIND_TOASTVALUE ||
+ rel->rd_rel->relkind == RELKIND_MATVIEW)
+ size = calculate_table_fork_size(rel,
+ forkname_to_number(text_to_cstring(forkName)));
+ else if (rel->rd_rel->relkind == RELKIND_INDEX)
+ size = calculate_relation_size(&(rel->rd_node), rel->rd_backend,
+ forkname_to_number(text_to_cstring(forkName)));
+ else
+ {
+ relation_close(rel, AccessShareLock);
+ PG_RETURN_NULL();
+ }
The approach you are taking to bring some consistency in all that by
making the size estimations go through table AMs via
table_relation_size() is promising. However, this code breaks the
size estimation for sequences, which is not good. If attempting to
use an evaluation that's based on a table AM, shouldn't this code use
a check based on rd_tableam rather than the relkind then? We assume
that rd_tableam is set depending on the relkind in relcache.c, for
one.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2021-03-15 06:28:37 Re: Consider Parallelism While Planning For REFRESH MATERIALIZED VIEW
Previous Message Jaime Casanova 2021-03-15 06:05:11 Re: SQL-standard function body