Re: PATCH: Attempt to make dbsize a bit more consistent

From: gkokolatos(at)pm(dot)me
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Soumyadeep Chakraborty <soumyadeep2007(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, David Zhang <david(dot)zhang(at)highgo(dot)ca>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>
Subject: Re: PATCH: Attempt to make dbsize a bit more consistent
Date: 2021-02-24 14:35:51
Message-ID: eGUb2WFDwwkWZESWj_lvn2HXRWZCI4YYzYTORJK8iXiVzF1pbZcFKRKUMTCYrkCF-_CHl89VHbPB0nASLYSWCdQqBCTj9mA4d0Yg7eo0SuY=@pm.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Wednesday, February 24, 2021 3:34 PM, <gkokolatos(at)pm(dot)me> wrote:

>
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Friday, February 19, 2021 4:57 PM, gkokolatos(at)pm(dot)me wrote:
>
> > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> > On Monday, February 1, 2021 1:18 PM, Masahiko Sawada sawada(dot)mshk(at)gmail(dot)com wrote:
> >
> > > On Thu, Nov 12, 2020 at 2:54 AM Soumyadeep Chakraborty
> > > soumyadeep2007(at)gmail(dot)com wrote:
> > >
> > > > Hey Georgios,
> > > > On Tue, Nov 10, 2020 at 6:20 AM gkokolatos(at)pm(dot)me wrote:
> > > >
> > > > > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> > > > > On Monday, November 9, 2020 7:50 PM, Soumyadeep Chakraborty soumyadeep2007(at)gmail(dot)com wrote:
> > > > >
> > > > > > Hey Georgios,
> > > > > > Thanks for looking for more avenues to invoke tableAM APIS! Please find
> > > > > > my review below:
> > > > >
> > > > > A great review Soumyadeep, it is much appreciated.
> > > > > Please remember to add yourself as a reviewer in the commitfest
> > > > > [https://commitfest.postgresql.org/30/2701/]
> > > >
> > > > Ah yes. Sorry, I forgot that!
> > > >
> > > > > > On Tue, Oct 13, 2020 at 6:28 AM gkokolatos(at)pm(dot)me wrote:
> > > > > >
> > > > > > 1.
> > > > > >
> > > > > > > /*
> > > > > > >
> > > > > > > - - heap size, including FSM and VM
> > > > > > > - - table size, including FSM and VM
> > > > > > > */
> > > > > > >
> > > > > >
> > > > > > We should not mention FSM and VM in dbsize.c at all as these are
> > > > > > heap AM specific. We can say:
> > > > > > table size, excluding TOAST relation
> > > > >
> > > > > Yeah, I was thinking that the notion that FSM and VM are still taken
> > > > > into account should be stated. We are iterating over ForkNumber
> > > > > after all.
> > > > > How about phrasing it as:
> > > > >
> > > > > - table size, including all implemented forks from the AM (e.g. FSM, VM)
> > > > > - excluding TOAST relations
> > > > >
> > > > > Thoughts?
> > > >
> > > > Yes, I was thinking along the same lines. The concept of a "fork" forced
> > > > should not be forced down into the tableAM. But that is a discussion for
> > > > another day. We can probably say:
> > > >
> > > > - table size, including all implemented forks from the AM (e.g. FSM, VM
> > > > - for the heap AM) excluding TOAST relations
> > > >
> > > > > > 2.
> > > > > >
> > > > > > > /*
> > > > > > >
> > > > > > > - Size of toast relation
> > > > > > > */
> > > > > > > if (OidIsValid(rel->rd_rel->reltoastrelid))
> > > > > > >
> > > > > > > - size += calculate_toast_table_size(rel->rd_rel->reltoastrelid);
> > > > > > >
> > > > > > > - {
> > > > > > >
> > > > > > > - Relation toastRel;
> > > > > > >
> > > > > > > -
> > > > > > > - toastRel = relation_open(rel->rd_rel->reltoastrelid, AccessShareLock);
> > > > > > >
> > > > > >
> > > > > > We can replace the OidIsValid check with relation_needs_toast_table()
> > > > > > and then have the OidIsValid() check in an Assert. Perhaps, that will
> > > > > > make things more readable.
> > > > >
> > > > > Please, allow me to kindly disagree.
> > > > > Relation is already open at this stage. Even create_toast_table(), the
> > > > > internal workhorse for creating toast relations, does check reltoastrelid
> > > > > to test if the relation is already toasted.
> > > > > Furthermore, we do call:
> > > > >
> > > > > - toastRel = relation_open(rel->rd_rel->reltoastrelid, AccessShareLock);
> > > > >
> > > > > and in order to avoid elog() errors underneath us, we ought to have
> > > > > verified the validity of reltoastrelid.
> > > > > In short, I think that the code in the proposal is not too unreadable
> > > > > nor that it breaks the coding patterns throughout the codebase.
> > > > > Am I too wrong?
> > > >
> > > > No not at all. The code in the proposal is indeed adhering to the
> > > > codebase. What I was going for here was to increase the usage of
> > > > relation_needs_toast_table(). What I meant was:
> > > > if (table_relation_needs_toast_table(rel))
> > > > {
> > > > if (!OidIsValid(rel->rd_rel->reltoastrelid))
> > > > {
> > > > elog(ERROR, <errmsg that toast table wasn't found>);
> > > > }
> > > > //size calculation here..
> > > > }
> > > > We want to error out if the toast table can't be found and the relation
> > > > is expected to have one, which the existing code doesn't handle.
> > > >
> > > > > > 3.
> > > > > >
> > > > > > > - if (rel->rd_rel->relkind == RELKIND_RELATION ||
> > > > > > > - rel->rd_rel->relkind == RELKIND_TOASTVALUE ||
> > > > > > > - rel->rd_rel->relkind == RELKIND_MATVIEW)
> > > > > > > - size = calculate_table_size(rel);
> > > > > > > - else
> > > > > > > - {
> > > > > > > - relation_close(rel, AccessShareLock);
> > > > > > > - PG_RETURN_NULL();
> > > > > > > - }
> > > > > >
> > > > > > This leads to behavioral changes:
> > > > > > I am talking about the case where one calls pg_table_size() on an index.
> > > > > > W/o your patch, it returns the size of the index. W/ your patch it
> > > > > > returns NULL. Judging by the documentation, this function should not
> > > > > > ideally apply to indexes but it does! I have a sinking feeling that lots
> > > > > > of users use it for this purpose, as there is no function to calculate
> > > > > > the size of a single specific index (except pg_relation_size()).
> > > > > > The same argument I have made above applies to sequences. Users may have
> > > > > > trial-and-errored their way into finding that pg_table_size() can tell
> > > > > > them the size of a specific index/sequence! I don't know how widespread
> > > > > > the use is in the user community, so IMO maybe we should be conservative
> > > > > > and not introduce this change? Alternatively, we could call out that
> > > > > > pg_table_size() is only for tables by throwing an error if anything
> > > > > > other than a table is passed in.
> > > > > > If we decide to preserve the existing behavior of the pg_table_size():
> > > > > > It seems that for things not backed by the tableAM (indexes and
> > > > > > sequences), they should still go through calculate_relation_size().
> > > > > > We can call table_relation_size() based on if relkind is
> > > > > > RELKIND_RELATION, RELKIND_TOASTVALUE or RELKIND_MATVIEW. Perhaps it
> > > > > > might be worthwhile making a new macro RELKIND_HAS_TABLE_STORAGE to
> > > > > > capture these three cases (See RELKIND_HAS_STORAGE). This also ensures
> > > > > > that we return 0 for things that don't qualify as RELKIND_HAS_STORAGE,
> > > > > > such as a partitioned table (Currently w/ the patch applied, we return
> > > > > > NULL for those cases, which is another behavior change)
> > > > >
> > > > > Excellent point. This is the discussion I was longing to have.
> > > > > I stand by the decision coded in the patch, that pg_table_size() should
> > > > > return NULL for other kinds of relations, such as indexes, sequences
> > > > > etc.
> > > > > It is a conscious decision based on the following:
> > > > >
> > > > > - Supported by the documentation, pg_table_size() applies to tables only.
> > > > > For other uses the higher-level functions pg_total_relation_size() or
> > > > > pg_relation_size() should be used.
> > > > >
> > > > > - Commit fa352d662e taught pg_relation_size() and friends to return NULL if the object doesn't exist. This makes perfect sense for the
> > > > > scenarios described in the commit:
> > > > > That avoids errors when the functions are used in queries like
> > > > > "SELECT pg_relation_size(oid) FROM pg_class",
> > > > > and a table is dropped concurrently.
> > > > >
> > > > >
> > > > > IMHO: It is more consistent to return NULL when the relation does exist
> > > > > OR it is not a table kind.
> > > > >
> > > > > - Returning 0 for things that do not have storage, is nonsensical because
> > > > > it implies that it can be NON zero at some point. Things that do not
> > > > > have storage have an unknown size.
> > > > >
> > > >
> > > > Fair. We will have to document the behavior change.
> > > >
> > > > > As far as for the argument that users might have trialed and errored
> > > > > their way into undocumented behaviour, I do not think it is strong
> > > > > enough to stop us from implementing the documented behaviour.
> > > >
> > > > Fair. I would strongly vote for having two additional functions
> > > > (pg_index_size() and pg_sequence_size()) to strongly signal our intent
> > > > of banning that kind of use of pg_table_size(). I think it would help
> > > > users a lot. It is not easy to find what function to call when you want
> > > > the size of a single index/sequence.
> > > >
> > > > > > 4.
> > > > > >
> > > > > > > @@ -3776,10 +3776,24 @@ listTables(const char *tabtypes, const char *pattern, bool verbose, bool showSys
> > > > > > > gettext_noop("Access Method"));
> > > > > > >
> > > > > > > /*
> > > > > > >
> > > > > > >
> > > > > > > - * As of PostgreSQL 14, do not use pg_table_size() for indexes and
> > > > > > >
> > > > > > >
> > > > > > > - * sequences as it does not behave sanely for those.
> > > > > > >
> > > > > > >
> > > > > > > - *
> > > > > > > * As of PostgreSQL 9.0, use pg_table_size() to show a more accurate
> > > > > > > * size of a table, including FSM, VM and TOAST tables.
> > > > > > > */
> > > > > > >
> > > > > > >
> > > > > > > - if (pset.sversion >= 90000)
> > > > > > >
> > > > > > >
> > > > > > > - if (pset.sversion >= 140000)
> > > > > > >
> > > > > > >
> > > > > > > - appendPQExpBuffer(&buf,
> > > > > > >
> > > > > > >
> > > > > > > - ",\\\\\\\\\\\\\\\\n CASE"
> > > > > > >
> > > > > > >
> > > > > > > - " WHEN c.relkind in ("CppAsString2(RELKIND_INDEX)", "
> > > > > > >
> > > > > > >
> > > > > > > - CppAsString2(RELKIND_PARTITIONED_INDEX)", "
> > > > > > >
> > > > > > >
> > > > > > > - CppAsString2(RELKIND_SEQUENCE)") THEN"
> > > > > > >
> > > > > > >
> > > > > > > - " pg_catalog.pg_size_pretty(pg_catalog.pg_relation_size(c.oid))"
> > > > > > >
> > > > > > >
> > > > > > > - " ELSE"
> > > > > > >
> > > > > > >
> > > > > > > - " pg_catalog.pg_size_pretty(pg_catalog.pg_table_size(c.oid))"
> > > > > > >
> > > > > > >
> > > > > > > - " END as \\\\\\\\\\\\\\\\"%s\\\\\\\\\\\\\\\\"",
> > > > > > >
> > > > > > >
> > > > > > > - gettext_noop("Size"));
> > > > > > >
> > > > > > >
> > > > > > > - else if (pset.sversion >= 90000)
> > > > > > > appendPQExpBuffer(&buf,
> > > > > > > ",\\\\\\\\\\\\\\\\n pg_catalog.pg_size_pretty(pg_catalog.pg_table_size(c.oid)) as \\\\\\\\\\\\\\\\"%s\\\\\\\\\\\\\\\\"",
> > > > > > > gettext_noop("Size"));
> > > > > > >
> > > > > > >
> > > > > >
> > > > > > Following on from point 3, if we decide to preserve the existing behavior,
> > > > > > we wouldn't need this change, as it would be internalized within
> > > > > > pg_table_size().
> > > > >
> > > > > We really should not decide to preserve the existing behaviour.
> > > > > I will reiterate my point: Returning 0 for things that do not have
> > > > > storage, implies that it can be NON zero at some point. We should not
> > > > > treat pg_table_size() as an alias for pg_relation_size().
> > > >
> > > > +1
> > > >
> > > > > > 4.
> > > > > >
> > > > > > > > - return size;
> > > > > > > >
> > > > > > > > - Assert(size < PG_INT64_MAX);
> > > > > > > >
> > > > > > > > -
> > > > > > > > - return (int64)size;
> > > > > > > > I assume that this change, and the other ones like that, aim to handle int64
> > > > > > > > overflow? Using the extra legroom of uint64 can still lead to an overflow,
> > > > > > > > however theoretical it may be. Wouldn't it be better to check for overflow
> > > > > > > > before adding to size rather than after the fact? Something along the lines of
> > > > > > > > checking for headroom left:
> > > > > > > > rel_size = table_relation_size(..);
> > > > > > > > if (rel_size > (PG_INT64_MAX - total_size))
> > > > > > > > < error codepath >
> > > > > > > >
> > > > > > > >
> > > > > > > > total_size += rel_size;
> > > > > > >
> > > > > > > Actually not, the intention is not to handle overflow. The table_relation_size() returns a uint64 and the calling function returns int64.
> > > > > > > The Assert() has been placed in order to denote that it is acknowledged that the two functions return different types. I was of the opinion that a run time check will not be needed as even the > smaller type can cover more than 9200 PetaByte tables.
> > > > > > > If we were to change anything, then I would prefer to change the signature of the pg_*_size family of functions to return uint64 instead.
> > > > > >
> > > > > > Changing the signature would be the ideal change for all of this here.
> > > > > > But Postgres does not have support for an unsigned 64bit integer (bigint
> > > > > > is signed). One would need to turn to extensions such as [1]. Thus, +1
> > > > > > to what Daniel said above.
> > > > >
> > > > > Apologies, I do not follow. Are you suggesting that we should
> > > > > introduce overflow tests?
> > > >
> > > > Yes, to introduce the same overflow test that Daniel suggested above as
> > > > returning a uint64 in PG does not really return a uint64 AFAIU (since
> > > > the pg_**size() functions all return bigint which is signed and there
> > > > is no uint64 user-facing type).
> > >
> > > Status update for a commitfest entry.
> > > This patch gets review comments and there was some discussion. It
> > > seems we're waiting for the patch update. So I've moved this patch to
> > > the next commitfest and set it to "Waiting on Author".
> >
> > Thank you for your patience.
> > Please find v3 attached. I will reset the status to 'Ready for review'.
>
> Version 3 of the patch broke the cfbot. Please find v4 attached.
>
> This patch requires a catalog version bump.

Now with attachment. Apologies for the chatter.

>
> Cheers,
> //Georgios -- https://www.vmware.com
>
> > > Regards,
> > > Masahiko Sawada
> > > EDB: https://www.enterprisedb.com/

Attachment Content-Type Size
v4-0001-Attempt-to-make-dbsize-a-bit-more-consistent.patch application/octet-stream 14.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Floris Van Nee 2021-02-24 14:44:30 non-HOT update not looking at FSM for large tuple update
Previous Message gkokolatos 2021-02-24 14:34:25 Re: PATCH: Attempt to make dbsize a bit more consistent