Quick Links

Re: Split index and table statistics into different types of stats

From:	"Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>
To:	Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Split index and table statistics into different types of stats
Date:	2022-11-14 10:36:52
Message-ID:	06517d27-7a81-3b60-60c4-9a92c6498981@gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi,

On 11/4/22 9:51 PM, Melanie Plageman wrote:
> Hi Bertrand,
>
> I'm glad you are working on this.
>
> I had a few thoughts/ideas
>

Thanks for looking at it!

> It seems better to have all of the counts in the various stats structs
> not be prefixed with n_, i_, t_
>
> typedef struct PgStat_StatDBEntry
> {
> ...
> PgStat_Counter n_blocks_fetched;
> PgStat_Counter n_blocks_hit;
> PgStat_Counter n_tuples_returned;
> PgStat_Counter n_tuples_fetched;
> ...
>
> I've attached a patch (0002) to change this in case you are interested
> in making such a change

I did not renamed initially the fields/columns to ease the review.

Indeed, I think we should go further than removing the n_, i_ and t_
prefixes so that the fields actually match the view's columns.

For example, currently pg_stat_all_indexes.idx_tup_read is linked to
"tuples_returned", so that it would make sense to rename
"tuples_returned" to "tuples_read" or even "tup_read" in the indexes
counters.

That's why I had in mind to do this fields/columns renaming into a
separate patch (once this one is committed), so that the current one
focus only on splitting the stats: what do you think?

> (I've attached all of my suggestions in patches
> along with your original patch so that cfbot still passes).
>
> On Wed, Nov 2, 2022 at 5:00 AM Drouvot, Bertrand
> <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
>> On 11/1/22 1:30 AM, Andres Freund wrote:
>>> On 2022-10-31 14:14:15 +0100, Drouvot, Bertrand wrote:
>>>> @@ -240,9 +293,23 @@ pg_stat_get_blocks_fetched(PG_FUNCTION_ARGS)
>>>> PG_RETURN_INT64(result);
>>>> }
>>>>
>>>> +Datum
>>>> +pg_stat_get_index_blocks_fetched(PG_FUNCTION_ARGS)
>>>> +{
>>>> + Oid relid = PG_GETARG_OID(0);
>>>> + int64 result;
>>>> + PgStat_StatIndEntry *indentry;
>>>> +
>>>> + if ((indentry = pgstat_fetch_stat_indentry(relid)) == NULL)
>>>> + result = 0;
>>>> + else
>>>> + result = (int64) (indentry->blocks_fetched);
>>>> +
>>>> + PG_RETURN_INT64(result);
>>>> +}
>>>
>>> We have so many copies of this by now - perhaps we first should deduplicate
>>> them somehow? Even if it's just a macro or such.
>>>
>>
>> Yeah good point, a new macro has been defined for the "int64" return
>> case in V3 attached.
>
> I looked for other opportunities to de-duplicate, but most of the functions
> that were added that are identical except the return type and
> PgStat_Kind are short enough that it doesn't make sense to make wrappers
> or macros.
>

Yeah, agree.

> I do think it makes sense to reorder the members of the two structs
> PgStat_IndexCounts and PgStat_TableCounts so that they have a common
> header. I've done that in the attached patch (0003).
>

That's a good idea, thanks! But for that we would need to have the same
fields names, means:

- Remove the prefixes (as you've done in 0002)
- And probably reduce the number of fields in the new
PgStat_RelationCounts that 003 is introducing (for example
tuples_returned should be excluded if we're going to rename it later on
to "tuples_read" for the indexes to map the
pg_stat_all_indexes.idx_tup_read column).

ISTM that we should do it in the "renaming" effort, after this patch is
committed.

> In the flush functions, I was also thinking it might be nice to use the
> same pattern as is used in [1] and [2] to add the counts together. It
> makes the lines a bit easier to read, IMO. If we remove the prefixes
> from the count fields, this works for many of the fields. I've attached
> a patch (0004) that does something like this, in case you wanted to go
> in this direction.

I like it too but same remarks as previously. I think it should be part
of the "renaming" effort.

>
> Since you have made new parallel functions for indexes and tables for
> many of the functions in pgstat_relation.c, perhaps it makes sense to
> split it into pgstat_table.c and pgstat_index.c?

Good point, thanks, I'll work on it.

>
> One question I had about the original code (not related to your
> refactor) is why the pending stats aren't memset in the flush functions
> after aggregating them into the shared stats.

Not sure I'm getting your point, do you think something is not right
with the flush functions?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

In response to

Re: Split index and table statistics into different types of stats at 2022-11-04 20:51:46 from Melanie Plageman

Responses

Re: Split index and table statistics into different types of stats at 2022-11-15 09:48:40 from Drouvot, Bertrand

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Marina Polyakova	2022-11-14 10:56:06	Re: Fix order of checking ICU options in initdb and create database
Previous Message	Amit Kapila	2022-11-14 10:36:27	Re: hash_xlog_split_allocate_page: failed to acquire cleanup lock