Re: Statistics Import and Export

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Subject: Re: Statistics Import and Export
Date: 2024-04-01 11:26:39
Message-ID: CAExHW5srxP5MOYD8brXxQqYfjWEUwyxVR0-rA505aN4UjH+KmA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Corey,

On Mon, Mar 25, 2024 at 3:38 PM Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
wrote:

> Hi Corey,
>
>
> On Sat, Mar 23, 2024 at 7:21 AM Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
> wrote:
>
>> v12 attached.
>>
>> 0001 -
>>
>>
> Some random comments
>
> +SELECT
> + format('SELECT pg_catalog.pg_set_attribute_stats( '
> + || 'relation => %L::regclass::oid, attname => %L::name, '
> + || 'inherited => %L::boolean, null_frac => %L::real, '
> + || 'avg_width => %L::integer, n_distinct => %L::real, '
> + || 'most_common_vals => %L::text, '
> + || 'most_common_freqs => %L::real[], '
> + || 'histogram_bounds => %L::text, '
> + || 'correlation => %L::real, '
> + || 'most_common_elems => %L::text, '
> + || 'most_common_elem_freqs => %L::real[], '
> + || 'elem_count_histogram => %L::real[], '
> + || 'range_length_histogram => %L::text, '
> + || 'range_empty_frac => %L::real, '
> + || 'range_bounds_histogram => %L::text) ',
> + 'stats_export_import.' || s.tablename || '_clone', s.attname,
> + s.inherited, s.null_frac,
> + s.avg_width, s.n_distinct,
> + s.most_common_vals, s.most_common_freqs, s.histogram_bounds,
> + s.correlation, s.most_common_elems, s.most_common_elem_freqs,
> + s.elem_count_histogram, s.range_length_histogram,
> + s.range_empty_frac, s.range_bounds_histogram)
> +FROM pg_catalog.pg_stats AS s
> +WHERE s.schemaname = 'stats_export_import'
> +AND s.tablename IN ('test', 'is_odd')
> +\gexec
>
> Why do we need to construct the command and execute? Can we instead
> execute the function directly? That would also avoid ECHO magic.
>

Addressed in v15

>
> + <table id="functions-admin-statsimport">
> + <title>Database Object Statistics Import Functions</title>
> + <tgroup cols="1">
> + <thead>
> + <row>
> + <entry role="func_table_entry"><para role="func_signature">
> + Function
> + </para>
> + <para>
> + Description
> + </para></entry>
> + </row>
> + </thead>
>
> COMMENT: The functions throw many validation errors. Do we want to list
> the acceptable/unacceptable input values in the documentation corresponding
> to those? I don't expect one line per argument validation. Something like
> "these, these and these arguments can not be NULL" or "both arguments in
> each of the pairs x and y, a and b, and c and d should be non-NULL or NULL
> respectively".
>

Addressed in v15.

> + /* Statistics are dependent on the definition, not the data */
> + /* Views don't have stats */
> + if ((tbinfo->dobj.dump & DUMP_COMPONENT_STATISTICS) &&
> + (tbinfo->relkind == RELKIND_VIEW))
> + dumpRelationStats(fout, &tbinfo->dobj, reltypename,
> + tbinfo->dobj.dumpId);
> +
>
> Statistics are about data. Whenever pg_dump dumps some filtered data, the
> statistics collected for the whole table are uselss. We should avoide
> dumping
> statistics in such a case. E.g. when only schema is dumped what good is
> statistics? Similarly the statistics on a partitioned table may not be
> useful
> if some its partitions are not dumped. Said that dumping statistics on
> foreign
> table makes sense since they do not contain data but the statistics still
> makes sense.
>

Dumping statistics without data is required for pg_upgrade. This is being
discussed in the same thread. But I don't see some of the suggestions e.g.
using binary-mode switch being used in v15.

Also, should we handle sequences, composite types the same way? THe latter
is probably not dumped, but in case.

>
> Whether or not I pass --no-statistics, there is no difference in the dump
> output. Am I missing something?
> $ pg_dump -d postgres > /tmp/dump_no_arguments.out
> $ pg_dump -d postgres --no-statistics > /tmp/dump_no_statistics.out
> $ diff /tmp/dump_no_arguments.out /tmp/dump_no_statistics.out
> $
>
> IIUC, pg_dump includes statistics by default. That means all our pg_dump
> related tests will have statistics output by default. That's good since the
> functionality will always be tested. 1. We need additional tests to ensure
> that the statistics is installed after restore. 2. Some of those tests
> compare dumps before and after restore. In case the statistics is changed
> because of auto-analyze happening post-restore, these tests will fail.
>

Fixed.

Thanks for addressing those comments.

--
Best Wishes,
Ashutosh Bapat

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2024-04-01 11:30:03 Re: Synchronizing slots from primary to standby
Previous Message Pavel Borisov 2024-04-01 11:13:05 Re: Fix parameters order for relation_copy_for_cluster