Re: Extended Statistics set/restore/clear functions.

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Cc: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Tender Wang <tndrwang(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Extended Statistics set/restore/clear functions.
Date: 2026-01-28 04:34:12
Message-ID: aXmRxBsfNT4Nn8mq@paquier.xyz
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 28, 2026 at 11:46:04AM +0900, Michael Paquier wrote:
> I'll go fix that now with some tests to cover things, after extracting
> the relevant portion of the code from v32-0002.

I have begun putting my head on the MCV part, and found what looks
like a memory overread by injecting buggy values for
most_common_val_nulls. Quick example:
create table test (name text);
CREATE STATISTICS test_stat_mcv_exprs (mcv)
ON lower(name), upper(name)
FROM test;
-- 4 elements in total in most_common_val_nulls, logic reads 8,
-- reads past 4 of them.
SELECT pg_catalog.pg_restore_extended_stats('schemaname', 'public',
'relname', 'test',
'statistics_schemaname', 'public',
'statistics_name', 'test_stat_mcv_exprs',
'inherited', false,
'most_common_vals', '{{four,FOUR},{one,ONE},{tre,TRE},{two,TWO}}'::text[],
'most_common_val_nulls', '{{f},{f},{f},{f}}'::boolean[],
'most_common_freqs', '{0.25,0.25,0.25,0.25}'::double precision[],
'most_common_base_freqs', '{0.0625,0.0625,0.0625,0.625}'::double precision[]);

The boundary checks for the most_common_freqs and
most_common_base_freqs are OK: these should be 1-dimension, with a
number of elements matching the number of items in most_common_vals.
As far as I can see, most_common_vals is also OK, we check after a
2-dimension array made of N arrays with a number of elements matching
with the object definition.

most_common_val_nulls is problematic: we check that it is a
2-dimension array, we also check that its number of internal arrays
match with the number of elements in most_common_vals. However, we do
*not* check that the internal arrays have a number of items matching
with the number of items in most_common_vals. In this example,
{{f},{f},{f},{f}} is too short, {{f,f},{f,f},{f,f},{f,f}} would be
right.

It means that we are missing more sanity checks in import_mcv(), from
what I can see. Without these checks, statext_mcv_import(), that
rebuilds the MCVItems would then read past the contents of nulls_arr
with an index larger than the number of items inserted (second "i"
loop based on "nitems").

Except for this issue, statext_mcv_import() and import_mcv(), which
are the heart of the logic for MCV values, seem pretty clean to me.

I have spent some time looking at the patch and fixed a couple of
issues, adjusting a few things. It would be a good idea to add more
tests for expressions in MCV definitions. I have added a new object in
the regression tests, we should also have some input validations and
checks like the two other kinds.

Could you look at the array bound issue please? Let's use the
attached as a base of work for now, this is what's standing now at the
top of my dev branch for the review of the MCV patch.
--
Michael

Attachment Content-Type Size
v33-0001-Add-support-for-mcv-in-pg_restore_extended_stats.patch text/x-diff 45.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2026-01-28 04:49:58 Re: Extended Statistics set/restore/clear functions.
Previous Message Amit Kapila 2026-01-28 04:33:51 Re: pgsql: Prevent invalidation of newly synced replication slots.