Re: Extended Statistics set/restore/clear functions.

From:	Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To:	Michael Paquier <michael(at)paquier(dot)xyz>
Cc:	jian he <jian(dot)universality(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, pgsql-hackers(at)lists(dot)postgresql(dot)org, tgl(at)sss(dot)pgh(dot)pa(dot)us
Subject:	Re: Extended Statistics set/restore/clear functions.
Date:	2025-11-18 04:39:50
Message-ID:	CADkLM=cT2rqtw12JX6+hD4gL=wSKR=jt040QGGsurZiqpZ6ZLQ@mail.gmail.com
Lists:	pgsql-hackers

>
> > But, if we don't care about the order of the combinations, I also don't
> > think we need to expose the functions at all. We know exactly how many
> > combinations there should be for any N attributes as each attribute must
> > be unique. So if we have the right number of unique combinations, and
> > they're all subsets of the first-longest, then we must have a complete
> > set. Thoughts on that?
> >
> > Getting _too_ tight with the ordering and contents makes me concerned for
> > the day when the format might change. We don't want to _fail_ an upgrade
> > because some of the combinations were in the wrong order.
>
> That's fair. The planner costing code pulling the stats numbers based
> on the attributes was smart enough to not care much about the ordering
> as far as I recall, but I'd rather make sure of that first. This
> needs some careful lookup.
>

I've done some experiments, creating extended stats objects up to the 8
attribute limit.

The big takeaway is that I wasn't imagining it: the number of dependency
combinations is NOT deterministic:

        /*
         * if the dependency seems entirely invalid, don't store it
         */
        if (degree == 0.0)
            continue;

So, in theory, an empty (i.e. '[]') pg_dependencies is valid.

The number of pg_ndistinct items is deterministic for now, but I'm even
less sure that'll stay true in the future.
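
For illustration only, here is a minimal sketch of what "deterministic" means for the item count, assuming pg_ndistinct holds exactly one item for every combination of two or more of the N attributes. The function name is hypothetical and does not exist in the PostgreSQL tree.

```c
/*
 * Hypothetical helper (not PostgreSQL code): number of ndistinct items
 * expected for nattrs attributes, ASSUMING one item per combination of
 * two or more attributes. That is all 2^N subsets, minus the N
 * single-attribute subsets, minus the empty subset.
 */
static unsigned int
expected_ndistinct_items(unsigned int nattrs)
{
    if (nattrs < 2)
        return 0;               /* no multi-attribute combinations exist */

    return (1u << nattrs) - nattrs - 1u;
}
```

Under that assumption, a stats object at the 8 attribute limit would carry 247 items. No comparable formula applies to pg_dependencies, because items with a degree of 0.0 are skipped.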

We can definitely rely on the attnums being all the positive numbers in
ascending order first, followed by the negative numbers in descending
order, but that's about it. Which raises the question of how we describe
the error when attnums are out of order.
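
A rough sketch of the ordering check described above, as a standalone function. The name and signature are hypothetical, it is not taken from any patch in this thread, and it assumes attnums within one item are unique and that 0 is never a valid attnum.

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not PostgreSQL code): returns true if attnums[]
 * lists all positive numbers first in ascending order, followed by all
 * negative numbers in descending order (-1, -2, ...). Duplicates and
 * zero are rejected.
 */
static bool
attnums_in_expected_order(const int *attnums, int n)
{
    for (int i = 0; i < n; i++)
    {
        int cur = attnums[i];

        if (cur == 0)
            return false;       /* assumed invalid */

        if (i == 0)
            continue;           /* nothing to compare against yet */

        int prev = attnums[i - 1];

        if (prev > 0 && cur > 0 && cur <= prev)
            return false;       /* positives must strictly ascend */
        if (prev < 0 && cur > 0)
            return false;       /* no positive after a negative */
        if (prev < 0 && cur < 0 && cur >= prev)
            return false;       /* negatives must strictly descend */
    }

    return true;
}
```

For example, {1, 3, -1, -2} passes, while {3, 1}, {-1, 2} and {-2, -1} do not. The sketch only reports pass or fail; how to word the error for each failing case is the open question.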

We know that the deserialize functions take the data's word for it as to
how many items to unpack, so I don't see the impact of not caring how many
might be missing. That even sort of feeds into Tom's idea that stats import
was in some sense a fuzzing tool.

> I'd try to look at the bits related to pg_dependencies and
> pg_ndistinct as two separate concepts, at the end. They're sort of
> alike, but have too many differences already.
>

Based on the above, I think we can't really add anything beyond the attnum
order, and we have to relax some existing restrictions on pg_dependencies...

In response to

Re: Extended Statistics set/restore/clear functions. at 2025-11-18 03:34:29 from Michael Paquier
