Re: Add expressions to pg_restore_extended_stats()

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Cc: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Tender Wang <tndrwang(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(at)vondra(dot)me>
Subject: Re: Add expressions to pg_restore_extended_stats()
Date: 2026-02-03 02:41:15
Message-ID: aYFgS4xx9lDeGVwu@paquier.xyz
Lists: pgsql-hackers

On Mon, Feb 02, 2026 at 09:17:03PM -0500, Corey Huinker wrote:
> 1. I don't think we can guarantee that the expression node text is stable
> across major versions, and that would break upgrades, the primary function
> of this code.
> 2. Anyone wanting to modify/hack the exprs values has almost certainly
> extracted it using the jsonb_build_object() code in pg_dump, so they
> already have all expressions before editing.
> 3. Array unnest() has proven to give a stable order in all tests so far.
> 4. We don't decompose mcv into its parts, so why do that for exprs?

Not recording which expression each row refers to sounds like a
design mistake to me, particularly because this is JSON, and JSON
does not, by design, give us any ordering requirements. If we don't
include the expression text, I'm OK to give up on that idea. But let's
at least include a negative attribute number in an "attribute"
field. We could cross-check it against the number of expressions
defined in the statext object.

On second thought, as we already use negative attribute numbers for
ndistinct and dependencies, perhaps it's not a bad choice to use a
negative number anyway. As the attribute number assigned depends on
the order of the elements in pg_stats_ext.exprs, I'd suggest tweaking
the pg_dump query to rely on that rather than on ORDINALITY and the
order in which the rows of pg_stats_ext_exprs are scanned. The order
of the elements in the definition of the stats object is predictable,
while a sequential scan of a catalog offers no real guarantees.
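
To make the cross-check concrete, here is a rough sketch in Python
rather than backend C. It assumes each element of the exprs JSON array
carries an integer "attribute" field numbered -1, -2, ... in
definition order. The function name and the payload shape are made up
for illustration, and requiring an entry for every expression is an
assumption of the sketch, not something settled in this thread.

```python
import json


def order_expr_stats(exprs_json, n_exprs):
    """Hypothetical helper: return the per-expression entries sorted by
    definition order, after cross-checking their "attribute" numbers
    against the number of expressions in the statistics object."""
    entries = json.loads(exprs_json)
    by_attnum = {}
    for entry in entries:
        attnum = entry.get("attribute")
        # Expressions are expected to be numbered -1 .. -n_exprs.
        if not isinstance(attnum, int) or not (-n_exprs <= attnum <= -1):
            raise ValueError(f"invalid expression attribute: {attnum!r}")
        if attnum in by_attnum:
            raise ValueError(f"duplicate expression attribute: {attnum}")
        by_attnum[attnum] = entry
    # Assumption of this sketch: every expression must have an entry.
    if len(by_attnum) != n_exprs:
        raise ValueError("number of entries does not match expressions")
    # -1 is the first expression in the definition, -2 the second, etc.
    return [by_attnum[-(i + 1)] for i in range(n_exprs)]
```

With something like this, the order of the elements in the JSON array
stops mattering: the input can list attribute -2 before attribute -1
and the entries still map to the right expressions.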
--
Michael
