Re: Add starelid, attnum to pg_stats and leverage this in pg_dump

From: Sami Imseih <samimseih(at)gmail(dot)com>
To: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Cc: nathandbossart(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add starelid, attnum to pg_stats and leverage this in pg_dump
Date: 2026-02-26 20:16:36
Message-ID: CAA5RZ0vY5jHXQEOyUdjW7tPrXb9TY_bdr8ZpCRuALj1zU5DD_w@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

> Per side conversation in [1], this patch exposes pg_statistic.starelid in the pg_stats view (0001)
> and then changes pg_dump to use the starelid in the queries on pg_stats rather than the
> combination of schemaname+relname, which gets a bit clumsy in bulk array fetching, and
> also leads to bad query plans against pg_stats because the security barrier is also an
> optimization barrier, which means that queries often try to hash join, which causes the
> very large table pg_statistic to be sequentially scanned, and that's a bad time. Currently
> we get around this by adding a redundant qual to the query which gooses the plan towards
> an index, and that works fine for now, but there's no guarantee that it will continue to work
> in the future, so this change brings some plan safety as well.

This alone seems like a good enough reason to include startled in
pg_stat, besides other
future use-cases this will simplify as discussed in [1]

> 0001 also exposes pg_statistic.attnum. This is no direct application of this in 0002,
> but people have often expressed surprise that pg_dump orders the
> pg_restore_attribute_stats() calls by attname rather than attnum,
> and if we wanted to change that now we could (albeit only on new versions).

This seems logical

v1-0001 comments:

1/

- nspname AS schemaname,
- relname AS tablename,
- attname AS attname,
+ n.nspname AS schemaname,
+ c.relname AS tablename,
+ s.starelid AS starelid,
+ a.attnum AS attnum,
+ a.attname AS attname,

Ideally, we would want the OID and attnum to be first columns to match
the pg_stat_activity/pg_stat_progress style, but that will be too invasive.

The new attname column should be moved after tablename, however.

2/

+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>attnum</structfield> <type>name</type>
+ (references <link
linkend="catalog-pg-attribute"><structname>pg_attribute</structname></link>.<structfield>attnum</structfield>)
+ </para>
+ <para>
+ Position of column described by this row
+ </para></entry>
+ </row>

- This <type> should be int2
- Maybe for the describtion, it should be something like:

"The number of the column, as in
<structname>pg_attribute</structname>.<structfield>attnum</structfield>"

to be close to the description in pg_attribute

3/

+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>starelid</structfield> <type>oid</type>
+ </para>
+ <para>
+ ID of the relation
+ </para></entry>
+ </row>

This should indicate it is a reference to pg_class.oid, like so:

```
(references <link
linkend="catalog-pg-class"><structname>pg_class</structname></link>.<structfield>oid</structfield>)
```

Maybe "ID of the table or index" is better, since this can only be a
table or index
for pg_stats.

I dislike the existing "pg_stats.tablename", since this can also be an
expression index.
"pg_stats.relation" with a description of "Name of table or index" is
more appropriate.
It is a change that we can possibly make in a major version. Looked
through the archives,
and did not see this being reported/discussed.

v1-0002:

I examined the version variants of getAttributeStats and all looks good,
and also ran a test on a 18 and 19 server version with the patched pg_dump
client and all looks good.

One minor comment is:

+ pg_fatal("statistics table oid information missing");

I noticed in other pg_fatal messages, we include OIDs

pg_fatal("could not find function definition for function
with OID %u",
cast->castfunc);

Should we do the same here?

[1] https://www.postgresql.org/message-id/aZ3RbTE8Racscykc@nathan

--
Sami Imseih
Amazon Web Services (AWS)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Matheus Alcantara 2026-02-26 20:29:34 Re: Show expression of virtual columns in error messages
Previous Message Mark Wong 2026-02-26 19:43:30 Re: updates for handling optional argument in system functions