Re: pg_restore handles extended statistics inconsistently with statistics data

From: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: PostgreSQL-development <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Subject: Re: pg_restore handles extended statistics inconsistently with statistics data
Date: 2026-06-15 09:11:11
Message-ID: 37E4590F-EB79-4F00-A4B2-18019B03A7EF@gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Jun 15, 2026, at 15:55, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Mon, Jun 15, 2026 at 11:51:40AM +0800, Chao Li wrote:
>> If the fix direction is to make pg_restore behave consistently with
>> pg_dump, then I think Michael's change is correct.
>
> I have been checking for a few hours, and well.. There was more.
>
>> See the attached v2 for details.
>
> That's nice, unfortunately incomplete. I have found a different
> pattern that behaves incorrectly, reusing your script of upthread:
> pg_restore --statistics --table t \
> -f /tmp/archive-s1-stats.sql /tmp/stats.dump
> pg_restore --statistics-only --table t \
> -f /tmp/archive-s1-stats.sql /tmp/stats.dump
>
> With "--statistics --table t", we restore the definition of table "t"
> and its data (correct), miss the stats (incorrect!). With
> "--statistics-only --table t", we restore no definition and no data
> (correct), and still miss the stats (incorrect!). So we still have a
> borked restore of the stats when selecting an individual table. We
> don't care about extended stats with this case, as extstats are
> objects defined at schema level and we cannot filter them, but I think
> that we definitely care about attribute and relation stats here.
>
> This case requires a second change in _tocEntryRequired() even after
> the first fix so as it is possible to select a STATISTICS DATA that
> depends on a table part of the authorized list close to the list that
> includes the "SEQUENCE SET".
>
> With all that in mind, I have the attached. I have expanded the tests
> with the --table case, using dump and restore commands, and that seems
> to work correctly now with the restores, including the cases with
> extstats.
>
> And I'm planning to apply that down to v18 tomorrow, after a second
> round of lookups.
> --
> Michael
> <v3-0001-Fix-pg_restore-with-schema-table-and-statistics-o.patch>

Ah, I missed the --table case. Your finding made me think about the --index case as well. It looks like --index still misses stats with v3 applied. I created two additional test scripts for the table and index cases, see the attached .sh files.

With v3, the table case works:
```
% ./test_pgdump_stats_table.sh
CREATE SCHEMA
SELECT 1000
CREATE INDEX
ANALYZE
archive TOC:
242; 1259 16609 TABLE s1 t chaol
3930; 0 16609 TABLE DATA s1 t chaol
3937; 0 0 STATISTICS DATA s1 t
3938; 0 0 STATISTICS DATA s1 idx_expr
pg_restore --statistics --table t:
CREATE TABLE s1.t (
SELECT * FROM pg_catalog.pg_restore_relation_stats(
SELECT * FROM pg_catalog.pg_restore_attribute_stats(
SELECT * FROM pg_catalog.pg_restore_attribute_stats(
pg_restore --statistics-only --table t:
SELECT * FROM pg_catalog.pg_restore_relation_stats(
SELECT * FROM pg_catalog.pg_restore_attribute_stats(
SELECT * FROM pg_catalog.pg_restore_attribute_stats(
```

But for the index case, stats are still missing:
```
% ./test_pgdump_stats_index.sh
CREATE SCHEMA
SELECT 1000
CREATE INDEX
ANALYZE
archive TOC:
3937; 0 0 STATISTICS DATA s1 t
3764; 1259 16618 INDEX s1 idx_expr chaol
3938; 0 0 STATISTICS DATA s1 idx_expr
pg_restore --statistics --index idx_expr:
CREATE INDEX idx_expr ON s1.t USING btree (((a + 1)));
pg_restore --statistics-only --index idx_expr:
```

We cannot just add a strcmp(te->desc, "STATISTICS DATA") == 0 check to the "else if (strcmp(te->desc, "INDEX") == 0)" branch, because STATISTICS DATA would already have matched the earlier table branch. So in v4, I pulled STATISTICS DATA into its own branch before the table and index branches.

With v4, stats for the index are restored:
```
% ./test_pgdump_stats_index.sh
CREATE SCHEMA
SELECT 1000
CREATE INDEX
ANALYZE
archive TOC:
3937; 0 0 STATISTICS DATA s1 t
3764; 1259 16624 INDEX s1 idx_expr chaol
3938; 0 0 STATISTICS DATA s1 idx_expr
pg_restore --statistics --index idx_expr:
CREATE INDEX idx_expr ON s1.t USING btree (((a + 1)));
SELECT * FROM pg_catalog.pg_restore_relation_stats(
SELECT * FROM pg_catalog.pg_restore_attribute_stats(
pg_restore --statistics-only --index idx_expr:
SELECT * FROM pg_catalog.pg_restore_relation_stats(
SELECT * FROM pg_catalog.pg_restore_attribute_stats(
```

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Attachment Content-Type Size
test_pgdump_stats_index.sh application/octet-stream 997 bytes
test_pgdump_stats_table.sh application/octet-stream 969 bytes
v4-0001-Fix-pg_restore-with-schema-table-and-statistics-o.patch application/octet-stream 5.1 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2026-06-15 09:16:24 Re: Report bytes and transactions actually sent downtream
Previous Message Richard Guo 2026-06-15 09:08:36 Re: Reduce LEFT/FULL JOIN to ANTI JOIN in more cases