| From: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
|---|---|
| To: | Michael Paquier <michael(at)paquier(dot)xyz> |
| Cc: | Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com> |
| Subject: | Re: pg_restore handles extended statistics inconsistently with statistics data |
| Date: | 2026-06-16 06:36:21 |
| Message-ID: | 62F7A896-B40E-4AD4-A08D-497FA8E01C67@gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
> On Jun 16, 2026, at 11:23, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Tue, Jun 16, 2026 at 11:07:22AM +0900, Michael Paquier wrote:
>> Sharing the check for a STATISTICS DATA TOC entry on table and index
>> names was making me ticking a bit, as this is not entirely
>> collision-proof for the names, but it also looks like we do things the
>> same way with TABLE DATA and INDEX, so.. At the end, applied down to
>> v18 as suggested.
>
> And I am having second thoughts on this one. Take for example this
> case:
> CREATE SCHEMA s1;
> CREATE SCHEMA s2;
> CREATE TABLE s1.foo (id int);
> INSERT INTO s1.foo SELECT generate_series(1,100);
> ANALYZE s1.foo;
> CREATE TABLE s2.bar (id int);
> CREATE INDEX foo ON s2.bar(id);
> INSERT INTO s2.bar SELECT generate_series(1,100);
> ANALYZE s2.bar;
>
> And then this:
> pg_dump --statistics -Fc -f stats.dump mydb
> pg_restore --statistics-only --index=foo -f stats_foo.sql stats.dump
>
> On HEAD, we get relation and attribute we should not in stats_foo.sql,
> getting also some data from the table s1.foo. With the patch
> attached, that strengthens the name check based on the type of the
> depending TOC entries, we only get the relation stats of s2.foo,
> nothing about the table s1.foo. This feels too funky to write a test
> for, wasting cycles compared to the existing coverage.
>
> pg_restore --index is as old as e8f69be054e9, so it's not like we
> could just remove it, but I'd say that with the schema-level restore
> this would be tempting.
>
> Anyway, let's improve this situation with the attached, for HEAD and
> v18.
> --
> Michael
> <0001-Use-dependency-based-matching-for-STATISTICS-DATA-in.patch>
I think the dependency-based matching is better, but what happens if the archive is created with --statistics-only? In that case, the statistics entries still have dependencies, but the referenced parent TOC entries are not present in the archive. See the attached test script. It creates the archive with --statistics-only, with the new patch, neither table nor index stats are restored:
```
% ./test_pgdump_stats_new.sh
CREATE SCHEMA
SELECT 1000
CREATE INDEX
ANALYZE
archive TOC:
3936; 0 0 STATISTICS DATA s1 t
3937; 0 0 STATISTICS DATA s1 idx_expr
pg_restore --statistics-only --table t:
pg_restore --statistics-only --index idx_expr:
```
With HEAD, both table and index stats are restored:
```
% ./test_pgdump_stats_new.sh
CREATE SCHEMA
SELECT 1000
CREATE INDEX
ANALYZE
archive TOC:
3936; 0 0 STATISTICS DATA s1 t
3937; 0 0 STATISTICS DATA s1 idx_expr
pg_restore --statistics-only --table t:
SELECT * FROM pg_catalog.pg_restore_relation_stats(
SELECT * FROM pg_catalog.pg_restore_attribute_stats(
SELECT * FROM pg_catalog.pg_restore_attribute_stats(
pg_restore --statistics-only --index idx_expr:
SELECT * FROM pg_catalog.pg_restore_relation_stats(
SELECT * FROM pg_catalog.pg_restore_attribute_stats(
```
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
| Attachment | Content-Type | Size |
|---|---|---|
| test_pgdump_stats_new.sh | application/octet-stream | 908 bytes |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Shlok Kyal | 2026-06-16 06:39:47 | Re: synchronized_standby_slots behavior inconsistent with quorum-based synchronous replication |
| Previous Message | Peter Smith | 2026-06-16 06:36:10 | Re: Support EXCEPT for TABLES IN SCHEMA publications |