extended stats on partitioned tables

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: extended stats on partitioned tables
Date: 2021-09-23 21:26:24
Message-ID: 20210923212624.GI831@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

extended stats objects are allowed on partitioned tables since v10.
https://www.postgresql.org/message-id/flat/CAKJS1f-BmGo410bh5RSPZUvOO0LhmHL2NYmdrC_Jm8pk_FfyCA%40mail.gmail.com
8c5cdb7f4f6e1d6a6104cb58ce4f23453891651b

But since 859b3003de they're not populated - pg_statistic_ext(_data) is empty.
This was the consequence of a commit to avoid an error I reported with stats on
inheritence parents (not partitioned tables).

preceding 859b3003de, stats on the parent table *did* improve the estimate,
so this part of the commit message seems to have been wrong?
|commit 859b3003de87645b62ee07ef245d6c1f1cd0cedb
| Don't build extended statistics on inheritance trees
...
| Moreover, the current selectivity estimation code only works with individual
| relations, so building statistics on inheritance trees would be pointless
| anyway.

|CREATE TABLE p (i int, a int, b int) PARTITION BY RANGE (i);
|CREATE TABLE pd PARTITION OF p FOR VALUES FROM (1)TO(100);
|TRUNCATE p; INSERT INTO p SELECT 1, a/100, a/100 FROM generate_series(1,999)a;
|CREATE STATISTICS pp ON (a),(b) FROM p;
|VACUUM ANALYZE p;
|SELECT * FROM pg_statistic_ext WHERE stxrelid ='p'::regclass;

|postgres=# begin; DROP STATISTICS pp; explain analyze SELECT a,b FROM p GROUP BY 1,2; abort;
| HashAggregate (cost=20.98..21.98 rows=100 width=8) (actual time=1.088..1.093 rows=10 loops=1)

|postgres=# explain analyze SELECT a,b FROM p GROUP BY 1,2;
| HashAggregate (cost=20.98..21.09 rows=10 width=8) (actual time=1.082..1.086 rows=10 loops=1)

So I think this is a regression, and extended stats should be populated for
partitioned tables - I had actually done that for some parent tables and hadn't
noticed that the stats objects no longer do anything.

That begs the question if the current behavior for inheritence parents is
correct..

CREATE TABLE p (i int, a int, b int);
CREATE TABLE pd () INHERITS (p);
INSERT INTO pd SELECT 1, a/100, a/100 FROM generate_series(1,999)a;
CREATE STATISTICS pp ON (a),(b) FROM p;
VACUUM ANALYZE p;
explain analyze SELECT a,b FROM p GROUP BY 1,2;

| HashAggregate (cost=25.99..26.99 rows=100 width=8) (actual time=3.268..3.284 rows=10 loops=1)

Since child tables can be queried directly, it's a legitimate question whether
we should collect stats for the table heirarchy or (since the catalog only
supports one) only the table itself. I'd think that stats for the table
hierarchy would be more commonly useful (but we shouldn't change the behavior
in existing releases again). Anyway it seems unfortunate that
statistic_ext_data still has no stxinherited.

Note that for partitioned tables if I enable enable_partitionwise_aggregate,
then stats objects on the child tables can be helpful (but that's also
confusing to the question at hand).

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2021-09-23 21:26:46 Re: OpenSSL 3.0.0 compatibility
Previous Message Peter Eisentraut 2021-09-23 21:20:19 Re: Proposal: Save user's original authenticated identity for logging