Re: extended stats on partitioned tables

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-hackers(at)postgresql(dot)org
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: extended stats on partitioned tables
Date: 2021-09-25 19:27:10
Message-ID: e2196aa4-4a52-6168-6f13-f50b4d0edf48@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 9/23/21 11:26 PM, Justin Pryzby wrote:
> extended stats objects are allowed on partitioned tables since v10.
> https://www.postgresql.org/message-id/flat/CAKJS1f-BmGo410bh5RSPZUvOO0LhmHL2NYmdrC_Jm8pk_FfyCA%40mail.gmail.com
> 8c5cdb7f4f6e1d6a6104cb58ce4f23453891651b
>
> But since 859b3003de they're not populated - pg_statistic_ext(_data) is empty.
> This was the consequence of a commit to avoid an error I reported with stats on
> inheritence parents (not partitioned tables).
>
> preceding 859b3003de, stats on the parent table *did* improve the estimate,
> so this part of the commit message seems to have been wrong?
> |commit 859b3003de87645b62ee07ef245d6c1f1cd0cedb
> | Don't build extended statistics on inheritance trees
> ...
> | Moreover, the current selectivity estimation code only works with individual
> | relations, so building statistics on inheritance trees would be pointless
> | anyway.
>
> |CREATE TABLE p (i int, a int, b int) PARTITION BY RANGE (i);
> |CREATE TABLE pd PARTITION OF p FOR VALUES FROM (1)TO(100);
> |TRUNCATE p; INSERT INTO p SELECT 1, a/100, a/100 FROM generate_series(1,999)a;
> |CREATE STATISTICS pp ON (a),(b) FROM p;
> |VACUUM ANALYZE p;
> |SELECT * FROM pg_statistic_ext WHERE stxrelid ='p'::regclass;
>
> |postgres=# begin; DROP STATISTICS pp; explain analyze SELECT a,b FROM p GROUP BY 1,2; abort;
> | HashAggregate (cost=20.98..21.98 rows=100 width=8) (actual time=1.088..1.093 rows=10 loops=1)
>
> |postgres=# explain analyze SELECT a,b FROM p GROUP BY 1,2;
> | HashAggregate (cost=20.98..21.09 rows=10 width=8) (actual time=1.082..1.086 rows=10 loops=1)
>
> So I think this is a regression, and extended stats should be populated for
> partitioned tables - I had actually done that for some parent tables and hadn't
> noticed that the stats objects no longer do anything.
>
> That begs the question if the current behavior for inheritence parents is
> correct..
>
> CREATE TABLE p (i int, a int, b int);
> CREATE TABLE pd () INHERITS (p);
> INSERT INTO pd SELECT 1, a/100, a/100 FROM generate_series(1,999)a;
> CREATE STATISTICS pp ON (a),(b) FROM p;
> VACUUM ANALYZE p;
> explain analyze SELECT a,b FROM p GROUP BY 1,2;
>
> | HashAggregate (cost=25.99..26.99 rows=100 width=8) (actual time=3.268..3.284 rows=10 loops=1)
>

Agreed, that seems like a regression, but I don't see how to fix that
without having the extra flag in the catalog. Otherwise we can store
just one version for each statistics object :-(

> Since child tables can be queried directly, it's a legitimate question whether
> we should collect stats for the table heirarchy or (since the catalog only
> supports one) only the table itself. I'd think that stats for the table
> hierarchy would be more commonly useful (but we shouldn't change the behavior
> in existing releases again). Anyway it seems unfortunate that
> statistic_ext_data still has no stxinherited.
>

Yeah, we probably need the flag - I planned to get it into 14, but then
I got distracted by something else :-/

Attached is a PoC that I quickly bashed together today. It's pretty raw,
but it passed "make check" and I think it does most of the things right.
Can you try if this fixes the estimates with partitioned tables?

Extended statistics use two catalogs, pg_statistic_ext for definition,
while pg_statistic_ext_data stores the built statistics objects - the
flag needs to be in the "data" catalog, and managing the records is a
bit challenging - the current PoC code mostly works, but I had to relax
some error checks and I'm sure there are cases when we fail to remove a
row, or something like that.

> Note that for partitioned tables if I enable enable_partitionwise_aggregate,
> then stats objects on the child tables can be helpful (but that's also
> confusing to the question at hand).
>

Yeah. I think it'd be helpful to assemble a script with various test
cases demonstrating how we estimate various cases.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
statistics-inheritance-fix-v1.patch text/x-patch 27.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hannu Krosing 2021-09-25 19:31:55 Re: logical replication restrictions
Previous Message Daniel Gustafsson 2021-09-25 18:34:44 Re: OpenSSL 3.0.0 compatibility