Re: extended stats on partitioned tables

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Zhihong Yu <zyu(at)yugabyte(dot)com>, Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: extended stats on partitioned tables
Date: 2021-12-12 21:32:10
Message-ID: 20211212213210.GP17618@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Dec 12, 2021 at 05:17:10AM +0100, Tomas Vondra wrote:
> The one thing bugging me a bit is that the regression test checks only a
> GROUP BY query. It'd be nice to add queries testing MCV/dependencies
> too, but that seems tricky because most queries will use per-partitions
> stats.

You mean because the quals are pushed down to the scan node.

Does that indicate a deficiency ?

If extended stats are collected for a parent table, selectivity estimates based
from the parent would be better; but instead we use uncorrected column
estimates from the child tables.

From what I see, we could come up with a way to avoid the pushdown, involving
volatile functions/foreign tables/RLS/window functions/SRF/wholerow vars/etc.

But would it be better if extended stats objects on partitioned tables were to
collect stats for both parent AND CHILD ? I'm not sure. Maybe that's the
wrong solution, but maybe we should still document that extended stats on
(empty) parent tables are often themselves not used/useful for selectivity
estimates, and the user should instead (or in addition) create stats on child
tables.

Or, maybe if there's no extended stats on the child tables, stats on the parent
table should be consulted ?

--
Justin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2021-12-12 21:49:15 Re: extended stats on partitioned tables
Previous Message Thomas Munro 2021-12-12 21:31:29 Re: Windows now has fdatasync()