Re: extended stats on partitioned tables

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, David Rowley <dgrowleyml(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: extended stats on partitioned tables
Date: 2021-09-25 21:46:10
Message-ID: 20210925214610.GW831@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Sep 25, 2021 at 11:01:21PM +0200, Tomas Vondra wrote:
> > Do you think it's possible to backpatch a fix to handle partitioned tables
> > specifically ?
> >
> > The "tuple already updated" error which I reported and which was fixed by
> > 859b3003 involved inheritence children. Since partitioned tables have no data
> > themselves, the !inh check could be relaxed. It's not totally clear to me if
> > the correct statistics would be used in that case. I suppose the wrong
> > (inherited) stats would be wrongly applied affect queries FROM ONLY a
> > partitioned table, which seems pointless to write and also hard for the
> > estimates to be far off :)
>
> Hmmm, maybe. To prevent the "tuple concurrently updated" we must ensure we
> never build stats with and without inheritance at the same time (for the
> same rel). The 859b3003de ensures that by only building extended stats in
> the (!inh) case, but we might tweak that based on relkind. See the attached
> patch. But I wonder if there are cases that might be hurt by this - that'd
> be a regression too, of course.

I think we should leave the inheritance case alone, since it hasn't changed in
2 years, and building stats on the table ONLY is a legitimate interpretation,
and it's as good as we can do without the catalog change.

But the partitioned case used to work, and there's no utility in selecting FROM
ONLY a partitioned table, so we might as well build the stats including its
partitions.

I don't think anything would get worse for the partitioned case.
Obviously building inherited ext stats could change plans - that's the point.
It's weird that the stats objects which existed for 18 months before being
"built" after the patch was applied, but no so weird that the release notes
wouldn't be ample documentation.

If building statistics caused the plan to change undesirably, the solution
would be to drop the stats object, of course.

+ build_ext_stats = (onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) ? inh : (!inh);

It's weird to build inherited extended stats for partitioned tables but not for
inheritence parents. We could be clever and say "build inherited ext stats for
inheritence parents only if we didn't insert any stats for the table itself
(because it's empty)". But I think that's fragile: a single tuple in the
parent table could cause stats to be built there instead of on its heirarchy,
and the extended stats would be used for *both* FROM and FROM ONLY, which is an
awful combination.

Since do_analyze_rel is only called once for partitioned tables, I think you
could write that as:

/* Do not build inherited stats (since the catalog cannot support it) except
* for partitioned tables, for which numrows==0 and have no non-inherited stats */
build_ext_stats = !inh || onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE;

--
Justin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2021-09-25 22:31:52 Re: extended stats on partitioned tables
Previous Message Tomas Vondra 2021-09-25 21:01:21 Re: extended stats on partitioned tables