Re: extended stats on partitioned tables

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, David Rowley <dgrowleyml(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: extended stats on partitioned tables
Date: 2021-09-26 11:33:09
Message-ID: 56dceeb0-7d30-fc4d-6a37-8a0f1a76228c@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 9/25/21 11:46 PM, Justin Pryzby wrote:
> On Sat, Sep 25, 2021 at 11:01:21PM +0200, Tomas Vondra wrote:
>>> Do you think it's possible to backpatch a fix to handle partitioned tables
>>> specifically ?
>>>
>>> The "tuple already updated" error which I reported and which was fixed by
>>> 859b3003 involved inheritence children. Since partitioned tables have no data
>>> themselves, the !inh check could be relaxed. It's not totally clear to me if
>>> the correct statistics would be used in that case. I suppose the wrong
>>> (inherited) stats would be wrongly applied affect queries FROM ONLY a
>>> partitioned table, which seems pointless to write and also hard for the
>>> estimates to be far off :)
>>
>> Hmmm, maybe. To prevent the "tuple concurrently updated" we must ensure we
>> never build stats with and without inheritance at the same time (for the
>> same rel). The 859b3003de ensures that by only building extended stats in
>> the (!inh) case, but we might tweak that based on relkind. See the attached
>> patch. But I wonder if there are cases that might be hurt by this - that'd
>> be a regression too, of course.
>
> I think we should leave the inheritance case alone, since it hasn't changed in
> 2 years, and building stats on the table ONLY is a legitimate interpretation,
> and it's as good as we can do without the catalog change.
>
> But the partitioned case used to work, and there's no utility in selecting FROM
> ONLY a partitioned table, so we might as well build the stats including its
> partitions.
>
> I don't think anything would get worse for the partitioned case.
> Obviously building inherited ext stats could change plans - that's the point.
> It's weird that the stats objects which existed for 18 months before being
> "built" after the patch was applied, but no so weird that the release notes
> wouldn't be ample documentation.
>

Agreed.

> If building statistics caused the plan to change undesirably, the solution
> would be to drop the stats object, of course.
>
> + build_ext_stats = (onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) ? inh : (!inh);
>
> It's weird to build inherited extended stats for partitioned tables but not for
> inheritence parents. We could be clever and say "build inherited ext stats for
> inheritence parents only if we didn't insert any stats for the table itself
> (because it's empty)". But I think that's fragile: a single tuple in the
> parent table could cause stats to be built there instead of on its heirarchy,
> and the extended stats would be used for *both* FROM and FROM ONLY, which is an
> awful combination.
>

I don't think there's a good way to check if there are any rows in the
parent relation. And even then, a single row might cause huge changes to
query plans (essentially switching to very different stats).

> Since do_analyze_rel is only called once for partitioned tables, I think you
> could write that as:
>
> /* Do not build inherited stats (since the catalog cannot support it) except
> * for partitioned tables, for which numrows==0 and have no non-inherited stats */
> build_ext_stats = !inh || onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE;
>

Good point.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Artur Zakirov 2021-09-26 13:54:57 Re: Empty string in lexeme for tsvector
Previous Message Daniel Gustafsson 2021-09-26 10:18:01 Re: can we add some file(msvc) to gitignore