Re: PoC/WIP: Extended statistics on expressions

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PoC/WIP: Extended statistics on expressions
Date: 2020-12-11 20:17:40
Message-ID: 958870c8-65e0-31b1-4591-b0b10e807dd9@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 12/11/20 1:58 PM, Dean Rasheed wrote:
> On Tue, 8 Dec 2020 at 12:44, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>>
>> Possibly. But I don't think it's worth the extra complexity. I don't
>> expect people to have a lot of overlapping stats, so the amount of
>> wasted space and CPU time is expected to be fairly limited.
>>
>> So I don't think it's worth spending too much time on this now. Let's
>> just do what you proposed, and revisit this later if needed.
>>
>
> Yes, I think that's a reasonable approach to take. As long as the
> documentation makes it clear that building MCV stats also causes
> standard expression stats to be built on any expressions included in
> the list, then the user will know and can avoid duplication most of
> the time. I don't think there's any need for code to try to prevent
> that -- just as we don't bother with code to prevent a user building
> multiple indexes on the same column.
>
> The only case where duplication won't be avoidable is where there are
> multiple MCV stats sharing the same expression, but that's probably
> quite unlikely in practice, and it seems acceptable to leave improving
> that as a possible future optimisation.
>

OK. Attached is an updated version, reworking it this way.

I tried tweaking the grammar to differentiate these two syntax variants,
but that led to shift/reduce conflicts with the existing ones. I tried
fixing that, but I ended up doing that in CreateStatistics().

The other thing is that we probably can't tie this to just MCV, because
functional dependencies need the per-expression stats too. So I simply
build expression stats whenever there's at least one expression.

I also decided to keep the "expressions" statistics kind - it's not
allowed to specify it in CREATE STATISTICS, but it's useful internally
as it allows deciding whether to build the stats in a single place.
Otherwise we'd need to do that every time we build the statistics, etc.

I added a brief explanation to the sgml docs, not sure if that's good
enough - maybe it needs more details.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
0001-bootstrap-convert-Typ-to-a-List-20201211.patch text/x-patch 3.7 KB
0002-Allow-composite-types-in-bootstrap-20201211.patch text/x-patch 1.4 KB
0003-Extended-statistics-on-expressions-20201211.patch text/x-patch 234.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2020-12-11 20:27:03 Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
Previous Message Fabien COELHO 2020-12-11 19:45:12 Re: PG vs LLVM 12 on seawasp, next round