Re: PoC/WIP: Extended statistics on expressions

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PoC/WIP: Extended statistics on expressions
Date: 2021-03-25 11:35:21
Message-ID: CAEZATCWQv1E1X7Tygn8idcDQuLU-t9wHfSu3pARtOn5cZUK8cw@mail.gmail.com
Lists: pgsql-hackers

On Thu, 25 Mar 2021 at 00:05, Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> Actually, I don't think we need that block at all - there's no point in
> keeping the exact expression, because if there were statistics matching
> it, it'd be matched by examine_variable. So if we get here, we have
> to just split it into the vars anyway. So the second block is entirely
> useless.

Good point.

> That however means we don't need the processing with GroupExprInfo and
> GroupVarInfo lists, i.e. we can revert back to the original simpler
> processing, with a bit of extra logic to match expressions, that's all.
>
> The patch 0003 does this (it's a bit crude, but hopefully enough to
> demonstrate).

Cool. I did wonder about that, but I didn't fully think it through.
I'll take a look.
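
For anyone following along, here is how I read the simplified logic, as a
toy sketch in Python rather than anything resembling the actual patch.
Expressions are modelled as nested tuples and columns as strings; the
names collect_vars and group_items are made up for illustration and are
not PostgreSQL functions.

```python
# Toy model only: expressions are nested tuples, columns are plain strings.
# None of these names are PostgreSQL APIs.

def collect_vars(expr):
    """Return the column names referenced by expr, in first-seen order."""
    if isinstance(expr, str):
        return [expr]
    found = []
    for arg in expr[1:]:          # expr[0] is the operator name
        for var in collect_vars(arg):
            if var not in found:
                found.append(var)
    return found

def group_items(expr, exprs_with_stats):
    """Keep expr whole if statistics exist for it, else split it into columns."""
    if expr in exprs_with_stats:
        return [expr]
    return collect_vars(expr)

# Assume statistics exist only for the expression (a + b).
stats = {("+", "a", "b")}
matched = group_items(("+", "a", "b"), stats)
split = group_items(("*", "a", ("+", "b", "c")), stats)
```

Here matched keeps (a + b) as a single item, while split falls back to
the individual columns a, b and c, which is the "just split it into the
vars" case described above.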

> 0002 is an attempt to fix an issue I noticed today - we need to handle
> type changes.
>
> I think we have two options:
>
> a) Make UpdateStatisticsForTypeChange smarter to also transform and
> update the expression string, and reset pg_statistics[] data.
>
> b) Just recreate the statistics, just like we do for indexes. Currently
> this does not force analyze, so it just resets all the stats. Maybe it
> should do analyze, though.

I'd vote for (b) without an analyse, and I agree with getting rid of
UpdateStatisticsForTypeChange(). I've always been a bit skeptical
about trying to preserve extended statistics after a type change, when
we don't preserve regular per-column stats.

> BTW I wonder how useful the updated statistics actually are. Consider
> this example:
> ...
> the expression now looks like this:
>
> ========================================================================
> "public"."s" (ndistinct) ON ((a + b)), ((((b)::numeric)::double
> precision + c)) FROM t
> ========================================================================
>
> But we're matching it to (((b)::double precision + c)), so that fails.
>
> This is not specific to extended statistics - indexes have exactly the
> same issue. Not sure how common this is in practice.

Hmm, that's unfortunate. Maybe it's not that common in practice
though. I'm not sure if there is any practical way to fix it, but if
there is, I guess we'd want to apply the same fix to both stats and
indexes, and that certainly seems out of scope for this patch.
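
To illustrate why the match fails, here is a toy sketch in Python (not
PostgreSQL code). It assumes that matching is a structural comparison of
the two expression trees, and models casts and operators as nested
tuples; cast, plus and matches are hypothetical helpers. Only the two
expressions themselves are taken from the example above.

```python
# Toy model: casts and operators as nested tuples. Not PostgreSQL code.

def cast(arg, typ):
    return ("cast", typ, arg)

def plus(left, right):
    return ("+", left, right)

# Expression stored in the statistics definition after the type change:
#   ((b)::numeric)::double precision + c
stored = plus(cast(cast("b", "numeric"), "double precision"), "c")

# Expression we try to match it against:
#   (b)::double precision + c
query = plus(cast("b", "double precision"), "c")

def matches(stat_expr, query_expr):
    """Structural equality, node by node (assumed matching rule)."""
    return stat_expr == query_expr

result = matches(stored, query)
```

The extra numeric cast makes the two trees differ even though both
produce a double precision value, so under this assumption result is
False and the statistics would not be used.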

Regards,
Dean
