Re: Showing applied extended statistics in explain

From: Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Subject: Re: Showing applied extended statistics in explain
Date: 2021-07-23 15:12:25
Message-ID: 2382297.VW1VTv4sZ3@aivenronan
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Le samedi 27 mars 2021, 01:50:54 CEST Tomas Vondra a écrit :
> The current implementation is a bit ugly PoC, with a couple annoying
> issues that need to be solved:
>
Hello Thomas,

I haven't looked at the implementation at all but I think it's an interesting
idea.

> 1) The information is stashed in multiple lists added to a Plan. Maybe
> there's a better place, and maybe we need to invent a better way to
> track the info (a new node stashed in a single List).

Yes this would probably be cleaner.

>
> 2) The deparsing is modeled (i.e. copied) from how we deal with index
> quals, but it's having issues with nested OR clauses, because there are
> nested RestrictInfo nodes and the deparsing does not expect that.
>
> 3) It does not work for functional dependencies, because we effectively
> "merge" all functional dependencies and apply the entries. Not sure how
> to display this, but I think it should show the individual dependencies
> actually applied.

Yes that would be useful when trying to understand where an estimation comes
from.

>
> 4) The info is collected always, but I guess we should do that only when
> in explain mode. Not sure how expensive it is.

That would probably be better yes.

>
> 5) It includes just statistics name + clauses, but maybe we should
> include additional info (e.g estimate for that combination of clauses).

I'm not sure the estimate for the combination is that useful, as you have an
associated estimated number of rows attached to the node. I think to be able
to really make sense of it, a GUC disabling the extended statistics could be
useful for the curious DBA to compare the selectivity estimation for a plan
with the statistics and a plan without.

>
> 6) The clauses in the grouping query are transformed to AND list, which
> is wrong. This is easy to fix, I was lazy to do that in a PoC patch.
>
> 7) It does not show statistics for individual expressions. I suppose
> examine_variable could add it to the rel somehow, and maybe we could do
> that with index expressions too?

Yes this would be useful for single-expression extended statistics as well as
statistics collected from a functional index.

I don't know if it's doable, but if we want to provide insights into how
statistics are used, it could be nice to also display the statistics target
used. Since the values at the time of the last analyze and the current value
might be different, it could be nice to store it along with the stats. I
remember having to troubleshoot queries where the problem was an ALTER <TABLE/
INDEX> ... SET STATISTICS had not been run as expected, and having that
information available in the plan for a complex query might help in diagnosing
the problem quicker.

Regards,

--
Ronan Dunklau

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2021-07-23 15:42:20 Re: badly calculated width of emoji in psql
Previous Message Bruce Momjian 2021-07-23 15:05:51 Re: Have I found an interval arithmetic bug?