Re: Showing applied extended statistics in explain

From: Tatsuro Yamada <yamatattsu(at)gmail(dot)com>
To: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Showing applied extended statistics in explain
Date: 2022-01-18 11:31:30
Message-ID: CAOKkKFtc45uNFoWYOCo4St19ayxrh-_+4TnZtwxGZz6-3k_GSA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Tomas!

>> What exactly use case do you have in mind?
>Well, my goal was to help users investigating the plan/estimates,
>because once you create multiple "candidate" statistics objects it may
>get quite tricky to determine which of them were applied, in what order,
>etc. It's not all that straightforward, given the various heuristics we
>use to pick the "best" statistics, apply dependencies last, etc. And I
>don't quite want to teach the users those rules, because I consider them
>mostly implementation details that wee may want to tweak in the future.

You are right. This feature will be very useful for plan tuning with
extended statistics. Therefore, I would like to help develop it. :-D

>5) It includes just statistics name + clauses, but maybe we should
>include additional info (e.g estimate for that combination of clauses).

I thought the above sentence was about what level to aim for in the initial
version. Ideally, we would like to include additional information. However,
it is clear that the more things we have, the longer it will take to
develop.
So, I think it is better to commit the initial version at a minimal level
to
provide it to users quickly.

As long as an Extended stats name is displayed in EXPLAIN result, it is
possible to figure out which extended statistics are used. That information
alone is valuable to the user.

> 4) The info is collected always, but I guess we should do that only when
>in explain mode. Not sure how expensive it is.

In the case of pg_plan_advsr that I created, I used ProcessUtility_hook to
detect EXPLAIN command. It searches all queries to find T_ExplainStmt, and
set a flag. I guess we can't use this method in Postgres core, right?

If we have to collect the extra info for all query executions, we need to
reduce overhead. I mean collecting the minimum necessary.
To do that, I think it would be better to display less Extended stats info
in EXPLAIN results. For example, displaying only extended stats names is
fine,
I guess. (I haven't understood your patch yet, so If I say wrong, sorry)

>6) The clauses in the grouping query are transformed to AND list, which
>is wrong. This is easy to fix, I was lazy to do that in a PoC patch.

6) is related 5).
If we agree to remove showing quals of extended stats in EXPLAIN result,
We can solve this problem by removing the code. Is it okay?

>2) The deparsing is modeled (i.e. copied) from how we deal with index
>quals, but it's having issues with nested OR clauses, because there are
>nested RestrictInfo nodes and the deparsing does not expect that.
>
>3) It does not work for functional dependencies, because we effectively
>"merge" all functional dependencies and apply the entries. Not sure how
>to display this, but I think it should show the individual dependencies
>actually applied.
>
>7) It does not show statistics for individual expressions. I suppose
>examine_variable could add it to the rel somehow, and maybe we could do
>that with index expressions too?

I'm not sure about 2) 3) and 7) above, so I'd like to see some concrete
examples
of queries. I would like to include it in the test pattern for regression
testing.

To be fixed:

* StatisticExtInfo should have a _copy etc. node functionality now,
right? I've hit "unrecognized node type" with a prepared statement
because it's missing.

* Is there any particular reason behind choosing only some scan nodes to
display extended stats? E.g. BitmapHeapScan is missing.

* (New) In the case of Group by, we should put Extended stats info under the
Hash/Group Aggregate node (not under Scan node).

* (New) We have to create a regression test including the above test
patterns.

Attached patch:

It is a rebased version on the head of the master because there were many
Hunks
when I applied the previous patch on master.

I'll create regression tests firstly.

Regards,
Tatsuro Yamada

Attachment Content-Type Size
0001-show-stats-in-explain-rebased.patch application/octet-stream 16.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2022-01-18 12:22:55 Re: a misbehavior of partition row movement (?)
Previous Message Takashi Menjo 2022-01-18 10:58:35 Re: Map WAL segment files on PMEM as WAL buffers