Showing applied extended statistics in explain

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Showing applied extended statistics in explain
Date: 2021-03-27 00:50:54
Message-ID: 8081617b-d80f-ae2b-b79f-ea7e926f9fcf@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

With extended statistics it may not be immediately obvious if they were
applied and to which clauses. If you have multiple extended statistics,
we may also apply them in different order, etc. And with expressions,
there's also the question of matching expressions to the statistics.

So it seems useful to include this into in the explain plan - show which
statistics were applied, in which order. Attached is an early PoC patch
doing that in VERBOSE mode. I'll add it to the next CF.

A simple example demonstrating the idea:

======================================================================

create table t (a int, b int);
insert into t select mod(i,10), mod(i,10)
from generate_series(1,100000) s(i);

create statistics s on a, b from t;
analyze t;

test=# explain (verbose) select * from t where a = 1 and b = 1;
QUERY PLAN
---------------------------------------------------------------
Seq Scan on public.t (cost=0.00..1943.00 rows=10040 width=8)
Output: a, b
Filter: ((t.a = 1) AND (t.b = 1))
Statistics: public.s Clauses: ((a = 1) AND (b = 1))
(4 rows)

test=# explain (verbose) select 1 from t group by a, b;
QUERY PLAN
----------------------------------------------------------------------
HashAggregate (cost=1943.00..1943.10 rows=10 width=12)
Output: 1, a, b
Group Key: t.a, t.b
-> Seq Scan on public.t (cost=0.00..1443.00 rows=100000 width=8)
Output: a, b
Statistics: public.s Clauses: (a AND b)
(6 rows)

======================================================================

The current implementation is a bit ugly PoC, with a couple annoying
issues that need to be solved:

1) The information is stashed in multiple lists added to a Plan. Maybe
there's a better place, and maybe we need to invent a better way to
track the info (a new node stashed in a single List).

2) The deparsing is modeled (i.e. copied) from how we deal with index
quals, but it's having issues with nested OR clauses, because there are
nested RestrictInfo nodes and the deparsing does not expect that.

3) It does not work for functional dependencies, because we effectively
"merge" all functional dependencies and apply the entries. Not sure how
to display this, but I think it should show the individual dependencies
actually applied.

4) The info is collected always, but I guess we should do that only when
in explain mode. Not sure how expensive it is.

5) It includes just statistics name + clauses, but maybe we should
include additional info (e.g estimate for that combination of clauses).

6) The clauses in the grouping query are transformed to AND list, which
is wrong. This is easy to fix, I was lazy to do that in a PoC patch.

7) It does not show statistics for individual expressions. I suppose
examine_variable could add it to the rel somehow, and maybe we could do
that with index expressions too?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
0001-show-stats-in-explain.patch text/x-patch 17.8 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nikita Glukhov 2021-03-27 01:12:56 Re: SQL/JSON: JSON_TABLE
Previous Message alvherre@alvh.no-ip.org 2021-03-27 00:33:20 Re: libpq debug log