Re: [PATCH] Add extra statistics to explain for Nested Loop

From: Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: e(dot)sokolova(at)postgrespro(dot)ru, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Add extra statistics to explain for Nested Loop
Date: 2021-02-01 13:13:15
Message-ID: 20210201221315.06393d58e8205bc18bd0b84b@sraoss.co.jp
Lists: pgsql-hackers

On Mon, 1 Feb 2021 13:28:45 +0800
Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:

> On Thu, Jan 28, 2021 at 8:38 PM Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp> wrote:
> >
> > postgres=# explain (analyze, verbose) select * from a,b where a.i=b.j;
> > QUERY PLAN
> > ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> >  Nested Loop  (cost=0.00..2752.00 rows=991 width=8) (actual time=0.021..17.651 rows=991 loops=1)
> >    Output: a.i, b.j
> >    Join Filter: (a.i = b.j)
> >    Rows Removed by Join Filter: 99009
> >    ->  Seq Scan on public.b  (cost=0.00..2.00 rows=100 width=4) (actual time=0.009..0.023 rows=100 loops=1)
> >          Output: b.j
> >    ->  Seq Scan on public.a  (cost=0.00..15.00 rows=1000 width=4) (actual time=0.005..0.091 min_time=0.065 max_time=0.163 min_rows=1000 rows=1000 max_rows=1000 loops=100)
> >          Output: a.i
> >  Planning Time: 0.066 ms
> >  Execution Time: 17.719 ms
> > (10 rows)
> >
> > I don't like this format, where the extra statistics appear on the same
> > line as the existing information, because the output format differs depending
> > on whether the plan node's loops > 1 or not. It also makes the
> > line too long. Moreover, other information reported by VERBOSE doesn't change
> > the existing row format and just adds extra rows for new information.
> >
> > Instead, it seems good to me to add extra rows for the new statistics
> > without changing the existing row format, as other VERBOSE information does,
> > like below.
> >
> >    ->  Seq Scan on public.a  (cost=0.00..15.00 rows=1000 width=4) (actual time=0.005..0.091 rows=1000 loops=100)
> >          Output: a.i
> >          Min Time: 0.065 ms
> >          Max Time: 0.163 ms
> >          Min Rows: 1000
> >          Max Rows: 1000
> >
> > or, like Buffers,
> >
> >    ->  Seq Scan on public.a  (cost=0.00..15.00 rows=1000 width=4) (actual time=0.005..0.091 rows=1000 loops=100)
> >          Output: a.i
> >          Loops: min_time=0.065 max_time=0.163 min_rows=1000 max_rows=1000
> >
> > and so on. What do you think about it?
>
> It's true that the current output is a bit long, which isn't really
> convenient to read. Using one of those alternative formats would also
> have the advantage of not breaking compatibility with tools that
> process those entries. I personally prefer the 2nd option with the
> extra "Loops:" line. For non-text formats, should we keep the current
> format?
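
To illustrate the compatibility point, here is a rough sketch of how a
plan-parsing tool might read the existing "(actual time=... rows=... loops=...)"
part of a text-format line. The pattern ACTUAL_RE is hypothetical and does not
come from any particular tool; the two sample lines are taken from the plans
quoted above.

```python
import re

# Hypothetical pattern a plan-parsing tool might use for the existing
# "(actual time=... rows=... loops=...)" part of a text-format plan line.
ACTUAL_RE = re.compile(
    r"\(actual time=(?P<startup>[\d.]+)\.\.(?P<total>[\d.]+) "
    r"rows=(?P<rows>\d+) loops=(?P<loops>\d+)\)"
)

# Format with the new statistics embedded in the existing line.
inline_format = (
    "-> Seq Scan on public.a (cost=0.00..15.00 rows=1000 width=4) "
    "(actual time=0.005..0.091 min_time=0.065 max_time=0.163 "
    "min_rows=1000 rows=1000 max_rows=1000 loops=100)"
)

# Format where the existing line is unchanged and the new statistics
# go on a separate row (not shown here).
extra_line_format = (
    "-> Seq Scan on public.a (cost=0.00..15.00 rows=1000 width=4) "
    "(actual time=0.005..0.091 rows=1000 loops=100)"
)

inline_match = ACTUAL_RE.search(inline_format)
extra_line_match = ACTUAL_RE.search(extra_line_format)

print("inline format parsed:", inline_match is not None)
print("extra-line format parsed:", extra_line_match is not None)
```

With a strict pattern like this one, the inline format no longer matches,
while the line from the extra-row formats is parsed exactly as before.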

For non-text formats, I think "Max/Min Rows" and "Max/Min Times" are a bit
too terse, and their meaning is unclear. Instead, similar to the style of "Buffers",
would it make sense to use "Max/Min Rows in Loops" and "Max/Min Times in Loops"?
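
As a rough sketch of the idea (not code from the patch), the same per-loop
values could be shown as a single "Loops:" row in text format and under the
longer labels in a structured format such as JSON. The exact key spellings
below ("Min Time in Loops" and so on) and the helper names are assumptions
made for illustration only.

```python
import json

# Per-loop statistics for the inner Seq Scan in the example plan.
stats = {"min_time": 0.065, "max_time": 0.163, "min_rows": 1000, "max_rows": 1000}


def text_loops_line(s):
    """Render the proposed extra "Loops:" row for the text format."""
    return (
        f"Loops: min_time={s['min_time']:.3f} max_time={s['max_time']:.3f} "
        f"min_rows={s['min_rows']} max_rows={s['max_rows']}"
    )


def structured_properties(s):
    """Render the same values under "... in Loops" labels (assumed spelling)."""
    return {
        "Min Time in Loops": s["min_time"],
        "Max Time in Loops": s["max_time"],
        "Min Rows in Loops": s["min_rows"],
        "Max Rows in Loops": s["max_rows"],
    }


print(text_loops_line(stats))
print(json.dumps(structured_properties(stats), indent=2))
```

The longer labels stay self-explanatory when a property is read in isolation,
which is the case in non-text output where there is no "Loops:" prefix to
give context.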

Regards,
Yugo Nagata

--
Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>
