Re: [PATCH] Add extra statistics to explain for Nested Loop

From: e(dot)sokolova(at)postgrespro(dot)ru
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [PATCH] Add extra statistics to explain for Nested Loop
Date: 2021-03-25 09:52:42
Message-ID: 32513912fb3f4b4042a3834b2e5afd5c@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thank you all for your feedback and reforms.
I attach a new version of the patch with the some changes and fixes.
Here's a list of the major changes:
1) New format of extra statistics. This is now contained in a line
separate from the main statistics.

Julien Rouhaud писал 2021-02-01 08:28:
> On Thu, Jan 28, 2021 at 8:38 PM Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>
> wrote:
>>
>> postgres=# explain (analyze, verbose) select * from a,b where a.i=b.j;
>>
>> QUERY PLAN
>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> Nested Loop (cost=0.00..2752.00 rows=991 width=8) (actual
>> time=0.021..17.651 rows=991 loops=1)
>> Output: a.i, b.j
>> Join Filter: (a.i = b.j)
>> Rows Removed by Join Filter: 99009
>> -> Seq Scan on public.b (cost=0.00..2.00 rows=100 width=4)
>> (actual time=0.009..0.023 rows=100 loops=1)
>> Output: b.j
>> -> Seq Scan on public.a (cost=0.00..15.00 rows=1000 width=4)
>> (actual time=0.005..0.091 min_time=0.065 max_time=0.163 min_rows=1000
>> rows=1000 max_rows=1000 loops=100)
>> Output: a.i
>> Planning Time: 0.066 ms
>> Execution Time: 17.719 ms
>> (10 rows)
>>
>> I don't like this format where the extra statistics appear in the same
>> line of existing information because the output format differs
>> depended
>> on whether the plan node's loops > 1 or not. This makes the length of
>> a
>> line too long. Also, other information reported by VERBOSE doesn't
>> change
>> the exiting row format and just add extra rows for new information.
>>
>> Instead, it seems good for me to add extra rows for the new statistics
>> without changint the existing row format as other VERBOSE information,
>> like below.
>>
>> -> Seq Scan on public.a (cost=0.00..15.00 rows=1000 width=4)
>> (actual time=0.005..0.091 rows=1000 loops=100)
>> Output: a.i
>> Loops: min_time=0.065 max_time=0.163 min_rows=1000
>> max_rows=1000
>>
>> and so on. What do you think about it?
>

2) Correction of the case of parallel scan

>> In parallel scan, the extra statistics are not reported correctly.
>>
>> This reports max/min rows or time of inner scan as 0 in parallel
>> workers,
>> and as a result only the leader process's ones are accounted. To fix
>> this,
>> we would change InstrAggNode as below.
>>
>

3) Adding extra statistics about total number of rows (total rows).
There were many wishes for this here.

Please don't hesitate to share any thoughts on this topic.

--
Ekaterina Sokolova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
extra_statistics_v3.patch text/x-diff 17.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2021-03-25 10:01:23 Re: wal stats questions
Previous Message Joel Jacobson 2021-03-25 09:48:55 Re: [PATCH] pg_permissions