Re: [PATCH] New command to monitor progression of long running queries

From: Remi Colinet <remi(dot)colinet(at)gmail(dot)com>
To: Maksim Milyutin <m(dot)milyutin(at)postgrespro(dot)ru>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] New command to monitor progression of long running queries
Date: 2017-04-19 14:13:56
Message-ID: CADdR5nxvTuK34ShnDYcLpnkS06nofrPZVr4sW2RcSdDCKWFg0Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Maksim,

2017-04-18 20:31 GMT+02:00 Maksim Milyutin <m(dot)milyutin(at)postgrespro(dot)ru>:

> On 18.04.2017 17:39, Remi Colinet wrote:
>
>> Hello Maksim,
>>
>> The core implementation I suggested for the new PROGRESS command uses
>> different functions from the one used by EXPLAIN for its core
>> implementation.
>> Some source code is shared with EXPLAIN command. But this shared code is
>> only related to quals, properties, children, subPlans and few other nodes.
>>
>> All other code for PROGRESS is new code.
>>
>> I don't believe explain.c code can be fully shared with the one of the
>> new PROGRESS command. These 2 commands have different purposes.
>> The core code used for the new PROGRESS command is very different from
>> the core code used for EXPLAIN.
>>
>>
> Perhaps you will be forced to duplicate significant snippets of
> functionality from explain.c into your progress.c.
>

Currently, few code is duplicated between EXPLAIN and PROGRESS commands.
The duplicated code could be moved to file src/backend/commands/report.c
which is used to gather shared code between the 2 commands. I will try to
complete this code sharing as much as possible.

The main point is that PROGRESS uses the same design pattern as EXPLAIN by
parsing the query tree. The work horse of the PROGRESS command is
ProgressNode() which calls recursively sub nodes until we reach leaf nodes
such as SeqScan, IndexScan, TupleStore, Sort, Material, ... . EXPLAIN
command uses a similar work horse with function ExplainNode() which
eventually calls different leaf nodes.

Some of the leaf nodes which are common to the 2 commands have been put in
the file src/backend/commands/report.c. May be some further code sharing is
also possible for the work horse by using a template function which would
call EXPLAIN specific leaf node functions or PROGRESS specific leaf node
functions.

>
>
>> Regarding the queryDesc state of SQL query upon receiving a request to
>> report its execution progress, it does not bring any issue. The request
>> is noted when the signal is received by the monitored backend. Then, the
>> backend continues its execution code path. When interrupts are checked
>> in the executor code, the request will be dealt.
>>
>>
> Yes, interrupts are checked in the CHECK_FOR_INTERRUPTS entries.
>
> When the request is being dealt, the monitored backend will stop its
>> execution and report the progress of the SQL query. Whatever is the
>> status of the SQL query, progress.c code checks the status and report
>> either that the SQL query does not have a valid status, or otherwise the
>> current execution state of the SQL query.
>>
>> SQL query status checking is about:
>> - idle transaction
>> - out of transaction status
>> - null planned statement
>> - utility command
>> - self monitoring
>>
>> Other tests can be added if needed to exclude some SQL query state. Such
>> checking is done in void HandleProgressRequest(void).
>> I do not see why a SQL query progression would not be possible in this
>> context. Even when the queryDescc is NULL, we can just report a <idle
>> transaction> output. This is currently the case with the patch suggested.
>>
>>
> It's interesting question - how much the active query is in a usable state
> on the stage of execution. Tom Lane noticed that CHECK_FOR_INTERRUPTS
> doesn't give us 100% guarantee about full consistency [1].
>

I wonder what you mean about usable state.

Currently, the code suggested tests the queryDesc pointer and all the sub
nodes pointers in order to detect NULL pointers. When the progress report
is collected by the backend, this backend does the collect and consequently
does not run the query. So the query tree is not being modified. At this
moment, whatever is the query state, we can manage to deal with its static
state. It is only a tree which could also be just a NULL pointer in the
most extreme case. Such case is dealt in the current code.

>
> So far, I've found this new command very handy. It allows to evaluate
>> the time needed to complete a SQL query.
>>
>>
> Could you explain how you get the percent of execution for nodes of plan
> tree and overall for query?
>

The progress of execution of the query is computed as follows at 2
different places for each leaf node type (Scan, IndexScan, Sort, Material,
TupleStore, ...):

- one place in the executor code, or in access methods code, or in sort
utilities code, used during the execution of the SQL query in which
following values are counted for instance: rows R/W, blocks, R/W, tapes R/W
used for sort, tuple store R/W, ... . Some of these values are already
computed in the current Postgresql official source code. Some other values
had to be added and collected.

- one place in the leaf function of each node type (ProgressScan(),
ProgressSort(), ...) in which percents are computed and are then dumped
together with raw values collected during execution, in the report. The
dump details can be selected with the VERBOSE option of the PROGRESS
command (For instance # PROGRESS VERBOSE $pid)

For instance:

1/ read/write rows are collected when running the executor in the file
src/backend/executor/execProcnode.c
==============================================================================

This is already implemented in the current official source tree. Nothing
mode needs to be implemented to collect values (total rows number and
current rows number already fetched are collected).
The report is implemented in leaf function ProgressScanRows().

2/ read/write blocks are collected in the file
src/backend/access/heap/heapam.c
==========================================================

This is already implemented in the current official source tree. Nothing
more needs to be implemented to collect values during execution.
The report is implemented in a leaf function ProgressScanBlks().

3/ the sort progressions are collected in the file
src/backend/utils/sort/tuplesort.c
==========================================================

[root(at)rco pg]# git diff --stat master.. src/backend/utils/sort/tuplesort.c
src/backend/utils/sort/tuplesort.c | 142
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------------
1 file changed, 127 insertions(+), 15 deletions(-)
[root(at)rco pg]#

New fields have been added to compute the different I/O
(read/write/merge/...) per tapes for instance, during a sort on tape.
The report of Sort node is implemented in leaf function ProgressSort()

4/ the tuple store progressions are computed in the file
src/backend/utils/sort/tuplestore.c
=================================================================

[root(at)rco pg]# git diff --stat master.. src/backend/utils/sort/tuplestore.c
src/backend/utils/sort/tuplestore.c | 73
++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----------------
1 file changed, 56 insertions(+), 17 deletions(-)
[root(at)rco pg]#

New fields have been added to collect the I/O needed for such tuple store.
The report of TupleStore node is implemented in leaf function
ProgressTupleStore().

Other node types have been implemented: TidScan, IndexScan, LimitScan,
CustomScan, Hash, ModifyTable.
Such node may require some new fields to collect values during the SQL
query execution.

Overall, the overhead caused by new values collected during the SQL query
execution, is very low.
A few values need to be collected.

> A further improvement would be to report the memory, disk and time
>> resources used by the monitored backend. An overuse of memory, disk and
>> time resources can prevent the SQL query to complete.
>>
>>
> This functionality is somehow implemented in explain.c. You can see my
> patch to this file [2]. I only manipulate runtime statistics (data in the
> structure 'Instrumentation') to achieve the printing state of running query.
>
> I will check your patch and try to add such feature to the current patch.
It provides a valuable hint to estimate whether a SQL query has a chance to
complete and will not reached the resource limits.
.

>
> 1. https://www.postgresql.org/message-id/24182.1472745492%40sss.pgh.pa.us
> 2. https://github.com/postgrespro/pg_query_state/blob/master/
> runtime_explain.patch
>
> Thanks for your suggestions and comments

Regards
Remi

>
> --
> Maksim Milyutin
> Postgres Professional: http://www.postgrespro.com
> Russian Postgres Company
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2017-04-19 14:15:31 Re: OK, so culicidae is *still* broken
Previous Message Victor Yegorov 2017-04-19 14:07:49 Re: Relation cache invalidation on replica