|From:||Remi Colinet <remi(dot)colinet(at)gmail(dot)com>|
|Subject:||[PATCH v2] Progress command to monitor progression of long running SQL queries|
|Views:||Raw Message | Whole Thread | Download mbox|
This is version 2 of the new command name PROGRESS which I wrote in order
to monitor long running SQL queries in a Postgres backend process.
New command justification
The purpose of the command is to monitor long running SQL queries in a
backend process to evaluate when they will complete.
First, the user needs to find the pid of the backend process which is
executing the long running SQL query for which we want to monitor
This search may be based on the SQL query, the session, the terminal, using
the view pg_stat_activity. This view provides the best method to identify
the query to be monitored.
Once the pid of the backend is determined, user may run:
psql> PROGRESS [VERBOSE|(FORMAT TEXT|JSON|XML|INLINE)] PID_OF_BACKEND
The output will show a table similar to the one of the EXPLAIN command but
with further details about the progression of execution of each node which
may take a long time to be executed.
Such nodes are SeqScan (Sequential Scans), IndexScan (Index Scans), Sort
(long sorts which require disk tapes mecanism), TupleStore (Store on disks
of tuple tables), Limit, HashJoinTable which use one disk files to perform
Other nodes use the previous nodes. For instance, Material nodes use
The PROGRESS command can be used with the psql \watch command. For
instance, user may run:
psql> \watch PROGRESS PID_OF_BACKEND
This is handy to view progression of SQL queries.
A use case is shown in the below example based on a table named t_10m with
at least 10 millions rows.
The table has been created with :
CREATE TABLE T_10M ( id integer, md5 text);
INSERT INTO T_10M SELECT generate_series(1,10000000) AS id,
md5(random()::text) AS md5;
ANALYSE is needed to have statistics updated. These are used to compare
rows fetched with total number of rows and report percentage of execution
of query nodes.
If ANALYSE is not run against relations used by the monitored query,
statistics may be inaccurate.
1/ Start a first psql session to run a long SQL query:
[pgadm(at)rco ~]$ psql -A -d test
Type "help" for help.
The option -A is used to allow rows to be output without formatting work.
Redirect output to a file in order to let the query run without terminal
test=# \o out
Start a long running query:
test=# select * from t_10M order by md5;
2/ In a second psql session, list the backend pid(s) and their matching SQL
[pgadm(at)rco ~]$ psql -d test
Type "help" for help.
test=# select * from prg;
pid | query
26311 | select * from t_10m order by md5;
26312 | select * from t_10m order by md5;
26313 | select * from t_10m order by md5;
26330 | select * from prg;
Chose the pid of the backend running the long SQL query to be monitored.
Above example is a parallel SQL query. Lowest pid is the main backend of
test=# PROGRESS 26311;
status: <query running>
query: select * from t_10m order by md5;
time used (s): 20
worker child: 26312
-> Sort => rows r/w merge 0/0 sort 0/1895721 0% tape blocks 10903
Sort Key: md5
-> Parallel Seq Scan on t_10m => rows 1903441/4166657 45% => blks
worker child: 26313
-> Sort => rows r/w merge 0/0 sort 0/1967521 0% tape blocks 11314
Sort Key: md5
-> Parallel Seq Scan on t_10m => rows 1975561/4166657 47% => blks
worker parent: 26311
-> Sort on tapes writing => rows r/w merge 0/0 sort 0/2075590 0% tape
Sort Key: md5
-> Parallel Seq Scan on t_10m => rows 2111550/4166657 50% => blks
total disk space used (MB) 266
The query of the monitored backend is:
test=# select * from t_10M order by md5;
Because the table has 10 millions of rows, the sort is done on tapes.
The output shows the progression in terms of:
- rows with:
- already fetched rows,
- total rows to be fetched,
- blocks with:
- already fetched blocks
- total blocks to be fetched
Each node may have different output types.
Progression is reported in terms of rows, blocks, bytes, and percentages.
Sort nodes show tapes blocks being used and step phase (merge/sort).
Implementation of the command
The design of the command is:
- the user issue the "PROGRESS pid" command from a psql session.
The pid is the one of the backend which runs the SQL query for which we
want to get a progression report. It can be determined from the view
- the monitoring backend, upon receiving the "PROGRESS pid" command from
psql utility used in step above, sends a signal to the backend whose
process pid is the one provided in the PROGRESS command, and whose reason
is PROCSIG_PROGRESS defined in src/include/storage/procsignal.h
- the monitored backend receives the signal with the reason of the signal
(progression request) and notes the request as for any interrupt. Then, it
continues its execution of its SQL query until interrupts can be serviced.
- when the monitored process can service the interrupts, it deals with the
progress request by collecting its execution tree with the execution
progress of each long running node. At this time, the SQL query must be in
a consistent state without partial nodes under alllocation or freeing. This
is enforced by flags added in the executor. The progression is only
collected if the backend is in the function of ExecutorRun(). If execution
tree is in a consistent state, it is dumped in shared memory pages
allocated at the start of the server. Then, the monitored backend set a
latch on which the monitoring backend is waiting for. It then continues
executing its SQL query.
- the monitoring backend collects the share memory data dumped, and sends
it to its psql session as a list of rows.
The command PROGRESS does not incur any slowness on the running query
because the execution progression is only computed upon receiving the
progress request which is supposed to be seldom used.
The code heavily reuses the one of the explain command. In order to share
as much code as possible with the EXPLAIN command, part of the EXPLAIN code
which deals with reporting quals for instance, has been moved to a new
report.c file in the src/backend/commands folder. This code in report.c is
shared between explain.c source code and PROGRESS command source code which
is in progress.c file.
The progression reported by PROGRESS command is given in terms of rows,
blocks, bytes and percents. The values displayed depend on the node type in
the execution plan. The current patch implements all the possible nodes
which could take a lot of time.
Parallel queries can also be monitored. The same mecanism is used to
monitor child workers with a slight difference: the main worker requests
the child progression directly in order to dump the whole progress tree in
The patch for version 2 is attached to the current email.
The patch has been rebased on latest commits.
The patch is also available at : https://github.com/colinet/
The diff stat of the patch is:
[root(at)rco pg]# git diff --stat master..
contrib/auto_explain/auto_explain.c | 5 +-
contrib/postgres_fdw/postgres_fdw.c | 13 +-
src/backend/access/heap/heapam.c | 2 +
src/backend/access/transam/parallel.c | 2 +-
src/backend/commands/Makefile | 3 +-
src/backend/commands/explain.c | 2834
src/backend/commands/prepare.c | 5 +-
src/backend/commands/progress.c | 2035
src/backend/commands/report.c | 2252
src/backend/executor/execMain.c | 8 +
src/backend/executor/execProcnode.c | 31 ++
src/backend/executor/nodeBitmapHeapscan.c | 13 +-
src/backend/executor/nodeIndexonlyscan.c | 13 +-
src/backend/executor/nodeIndexscan.c | 15 +-
src/backend/executor/nodeSamplescan.c | 12 +-
src/backend/executor/nodeSeqscan.c | 16 +-
src/backend/nodes/bitmapset.c | 19 +
src/backend/nodes/outfuncs.c | 245 ++++++++++
src/backend/parser/gram.y | 79 +++-
src/backend/postmaster/postmaster.c | 1 +
src/backend/storage/file/buffile.c | 73 +++
src/backend/storage/file/fd.c | 15 +
src/backend/storage/ipc/ipci.c | 3 +
src/backend/storage/ipc/procarray.c | 57 +++
src/backend/storage/ipc/procsignal.c | 4 +
src/backend/storage/lmgr/lwlock.c | 7 +-
src/backend/storage/lmgr/lwlocknames.txt | 1 +
src/backend/tcop/postgres.c | 65 ++-
src/backend/tcop/pquery.c | 25 +
src/backend/tcop/utility.c | 10 +
src/backend/utils/init/globals.c | 12 +
src/backend/utils/sort/logtape.c | 18 +
src/backend/utils/sort/tuplesort.c | 153 +++++-
src/backend/utils/sort/tuplestore.c | 75 ++-
src/include/access/parallel.h | 1 +
src/include/commands/explain.h | 67 +--
src/include/commands/prepare.h | 2 +-
src/include/commands/report.h | 159 +++++++
src/include/executor/execdesc.h | 3 +
src/include/executor/progress.h | 52 ++
src/include/foreign/fdwapi.h | 10 +-
src/include/miscadmin.h | 2 +
src/include/nodes/bitmapset.h | 1 +
src/include/nodes/execnodes.h | 3 +
src/include/nodes/extensible.h | 6 +-
src/include/nodes/nodes.h | 8 +
src/include/nodes/parsenodes.h | 11 +
src/include/nodes/plannodes.h | 11 +
src/include/parser/kwlist.h | 2 +
src/include/pgstat.h | 3 +-
src/include/storage/buffile.h | 11 +
src/include/storage/fd.h | 2 +
src/include/storage/procarray.h | 3 +
src/include/storage/procsignal.h | 3 +
src/include/tcop/pquery.h | 1 +
src/include/utils/logtape.h | 2 +
src/include/utils/tuplesort.h | 72 ++-
src/include/utils/tuplestore.h | 35 ++
58 files changed, 5972 insertions(+), 2619 deletions(-)
The progress command can be used with the watch command of psql making it
more handy to monitor a long running query.
The default format of the PROGRESS command is text inline. It can be set as
json, xml, or text as for EXPLAIN command.
Some further examples of use are shown below in the test_v2.txt file with
parallel queries, sorts, hashes, scans, json format output.
Monitor progress of utilities commands such as CREATE INDEX which may take
a long time to complete.
Added JSON, XML, TEXT INLINE format output
Added time consumed to run SQL queries
Added SQL query being monitored
Added support for parallel queries
Added disk used by the queries
Rebased on 9th May 2017 commits.
Initial version with only TEXT format reporting
Any suggestion, comment or feedback is welcome.
|Next Message||Robert Haas||2017-05-10 16:43:20||Re: [POC] hash partitioning|
|Previous Message||Robert Haas||2017-05-10 16:38:54||Re: [POC] hash partitioning|