Parallel index creation & pg_stat_activity

From: Phil Florent <philflorent(at)hotmail(dot)com>
To: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Parallel index creation & pg_stat_activity
Date: 2018-02-28 16:50:44
Message-ID: HE1PR03MB17065FB8B627396DD0DAF543BAC70@HE1PR03MB1706.eurprd03.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

Hi,
I created an index on a 11devel base while sampling pg_stat_activity with a little tool. Tool catches a line if state = active. Collected rows are aggregated and sorted by activity percentage.

Test environment :

select version();
version
----------------------------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 11devel (Debian 11~~devel~20180227.2330-1~420.git51057fe.pgdg+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 7.3.0-5) 7.3.0, 64-bit
(1 ligne)

Temps : 0,762 ms

create table t1(c1 bigint, c2 double precision, c3 text);
CREATE TABLE

insert into t1 select generate_series(1,100000000,1), random(), md5(random()::text) ;
INSERT 0 100000000

With a select (select max(c1) from t1 group by c2;) I have this kind of output :

./t -d 20 -o "pid, backend_type, query, wait_event_type, wait_event"
traqueur 2.04.00 - performance tool for PostgreSQL 9.3 => 11
INFORMATION, no connection parameters provided, connecting to traqueur database ...
INFORMATION, connected to traqueur database
INFORMATION, PostgreSQL version : 110000
INFORMATION, sql preparation ...
INFORMATION, sql execution ...
busy_pc | distinct_exe | pid | backend_type | query | wait_event_type | wait_event
---------+--------------+-------+-----------------+-------------------------------------+-----------------+--------------
76 | 1 / 151 | 10065 | parallel worker | select max(c1) from t1 group by c2; | IO | DataFileRead
73 | 1 / 146 | 8262 | client backend | select max(c1) from t1 group by c2; | IO | DataFileRead
72 | 1 / 144 | 10066 | parallel worker | select max(c1) from t1 group by c2; | IO | DataFileRead
26 | 1 / 53 | 10066 | parallel worker | select max(c1) from t1 group by c2; | |
26 | 1 / 51 | 8262 | client backend | select max(c1) from t1 group by c2; | |
24 | 1 / 47 | 10065 | parallel worker | select max(c1) from t1 group by c2; | |
2 | 1 / 3 | 10066 | parallel worker | select max(c1) from t1 group by c2; | IO | BufFileWrite
2 | 1 / 3 | 8262 | client backend | select max(c1) from t1 group by c2; | IO | BufFileWrite
1 | 1 / 2 | 10065 | parallel worker | select max(c1) from t1 group by c2; | IO | BufFileWrite

With an index creation (create index t1_i1 on t1(c1, c2);) I have this kind of output :

./t -d 20 -o "pid, backend_type, query, wait_event_type, wait_event"
traqueur 2.04.00 - performance tool for PostgreSQL 9.3 => 11
INFORMATION, no connection parameters provided, connecting to traqueur database ...
INFORMATION, connected to traqueur database
INFORMATION, PostgreSQL version : 110000
INFORMATION, sql preparation ...
INFORMATION, sql execution ...
busy_pc | distinct_exe | pid | backend_type | query | wait_event_type | wait_event
---------+--------------+------+----------------+-----------------------------------+-----------------+--------------
68 | 1 / 136 | 8262 | client backend | create index t1_i1 on t1(c1, c2); | IO | DataFileRead
26 | 1 / 53 | 8262 | client backend | create index t1_i1 on t1(c1, c2); | |
6 | 1 / 11 | 8262 | client backend | create index t1_i1 on t1(c1, c2); | IO | BufFileWrite
(3 rows)

No parallel worker. At least one parallel worker was active though, I could see its work with a direct query on pg_stat_activity or a ps -ef :

...
postgres 8262 8230 7 08:54 ? 00:22:46 postgres: 11/main: postgres postgres [local] CREATE INDEX
...
postgres 9833 8230 23 14:17 ? 00:00:33 postgres: 11/main: parallel worker for PID 8262
...

Tool only catches activity of the client backend cause column state of pg_stat_activity is null for the parallel workers in this case. I added an option to do a "(state = 'active' or wait_event_is not null)" It's not 100% accurate though : I miss the activity of the parallel workers which is not waiting and it’s more difficult to know who helps whom since query is also null.
I can imagine various workarounds but 11 is in devel and maybe columns active & query of pg_stat_activity will be filled for the parallel workers even for an index creation ?

Best regards
Phil

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Andres Freund 2018-02-28 17:44:52 Re: Parallel index creation & pg_stat_activity
Previous Message David G. Johnston 2018-02-28 14:14:32 Re: Enforce primary key on every table during dev?

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2018-02-28 16:53:56 Re: [HACKERS] MERGE SQL Statement for PG11
Previous Message Tomas Vondra 2018-02-28 16:49:46 Re: [HACKERS] user-defined numeric data types triggering ERROR: unsupported type