
Waits monitoring

From: Ildus Kurbangaliev <i.kurbangaliev@postgrespro.ru>
To: Pg Hackers <pgsql-hackers@postgresql.org>
Subject: Waits monitoring
Date: 2015-07-08 15:52:09
Message-ID: 559D4729.9080704@postgrespro.ru
Lists: pgsql-hackers
Hello.

Currently, PostgreSQL offers many metrics for monitoring, but detailed
monitoring of waits is still not supported. Such monitoring would let a DBA
know how long a backend waited for a particular event and thereby identify
bottlenecks. This functionality is very useful, especially for high-load
databases. Metrics for waits monitoring are provided by many popular
commercial DBMSs, and we currently have requests for this feature from
companies migrating to PostgreSQL from commercial DBMSs. Thus, I think it
would be nice for PostgreSQL to have it too.

The main problem with monitoring waits is that waits can be very short, so
it is hard to implement monitoring with very low overhead. For instance,
there have been a couple of attempts to implement LWLock monitoring for
PostgreSQL:
http://www.postgresql.org/message-id/flat/CAG95seUg-qxqzYmwtk6wGg8HFezUp3d6c+AZ4m_QZD+y+bF3zA@mail.gmail.com
http://www.postgresql.org/message-id/flat/4FE8CA2C.3030809@uptime.jp

The attached patch implements waits monitoring for PostgreSQL. The
following kinds of monitoring were implemented:

1) History of waits (by sampling)
2) Waits profiling
3) Trace backend waits to file

Monitored waits:

1) HW Locks (lock.c)
2) LW Locks (lwlock.c)
3) IO (md.c)
4) Network (be-secure.c)
5) Latch (pg_latch.c)

Two new GUC variables were added:

* waits_monitoring = on/off - turns all monitoring on or off
* waits_flush_period = N - the period in milliseconds after which a
backend's local waits are flushed to shared memory

## Profiling

Implementation: before each wait, the process calls the function
`StartWait`. This function saves the wait's data and remembers the current
time. After the wait, the process calls `StopWait`, which increments the
count, calculates the duration of the wait and saves this information in
the backend's local memory. Every `waits_flush_period` milliseconds, the
collected data are flushed from local memory to shared memory.

On Unix systems, the `gettimeofday` function is used, which gives enough
accuracy for measuring wait times in microseconds.

The `pg_stat_wait_get_profile` view shows the collected information from
shared memory. For now, all functions are implemented in the
`pg_stat_wait` extension; in the future it may make sense to move them into
the core codebase.

I used atomic operations to avoid locks in the backends. By using a TAS
operation, a backend can skip writing to shared memory while it is being
read by some query, so the user always gets consistent data.
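A minimal sketch of the TAS idea, using C11 atomics here purely for
illustration (the slot layout and function names are hypothetical, not the
patch's code): the backend tries to grab the slot's flag before flushing,
and if a reader already holds it, the backend simply skips this flush and
keeps its data local until the next period:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared profile slot protected by an atomic flag. */
typedef struct
{
    atomic_flag busy;          /* set while someone holds the slot */
    long long   wait_time_us;
    int         wait_count;
} SharedWaitSlot;

/* Backend flush: if the test-and-set fails, a reader currently holds
 * the slot, so skip this round rather than block; the local totals are
 * simply carried over to the next flush period. */
static bool
try_flush(SharedWaitSlot *slot, long long us, int count)
{
    if (atomic_flag_test_and_set(&slot->busy))
        return false;           /* reader active: skip, nobody blocks */
    slot->wait_time_us += us;   /* exclusive access: safe to update */
    slot->wait_count += count;
    atomic_flag_clear(&slot->busy);
    return true;
}
```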

I also did a little refactoring in lwlock.c to support grouping of LWLocks.
An LWLock now contains a `group` field that is used for LWLock
identification. The array `main_lwlock_groups` contains the offset of each
group in the main tranche.

## Waits history

Implementation: a collector background worker cycles through the list of
processes (every `history_period` ms) and stores the current timestamp and
wait information in the history.

View `pg_stat_wait_history` can be used for reading the history.

History sampling uses double buffering. Each backend has two blocks for its
current waits; while the collector reads from one, the backend writes to
the other. Like any sampling, it can miss some waits, but with this method
the backends and the collector require no locks and don't block each other.
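The double-buffering scheme could be sketched like this (hypothetical names
and layout, not the patch's code): the backend always writes into the
currently active block, and the collector atomically flips the active index
so it can read the block the backend is no longer writing to:

```c
#include <assert.h>
#include <stdatomic.h>

#define MAX_WAITS 16

/* Hypothetical double buffer shared between one backend (writer)
 * and the collector (reader). */
typedef struct
{
    int        events[2][MAX_WAITS]; /* two blocks of current-wait data */
    atomic_int write_idx;            /* block the backend writes to */
} WaitSampleBuffers;

/* Backend side: record a wait in the currently active block. */
static void
backend_record(WaitSampleBuffers *b, int slot, int event_id)
{
    b->events[atomic_load(&b->write_idx)][slot] = event_id;
}

/* Collector side: flip the active block, then read the block the
 * backend is no longer writing to. Neither side takes a lock; waits
 * recorded during the flip may be missed, which is acceptable for
 * sampling. */
static int *
collector_snapshot(WaitSampleBuffers *b)
{
    int old = atomic_fetch_xor(&b->write_idx, 1); /* 0 <-> 1 */

    return b->events[old];
}
```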

History GUC parameters:

* shared_preload_libraries = 'pg_stat_wait.so' - loads the background
worker that samples the waits
* pg_stat_wait.history = on/off - turns history recording on or off
* pg_stat_wait.history_size = how many records to keep in the history
* pg_stat_wait.history_period - period in milliseconds between samples

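For illustration, a postgresql.conf fragment combining the GUCs described
above might look like this (the values shown are arbitrary examples, not
recommendations):

```ini
# load the pg_stat_wait background worker (collector)
shared_preload_libraries = 'pg_stat_wait.so'

# core patch: global switch and local-to-shared flush period
waits_monitoring = on
waits_flush_period = 1000          # ms

# extension: history sampling settings
pg_stat_wait.history = on
pg_stat_wait.history_size = 5000   # records kept
pg_stat_wait.history_period = 10   # ms between samples
```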
## Views

* pg_stat_wait_history - waits history
* pg_stat_wait_profile - profile
* pg_stat_wait_current - current waits
* pg_wait_events - full list of class and event of waits

## Functions

* pg_wait_class_list() - returns wait classes
* pg_wait_event_list() - returns wait events
* pg_stat_wait_get_history() - returns history
* pg_stat_wait_get_profile(backend_pid int4, reset bool) - returns the
current profile; the `reset` parameter can be used to reset the data
* pg_stat_wait_get_current(backend_pid int4) - current waits
* pg_stat_wait_reset_profile() - resets profile

If `backend_pid` is NULL, these functions return information about all
processes.

### Trace functions

* pg_start_trace(backend_pid int4, filename cstring) - starts tracing waits
to a file
* pg_is_in_trace(backend_pid int4) - returns true if tracing is on for the
process
* pg_stop_trace(backend_pid int4) - stops tracing waits to the file

## Testing

I tested the patch mostly on Linux and FreeBSD (Intel processors).
PostgreSQL supports a very wide variety of platforms and operating systems,
and I hope the community will help me with testing on other platforms.

Configuration: Intel(R) Xeon(R) CPU X5675 @ 3.07GHz, 24 cores; pgbench
initialized with -S 200, fsync off, database on tmpfs. I got stable results
only with select queries. With update queries, the variation between runs
is too wide to measure the overhead of monitoring. Here are sample results:

Monitoring off:
[ildus@1-i-kurbangaliev postgrespro]$ pgbench -S b1 -c 96 -j 4 -T 300
starting vacuum...end.
transaction type: SELECT only
scaling factor: 500
query mode: simple
number of clients: 96
number of threads: 4
duration: 300 s
number of transactions actually processed: 39316058
latency average: 0.733 ms
tps = 131052.608425 (including connections establishing)
tps = 131076.202085 (excluding connections establishing)
[ildus@1-i-kurbangaliev postgrespro]$
Monitoring on:
[ildus@1-i-kurbangaliev postgrespro]$ pgbench -S b1 -c 96 -j 4 -T 300
starting vacuum...end.
transaction type: SELECT only
scaling factor: 500
query mode: simple
number of clients: 96
number of threads: 4
duration: 300 s
number of transactions actually processed: 39172607
latency average: 0.735 ms
tps = 130574.626755 (including connections establishing)
tps = 130600.767440 (excluding connections establishing)

According to these tests, the overhead is less than 1%.

Monitoring results look like this:

1) select * from pg_stat_wait_profile;
  pid  | class_id | class_name | event_id |        event_name        | wait_time | wait_count
-------+----------+------------+----------+--------------------------+-----------+------------
 26393 |        1 | LWLocks    |        0 | BufFreelistLock          |        73 |          3
 26396 |        1 | LWLocks    |        0 | BufFreelistLock          |       259 |          1
 26395 |        1 | LWLocks    |        0 | BufFreelistLock          |       150 |          2
 26394 |        1 | LWLocks    |        0 | BufFreelistLock          |        12 |          1
 26394 |        1 | LWLocks    |        7 | WALBufMappingLock        |       391 |          2
 26395 |        1 | LWLocks    |       38 | BufferPartitionLock      |        17 |          1
 26393 |        1 | LWLocks    |       39 | LockManagerPartitionLock |        30 |          3
 26395 |        1 | LWLocks    |       39 | LockManagerPartitionLock |        14 |          1
 26392 |        1 | LWLocks    |       41 | SharedBufferLocks        |      2687 |          9
 26393 |        1 | LWLocks    |       41 | SharedBufferLocks        |      1633 |         10
 26396 |        1 | LWLocks    |       41 | SharedBufferLocks        |      3154 |         16
 26394 |        1 | LWLocks    |       41 | SharedBufferLocks        |      1597 |         11
 26395 |        1 | LWLocks    |       41 | SharedBufferLocks        |      2936 |         18
 26394 |        2 | Locks      |        4 | Transaction              |       457 |          1
 26393 |        2 | Locks      |        4 | Transaction              |       554 |          1
 26395 |        3 | Storage    |        0 | READ                     |      7292 |         60
 26391 |        3 | Storage    |        0 | READ                     |     36604 |       1526
 26396 |        3 | Storage    |        0 | READ                     |      6512 |         67
 26392 |        3 | Storage    |        0 | READ                     |      7411 |         62
 26344 |        3 | Storage    |        0 | READ                     |       560 |        100
 26393 |        3 | Storage    |        0 | READ                     |      8227 |         79
 26394 |        3 | Storage    |        0 | READ                     |      8385 |         73
 26342 |        3 | Storage    |        0 | READ                     |         9 |          1
 26340 |        4 | Latch      |        0 | Latch                    |  51270008 |        256
 26341 |        4 | Latch      |        0 | Latch                    |  49850661 |         57
 26394 |        5 | Network    |        0 | READ                     |      7660 |         71
2) select * from pg_stat_wait_current;
  pid  |           sample_ts           | class_id | class_name | event_id |    event_name     | wait_time |  p1  |  p2   |  p3   | p4 |  p5
-------+-------------------------------+----------+------------+----------+-------------------+-----------+------+-------+-------+----+-------
 26437 | 2015-06-25 11:00:13.561607-04 |        1 | LWLocks    |       11 | CLogControlLock   |   1002935 |    1 |     0 |     0 |  0 |     0
 26438 | 2015-06-25 11:00:13.561607-04 |        1 | LWLocks    |       41 | SharedBufferLocks |   1002970 |    1 |     0 |     0 |  0 |     0
 26440 | 2015-06-25 11:00:13.561607-04 |        3 | Storage    |        0 | READ              |   1001948 | 1663 | 16384 | 49759 |  0 |  9821
 26340 | 2015-06-25 11:00:13.561607-04 |        3 | Storage    |        1 | WRITE             |    995053 | 1663 | 16384 | 49752 |  0 | 53093
 26341 | 2015-06-25 11:00:13.561607-04 |        4 | Latch      |        0 | Latch             |    986350 |    0 |     0 |     0 |  0 |     0
 26339 | 2015-06-25 11:00:13.561607-04 |        4 | Latch      |        0 | Latch             |    869452 |    0 |     0 |     0 |  0 |     0
 26439 | 2015-06-25 11:00:13.561607-04 |        5 | Network    |        1 | WRITE             |   1002672 |    0 |     0 |     0 |  0 |     0
3) select * from pg_stat_wait_history;
  pid  |           sample_ts           | class_id | class_name | event_id | event_name | wait_time |  p1  |  p2   |  p3   | p4 | p5
-------+-------------------------------+----------+------------+----------+------------+-----------+------+-------+-------+----+----
 27564 | 2015-06-25 11:35:29.331803-04 |        3 | Storage    |        0 | READ       |     19855 | 1664 |     0 | 12171 |  0 |  0
 27568 | 2015-06-25 11:36:29.351151-04 |        3 | Storage    |        0 | READ       |      2216 | 1664 |     0 | 12173 |  0 |  0
 27567 | 2015-06-25 11:35:32.493138-04 |        3 | Storage    |        0 | READ       |     17818 | 1664 |     0 | 11917 |  0 |  0
 27567 | 2015-06-25 11:35:33.671287-04 |        3 | Storage    |        0 | READ       |      5700 | 1663 | 16384 | 12036 |  0 |  0
 27567 | 2015-06-25 11:35:56.731703-04 |        5 | Network    |        1 | WRITE      |     18339 |    0 |     0 |     0 |  0 |  0

-- 
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment: waits_monitoring.patch
Description: text/x-patch (85.2 KB)
