Waits monitoring

From: Ildus Kurbangaliev <i.kurbangaliev@postgrespro.ru>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Waits monitoring
Date: 2015-07-08 15:52:09
Message-ID: 559D4729.9080704@postgrespro.ru

Hello.

Currently, PostgreSQL offers many metrics for monitoring, but detailed
monitoring of waits is not supported yet. Such monitoring would let a DBA
know how long a backend waited for a particular event and thereby identify
bottlenecks. This functionality is especially useful for high-load databases.
Wait-monitoring metrics are provided by many popular commercial DBMSes, and
we regularly receive requests for this feature from companies migrating to
PostgreSQL from them. Thus, I think it would be nice for PostgreSQL to have
it too.

The main problem with monitoring waits is that waits can be very short, so it
is hard to implement the monitoring with very low overhead. For instance,
there have been a couple of attempts to implement LWLock monitoring for
PostgreSQL:
http://www.postgresql.org/message-id/flat/CAG95seUg-qxqzYmwtk6wGg8HFezUp3d6c+AZ4m_QZD+y+bF3zA@mail.gmail.com
http://www.postgresql.org/message-id/flat/4FE8CA2C.3030809@uptime.jp#4FE8CA2C.3030809@uptime.jp

The attached patch implements waits monitoring for PostgreSQL. The following
kinds of monitoring were implemented:

1) History of waits (by sampling)
2) Waits profiling
3) Trace backend waits to file

Monitored waits:

1) HW Locks (lock.c)
2) LW Locks (lwlock.c)
3) IO (md.c)
4) Network (be-secure.c)
5) Latch (pg_latch.c)

Two new GUC variables were added:

* waits_monitoring = on/off - turns all of the monitoring on or off
* waits_flush_period - defines the period in milliseconds after which
backend-local waits are flushed to shared memory

## Profiling

Implementation: before each wait the process calls the function `StartWait`.
This function saves the wait's data and remembers the current time. After the
wait the process calls `StopWait`, which increments the wait count, calculates
the duration of the wait, and saves this info in the backend's local memory.
Every `waits_flush_period` milliseconds the data collected in local memory is
flushed to shared memory.

On Unix systems the `gettimeofday` function is used, which gives enough
accuracy for measuring wait times in microseconds.
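
A minimal sketch of this scheme, assuming a simple profile-entry layout (only
the `StartWait`/`StopWait` names and the use of `gettimeofday` come from the
description above; everything else is illustrative, not the patch's code):

```
#include <stdint.h>
#include <sys/time.h>

/* Backend-local profile entry; this layout is an assumption. */
typedef struct WaitProfileEntry
{
    uint64_t wait_count;    /* number of completed waits */
    uint64_t wait_time;     /* accumulated wait time, in microseconds */
} WaitProfileEntry;

static struct timeval wait_start;

/* Called right before blocking: remember when the wait began. */
static void
StartWait(void)
{
    gettimeofday(&wait_start, NULL);
}

/* Called right after the wait: accumulate count and elapsed microseconds. */
static void
StopWait(WaitProfileEntry *entry)
{
    struct timeval now;
    int64_t     elapsed;

    gettimeofday(&now, NULL);
    elapsed = (int64_t) (now.tv_sec - wait_start.tv_sec) * 1000000
        + (now.tv_usec - wait_start.tv_usec);

    entry->wait_count++;
    entry->wait_time += (uint64_t) elapsed;
}
```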

The `pg_stat_wait_get_profile` view shows the information collected in shared
memory. At the moment all functions are implemented in the `pg_stat_wait`
extension; in the future it may make sense to move them into the core
codebase.

I used atomic operations to avoid locks in the backends. By using a TAS
operation a backend can skip writing to shared memory while it is being read
by some query, so the user always gets consistent data.
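
A hedged sketch of that idea using PostgreSQL's `pg_atomic_flag` API; the
slot layout and the `FlushLocalProfile` name are illustrative assumptions,
not the patch's actual code:

```
#include "postgres.h"
#include "port/atomics.h"

/*
 * A shared-memory slot guarded by an atomic flag (initialized at startup
 * with pg_atomic_init_flag). A reader takes the flag while copying the
 * slot out; this layout is an assumption.
 */
typedef struct SharedWaitSlot
{
    pg_atomic_flag busy;
    uint64      wait_count;
    uint64      wait_time;
} SharedWaitSlot;

/*
 * Backend side: TAS the flag; if a reader currently holds it, skip this
 * flush and keep the local counters for the next waits_flush_period.
 */
static void
FlushLocalProfile(SharedWaitSlot *slot, uint64 count, uint64 time_us)
{
    if (!pg_atomic_test_set_flag(&slot->busy))
        return;                 /* slot busy: try again next period */

    slot->wait_count += count;
    slot->wait_time += time_us;

    pg_atomic_clear_flag(&slot->busy);
}
```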

I also did a little refactoring in lwlock.c to support grouping of LWLocks.
Now LWLock contains a `group` field that is used for lwlock identification.
The array `main_lwlock_groups` contains the offset of each group in the main
tranche.
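
As a rough illustration of the grouping idea (a sketch under assumptions: the
real LWLock struct has other fields, and the actual offsets and lookup logic
in the patch may differ):

```
#include <stdint.h>

typedef struct LWLockSketch
{
    /* ... the existing LWLock fields (state, wait queue, ...) ... */
    uint8_t group;      /* identifies the lock's group for wait reporting */
} LWLockSketch;

/*
 * One entry per group: the offset of the group's first lock in the main
 * tranche, so a wait on a main-tranche lock can be mapped back to a group
 * name. The single placeholder value below is not from the patch.
 */
static const int main_lwlock_groups_sketch[] = {0};
```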

## Waits history

Implementation: a collector background worker cycles through the list of
processes (every `history_period` ms) and stores the current timestamp and
wait info in the history.

View `pg_stat_wait_history` can be used for reading the history.

Sampling of the history uses double buffering. Each backend has two blocks
for its current waits, and while the collector reads from one, the backend
writes to the other. Like any sampling it can miss some waits, but with this
method the backends and the collector need no locks and do not block each
other.
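
A minimal sketch of the double-buffering idea; all names and the layout here
are illustrative assumptions, not the patch's code:

```
#include "postgres.h"
#include "port/atomics.h"

/* Hypothetical per-backend pair of buffers; `active` is the one the
 * backend currently writes to. */
typedef struct CurrentWaitInfo
{
    int         class_id;
    int         event_id;
    uint64      start_time_us;
} CurrentWaitInfo;

typedef struct BackendWaitSlots
{
    pg_atomic_uint32 active;    /* index of the write buffer: 0 or 1 */
    CurrentWaitInfo buf[2];
} BackendWaitSlots;

/* Backend side: always write into the currently active buffer. */
static void
PublishCurrentWait(BackendWaitSlots *slots, const CurrentWaitInfo *info)
{
    slots->buf[pg_atomic_read_u32(&slots->active)] = *info;
}

/*
 * Collector side: flip the active buffer, then read the one the backend
 * no longer writes to. A write in flight during the flip may be lost,
 * which is the "sampling can miss some waits" caveat above.
 */
static CurrentWaitInfo
SampleBackendWait(BackendWaitSlots *slots)
{
    uint32      old = pg_atomic_read_u32(&slots->active);

    pg_atomic_write_u32(&slots->active, old ^ 1);
    return slots->buf[old];
}
```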

History GUC parameters:

* shared_preload_libraries = 'pg_stat_wait.so' - for the background worker
that will sample the waits
* pg_stat_wait.history = on/off - turns history recording on or off
* pg_stat_wait.history_size - how many records to keep in the history
* pg_stat_wait.history_period - period in milliseconds between samples

## Views

* pg_stat_wait_history - waits history
* pg_stat_wait_profile - profile
* pg_stat_wait_current - current waits
* pg_wait_events - full list of wait classes and events

## Functions

* pg_wait_class_list() - returns wait classes
* pg_wait_event_list() - returns wait events
* pg_stat_wait_get_history() - returns history
* pg_stat_wait_get_profile(backend_pid int4, reset bool) - returns the
current profile; the `reset` parameter can be used to reset the data
* pg_stat_wait_get_current(backend_pid int4) - current waits
* pg_stat_wait_reset_profile() - resets profile

If `backend_pid` is NULL, these functions return information about all
processes.

### Trace functions

* pg_start_trace(backend_pid int4, filename cstring) - starts tracing wait
timings to a file
* pg_is_in_trace(backend_pid int4) - returns true if tracing is on in the
process
* pg_stop_trace(backend_pid int4) - stops tracing waits to the file

## Testing

I tested the patch mostly on Linux and FreeBSD (Intel processors).
PostgreSQL supports a very wide variety of platforms and operating systems,
and I hope the community will help me with testing on other platforms.

Configuration: Intel(R) Xeon(R) CPU X5675 @ 3.07GHz, 24 cores; pgbench
initialized with -S 200, fsync off, database on tmpfs. I got stable results
only with select queries; with update queries the variation between runs is
too wide to measure the overhead of monitoring. Here are sample results:

Monitoring off:
[ildus@1-i-kurbangaliev postgrespro]$ pgbench -S b1 -c 96 -j 4 -T 300
starting vacuum...end.
transaction type: SELECT only
scaling factor: 500
query mode: simple
number of clients: 96
number of threads: 4
duration: 300 s
number of transactions actually processed: 39316058
latency average: 0.733 ms
tps = 131052.608425 (including connections establishing)
tps = 131076.202085 (excluding connections establishing)
[ildus@1-i-kurbangaliev postgrespro]$
Monitoring on:
[ildus@1-i-kurbangaliev postgrespro]$ pgbench -S b1 -c 96 -j 4 -T 300
starting vacuum...end.
transaction type: SELECT only
scaling factor: 500
query mode: simple
number of clients: 96
number of threads: 4
duration: 300 s
number of transactions actually processed: 39172607
latency average: 0.735 ms
tps = 130574.626755 (including connections establishing)
tps = 130600.767440 (excluding connections establishing)

According to these tests the overhead is less than 1% (about 0.4% in tps for
this run).

Monitoring results look like this:

1) select * from pg_stat_wait_profile;
  pid  | class_id | class_name | event_id |        event_name        | wait_time | wait_count
-------+----------+------------+----------+--------------------------+-----------+------------
 26393 |        1 | LWLocks    |        0 | BufFreelistLock          |        73 |          3
 26396 |        1 | LWLocks    |        0 | BufFreelistLock          |       259 |          1
 26395 |        1 | LWLocks    |        0 | BufFreelistLock          |       150 |          2
 26394 |        1 | LWLocks    |        0 | BufFreelistLock          |        12 |          1
 26394 |        1 | LWLocks    |        7 | WALBufMappingLock        |       391 |          2
 26395 |        1 | LWLocks    |       38 | BufferPartitionLock      |        17 |          1
 26393 |        1 | LWLocks    |       39 | LockManagerPartitionLock |        30 |          3
 26395 |        1 | LWLocks    |       39 | LockManagerPartitionLock |        14 |          1
 26392 |        1 | LWLocks    |       41 | SharedBufferLocks        |      2687 |          9
 26393 |        1 | LWLocks    |       41 | SharedBufferLocks        |      1633 |         10
 26396 |        1 | LWLocks    |       41 | SharedBufferLocks        |      3154 |         16
 26394 |        1 | LWLocks    |       41 | SharedBufferLocks        |      1597 |         11
 26395 |        1 | LWLocks    |       41 | SharedBufferLocks        |      2936 |         18
 26394 |        2 | Locks      |        4 | Transaction              |       457 |          1
 26393 |        2 | Locks      |        4 | Transaction              |       554 |          1
 26395 |        3 | Storage    |        0 | READ                     |      7292 |         60
 26391 |        3 | Storage    |        0 | READ                     |     36604 |       1526
 26396 |        3 | Storage    |        0 | READ                     |      6512 |         67
 26392 |        3 | Storage    |        0 | READ                     |      7411 |         62
 26344 |        3 | Storage    |        0 | READ                     |       560 |        100
 26393 |        3 | Storage    |        0 | READ                     |      8227 |         79
 26394 |        3 | Storage    |        0 | READ                     |      8385 |         73
 26342 |        3 | Storage    |        0 | READ                     |         9 |          1
 26340 |        4 | Latch      |        0 | Latch                    |  51270008 |        256
 26341 |        4 | Latch      |        0 | Latch                    |  49850661 |         57
 26394 |        5 | Network    |        0 | READ                     |      7660 |         71
2) select * from pg_stat_wait_current;
  pid  |           sample_ts           | class_id | class_name | event_id |    event_name     | wait_time |  p1  |  p2   |  p3   | p4 |  p5
-------+-------------------------------+----------+------------+----------+-------------------+-----------+------+-------+-------+----+-------
 26437 | 2015-06-25 11:00:13.561607-04 |        1 | LWLocks    |       11 | CLogControlLock   |   1002935 |    1 |     0 |     0 |  0 |     0
 26438 | 2015-06-25 11:00:13.561607-04 |        1 | LWLocks    |       41 | SharedBufferLocks |   1002970 |    1 |     0 |     0 |  0 |     0
 26440 | 2015-06-25 11:00:13.561607-04 |        3 | Storage    |        0 | READ              |   1001948 | 1663 | 16384 | 49759 |  0 |  9821
 26340 | 2015-06-25 11:00:13.561607-04 |        3 | Storage    |        1 | WRITE             |    995053 | 1663 | 16384 | 49752 |  0 | 53093
 26341 | 2015-06-25 11:00:13.561607-04 |        4 | Latch      |        0 | Latch             |    986350 |    0 |     0 |     0 |  0 |     0
 26339 | 2015-06-25 11:00:13.561607-04 |        4 | Latch      |        0 | Latch             |    869452 |    0 |     0 |     0 |  0 |     0
 26439 | 2015-06-25 11:00:13.561607-04 |        5 | Network    |        1 | WRITE             |   1002672 |    0 |     0 |     0 |  0 |     0
3) select * from pg_stat_wait_history;
  pid  |           sample_ts           | class_id | class_name | event_id | event_name | wait_time |  p1  |  p2   |  p3   | p4 | p5
-------+-------------------------------+----------+------------+----------+------------+-----------+------+-------+-------+----+----
 27564 | 2015-06-25 11:35:29.331803-04 |        3 | Storage    |        0 | READ       |     19855 | 1664 |     0 | 12171 |  0 |  0
 27568 | 2015-06-25 11:36:29.351151-04 |        3 | Storage    |        0 | READ       |      2216 | 1664 |     0 | 12173 |  0 |  0
 27567 | 2015-06-25 11:35:32.493138-04 |        3 | Storage    |        0 | READ       |     17818 | 1664 |     0 | 11917 |  0 |  0
 27567 | 2015-06-25 11:35:33.671287-04 |        3 | Storage    |        0 | READ       |      5700 | 1663 | 16384 | 12036 |  0 |  0
 27567 | 2015-06-25 11:35:56.731703-04 |        5 | Network    |        1 | WRITE      |     18339 |    0 |     0 |     0 |  0 |  0

--
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment: waits_monitoring.patch (text/x-patch, 85.2 KB)
