Hi all, Recently I noticed an interesting thread about table scheduling in
autovacuum in [0], which contained some constructive discussions. After testing
nathan's patch, I attempted to make improvements on some issues mentioned in
the discussion, mainly related to cost delay. Cost delay was originally
introduced to prevent autovacuum workers from consuming excessive resources and
impacting normal workloads. However, this value tends to be too low when system
load is intense and too high when system is idle. The initial idea is
straightforward: we can automatically adjust cost delay based on system
load. Currently, I am primarily considering IO resources (Maybe CPU and other
resources in the future) and have implemented the following mechanisms:
1.IO Load Monitoring: Tracks IO wait events across all backend processes with
weighted contributions (IO-sensitive backends contribute more weight than others).
2.Dynamic Mode Selection: Introduce three modes based on IO wait thresholds:
FAST mode: When IO load is low (few of processes in IO wait)
SLOW mode: When IO load is high (lots of processes in IO wait)
NORMAL mode: Otherwise
3.Adaptive Delay Adjustment: Modifies vacuum delay at vacuum_delay_point(), Normal
Mode use guc configured delay, skip delays in FAST mode and add extra delays in
SLOW mode based on detected IO contention.
I tried using Sami's scripts in[0] to perform some tests. I added a debug GUC
parameter 'debug_autovacuum_adaptive_cost_delay'for testing, with the results
displayed in result-analysis.txt. My PostgreSQL configuration is as follows:
、、、
max_connections=1000
autovacuum_naptime = '10s'
shared_buffers = '4GB'
autovacuum_max_workers = 6
autovacuum_vacuum_cost_delay =2ms
autovacuum_vacuum_cost_limit =200
、、、
##################################################
## ADAPTIVE OFF
##################################################
relname | reltuples | n_dead_tup | avg_av_time | avg_aa_time
------------------+-----------+------------+-------------+-------------
table_batch_2 | 38100700 | 0 | 63426.75 | 2788.63
table_batch_3 | 34528300 | 0 | 43569.38 | 2752.13
table_batch_1 | 21774900 | 0 | 33058.09 | 2675.13
table_13_4096000 | 4094220 | 1663211 | 0.00 | 12047.00
table_12_2048000 | 2045990 | 1202655 | 688704.00 | 10698.33
table_11_1024000 | 1023520 | 446236 | 337606.00 | 17622.00
table_10_512000 | 510631 | 523053 | 138559.00 | 13200.89
table_9_256000 | 253966 | 338711 | 54128.55 | 9159.77
table_8_128000 | 127072 | 304428 | 17124.50 | 3508.30
table_7_64000 | 63934 | 384090 | 4450.55 | 834.38
table_6_32000 | 31999 | 397000 | 1690.46 | 453.33
table_5_16000 | 16000 | 403389 | 1030.89 | 282.55
table_4_8000 | 8000 | 408596 | 846.87 | 221.53
table_3_4000 | 4000 | 413417 | 881.73 | 224.77
table_16_4000 | 3998 | 424019 | 956.04 | 199.35
table_2_2000 | 1996 | 433390 | 829.55 | 207.11
table_15_2000 | 1988 | 429912 | 868.61 | 216.27
table_14_1000 | 991 | 443569 | 856.38 | 149.44
table_1_1000 | 986 | 438926 | 795.70 | 161.82
############################################
## ADAPTIVE ON
############################################
relname | reltuples | n_dead_tup | avg_av_time | avg_aa_time
------------------+-----------+------------+-------------+-------------
table_batch_3 | 57988400 | 0 | 711.95 | 212.73
table_batch_2 | 51700100 | 0 | 549.80 | 203.18
table_batch_1 | 49688200 | 0 | 574.58 | 206.67
table_13_4096000 | 4096250 | 22439 | 9937.00 | 400.60
table_12_2048000 | 2048270 | 2506 | 4867.50 | 375.00
table_11_1024000 | 1023880 | 211596 | 2351.57 | 389.25
table_10_512000 | 512056 | 23575 | 1024.93 | 352.03
table_9_256000 | 256011 | 42427 | 429.28 | 309.75
table_8_128000 | 127992 | 11739 | 167.56 | 263.62
table_7_64000 | 63998 | 15171 | 69.52 | 221.04
table_6_32000 | 31998 | 5787 | 32.94 | 192.52
table_5_16000 | 16000 | 6769 | 18.11 | 92.08
table_4_8000 | 8000 | 6811 | 10.40 | 41.37
table_3_4000 | 4000 | 7058 | 6.40 | 18.58
table_16_4000 | 4000 | 6786 | 6.32 | 18.55
table_15_2000 | 2000 | 6783 | 4.45 | 8.93
table_2_2000 | 2000 | 6936 | 4.58 | 8.89
table_1_1000 | 1000 | 6798 | 3.66 | 5.50
table_14_1000 | 1000 | 6700 | 3.66 | 5.50
Result shows that av_time decreased significantly, and there is no noticeable
degradation in system performance.
So far, this is just an exploratory patch with some areas that can be improved.
Any constructive feedback and suggestions are welcome.
Thank you all
--
Duan
Alibaba Cloud Computing
[0] https://www.postgresql.org/message-id/flat/aOaAuXREwnPZVISO%40nathan <https://www.postgresql.org/message-id/flat/aOaAuXREwnPZVISO%40nathan >