From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: autovacuum can't keep up, bloat just continues to rise
Date: 2017-07-19 22:11:25
Message-ID: 0265f9e2-3e32-e67d-f106-8abde596c0e4@commandprompt.com
Lists: pgsql-hackers

Hello,

At PGConf US Philly last week I was talking with Jim and Jan about
performance. One of the items that came up is that PostgreSQL can't run
full throttle for long periods of time. The long and short of it is
that, no matter what, autovacuum can't keep up. Here is what I did:

Machine:

16vCPU
59G Memory
10G SSD (/)
500G SSD /srv/main/9.6 (PGDATA): 240MB/s sustained with 15k IOPS
* Yes, we really got 240MB/s sustained write throughput
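
For anyone wanting to sanity-check a volume for numbers like that, a
sequential write run with fio along these lines will do it (the job
parameters below are illustrative, not the exact ones we used):

# sequential 1M writes, direct I/O, held for 60 seconds
fio --name=seqwrite --filename=/srv/main/9.6/fio.test \
    --rw=write --bs=1M --size=4G --direct=1 \
    --ioengine=libaio --iodepth=32 --runtime=60 --time_based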

I used BenchmarkSQL, which is a TPC-C benchmark similar to pgbench but
supposedly more thorough.

https://sourceforge.net/projects/benchmarksql/

PostgreSQL 9.6 on Ubuntu 16.04 x64.

postgresql.conf:

max_connections: 1000 (just to keep it out of the way)
shared_buffers: 32G (Awesome work Haas)
work_mem: 32M
maintenance_work_mem: 2G
effective_io_concurrency: 1

* Before anybody suggests increasing this: across more than a dozen
tests on GCE, anything other than disabling it appeared to cost ~10% in
performance (I can reproduce those tests on another thread if you like).

synchronous_commit: off
checkpoint_timeout: 60min
max_wal_size: 5G
random_page_cost: 1
effective_cache_size: 32GB
* This should probably be more like 50GB, but still
autovacuum_max_workers: 12
* One for each TPC-C table, plus a couple for the system tables
autovacuum_vacuum_scale_factor: 0.1
autovacuum_cost_delay: 5ms
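
For anyone replaying this, the autovacuum settings actually in effect
can be confirmed straight from pg_settings (a generic query, nothing
specific to this run):

-- show every autovacuum-related GUC and its current value
SELECT name, setting, unit
FROM pg_settings
WHERE name LIKE 'autovacuum%'
ORDER BY name;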

Here are the BenchmarkSQL settings, identical for all 4 runs:

17:07:54,268 [main] INFO jTPCC : Term-00, warehouses=500
17:07:54,269 [main] INFO jTPCC : Term-00, terminals=128
17:07:54,272 [main] INFO jTPCC : Term-00, runTxnsPerTerminal=100000
17:07:54,273 [main] INFO jTPCC : Term-00, limitTxnsPerMin=300000
17:07:54,273 [main] INFO jTPCC : Term-00, terminalWarehouseFixed=false
17:07:54,274 [main] INFO jTPCC : Term-00,
17:07:54,274 [main] INFO jTPCC : Term-00, newOrderWeight=45
17:07:54,274 [main] INFO jTPCC : Term-00, paymentWeight=43
17:07:54,274 [main] INFO jTPCC : Term-00, orderStatusWeight=4
17:07:54,275 [main] INFO jTPCC : Term-00, deliveryWeight=4
17:07:54,275 [main] INFO jTPCC : Term-00, stockLevelWeight=4
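
For reference, the corresponding my_postgres.properties entries look
roughly like this; the db/driver/connection lines below are
placeholders, not our actual values:

db=postgres
driver=org.postgresql.Driver
conn=jdbc:postgresql://localhost:5432/benchmarksql
user=benchmarksql
password=changeme
warehouses=500
terminals=128
runTxnsPerTerminal=100000
limitTxnsPerMin=300000
terminalWarehouseFixed=false
newOrderWeight=45
paymentWeight=43
orderStatusWeight=4
deliveryWeight=4
stockLevelWeight=4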

For run 0, I started with a full vacuum and analyze (-fz):

vacuumdb -U postgres -fz; ./runBenchmark.sh my_postgres.properties

Then for each subsequent run I just reran the benchmark without the
VACUUM FULL, so that PostgreSQL could prove us wrong. It didn't. Here
is the breakdown of the results:

RUN  START DISK SIZE (GB)  END DISK SIZE (GB)  TPS/Terminal
0    54                    78                  868.6796875
1    78                    91                  852.4765625
2    91                    103                 741.4609375
3    103                   116                 686.125
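
If anyone wants to reproduce the measurement from inside the database,
something along these lines (not necessarily the exact queries I ran)
gives the equivalent picture:

-- total on-disk size of the current database
SELECT pg_size_pretty(pg_database_size(current_database())) AS db_size;

-- ten largest tables, including indexes and TOAST
SELECT relname,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 10;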

The good news is that PostgreSQL doesn't do half badly serving 128
connections with only 16 vCPUs. The bad news is that we more than
doubled our disk footprint without ever getting space reuse or bloat
under control. The concern is that under a persistent heavy write load
we will eventually bloat out and be forced into a VACUUM FULL, no
matter what. I know that Jan has done some testing of his own, and the
best he could get was something like 8 days before PostgreSQL became
unusable (but don't quote me on that).
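
A quick way to watch autovacuum fall behind during a run is the dead
tuple counters in pg_stat_user_tables; a minimal sketch, not the exact
monitoring we used:

-- tables with the most accumulated dead tuples
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;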

I am open to comments, suggestions, running multiple tests with
different parameters or just leaving this in the archive for people to
reference.

Thanks in advance,

JD

--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.us
***** Unless otherwise stated, opinions are my own. *****
