Re: autovacuum can't keep up, bloat just continues to rise

From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Sokolov Yura <y(dot)sokolov(at)postgrespro(dot)ru>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Geoghegan <pg(at)bowt(dot)ie>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, pgsql-hackers-owner(at)postgresql(dot)org
Subject: Re: autovacuum can't keep up, bloat just continues to rise
Date: 2017-07-23 19:03:09
Message-ID: 0ec4c002-a219-942d-a686-0804c826c3e0@commandprompt.com
Lists: pgsql-hackers

Hello,

I changed the test to run for 6 hours at a time regardless of the number of
transactions. I also changed the du command to measure only the database
directory (previously WAL was included). This is the clearest indication of
the problem I have been able to produce.
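
For reference, a rough SQL equivalent of that measurement (a sketch, assuming
all we care about is the per-database footprint; pg_database_size() does not
include WAL):

-- Size of the current database's files only, excluding WAL
SELECT pg_size_pretty(pg_database_size(current_database()));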

Again, this is with 128 clients and 500 warehouses. The first test is a clean
run, with everything dropped, vacuumed, etc. Each subsequent test just starts
the benchmark again, so the numbers below act as measurement points along the way.

                name                 |  setting
-------------------------------------+-----------
 autovacuum                          | on
 autovacuum_analyze_scale_factor     | 0.1
 autovacuum_analyze_threshold        | 50
 autovacuum_freeze_max_age           | 200000000
 autovacuum_max_workers              | 12
 autovacuum_multixact_freeze_max_age | 400000000
 autovacuum_naptime                  | 10
 autovacuum_vacuum_cost_delay        | 0
 autovacuum_vacuum_cost_limit        | 5000
 autovacuum_vacuum_scale_factor      | 0.1
 autovacuum_vacuum_threshold         | 50
 autovacuum_work_mem                 | -1
 log_autovacuum_min_duration         | -1
 max_wal_size                        | 640
 checkpoint_timeout                  | 86400
 checkpoint_completion_target        | 0.5
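
A snapshot like the above can be pulled straight from pg_settings (a sketch;
the WHERE list is just the parameters shown above, and values come back as
stored rather than normalized to common units):

SELECT name, setting, unit
  FROM pg_settings
 WHERE name LIKE 'autovacuum%'
    OR name IN ('max_wal_size', 'checkpoint_timeout',
                'checkpoint_completion_target', 'log_autovacuum_min_duration');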

Starting size of the base directory:
50G /srv/main/base

Test 1:
90G /srv/main/base
TPS: 838

Test 2:
121G /srv/main/base
TPS: 725

Test 3:
146G /srv/main/base
TPS: 642

Test 4:
171G /srv/main/base
TPS: 549

Test 5:
189G /srv/main/base
TPS: 489

Test 6:
208G /srv/main/base
TPS: 454

As you can see, even with aggressive vacuuming, life gets increasingly
miserable over a period of 36 hours: the on-disk size grows from 50G to 208G
while TPS drops from 838 to 454.
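
If it helps anyone dig into this, a simple way to watch whether autovacuum is
keeping up per table is something like the following (a sketch; n_live_tup and
n_dead_tup are estimates maintained by the statistics collector):

SELECT relname, n_live_tup, n_dead_tup, last_autovacuum, autovacuum_count
  FROM pg_stat_user_tables
 ORDER BY n_dead_tup DESC
 LIMIT 10;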

The largest table is:

postgres=# select pg_size_pretty(pg_total_relation_size('bmsql_order_line'));
 pg_size_pretty
----------------
 148 GB
(1 row)

postgres=# \d bmsql_order_line
              Table "public.bmsql_order_line"
     Column     |            Type             | Modifiers
----------------+-----------------------------+-----------
 ol_w_id        | integer                     | not null
 ol_d_id        | integer                     | not null
 ol_o_id        | integer                     | not null
 ol_number      | integer                     | not null
 ol_i_id        | integer                     | not null
 ol_delivery_d  | timestamp without time zone |
 ol_amount      | numeric(6,2)                |
 ol_supply_w_id | integer                     |
 ol_quantity    | integer                     |
 ol_dist_info   | character(24)               |
Indexes:
    "bmsql_order_line_pkey" PRIMARY KEY, btree (ol_w_id, ol_d_id, ol_o_id, ol_number)
Foreign-key constraints:
    "ol_order_fkey" FOREIGN KEY (ol_w_id, ol_d_id, ol_o_id) REFERENCES bmsql_oorder(o_w_id, o_d_id, o_id)
    "ol_stock_fkey" FOREIGN KEY (ol_supply_w_id, ol_i_id) REFERENCES bmsql_stock(s_w_id, s_i_id)

With the PK being

postgres=# select pg_size_pretty(pg_relation_size('bmsql_order_line_pkey'));
pg_size_pretty
----------------
48 GB
(1 row)
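
A rough breakdown of where that 148 GB lives can be had with something like
the following (a sketch; pg_relation_size() here is just the main heap fork,
pg_indexes_size() is all indexes, and pg_total_relation_size() is everything
including TOAST):

SELECT pg_size_pretty(pg_relation_size('bmsql_order_line'))       AS heap,
       pg_size_pretty(pg_indexes_size('bmsql_order_line'))        AS indexes,
       pg_size_pretty(pg_total_relation_size('bmsql_order_line')) AS total;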

I tried to see how much data we are dealing with here:

postgres=# select count(*) from bmsql_order_line;
count
-----------
910324839
(1 row)

Time: 503965.767 ms
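
For what it's worth, an approximate count from the planner statistics avoids
that eight-minute scan; a sketch, relying on reltuples being kept reasonably
current by ANALYZE/autovacuum:

SELECT reltuples::bigint AS approx_rows
  FROM pg_class
 WHERE relname = 'bmsql_order_line';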

And just to show that we were pushing to get these numbers:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.38    0.00    2.20    1.98    0.00   93.44

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdb            2027.40       239.99         0.05       1199          0
sda               0.80         0.00         0.01          0          0

So we have 910M rows, and it took about 8.4 minutes to count them with the
disk reading at roughly 240 MB/s, which works out to on the order of 120 GB
read just to produce a count.

I know this is a lot of data and, as I said previously, I am happy to let
anyone look at it. However, we clearly have something deeper to look into.

Thanks in advance,

JD

--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.us
***** Unless otherwise stated, opinions are my own. *****
