Quick Links

Re: snapshot too old, configured by time

From:	Kevin Grittner <kgrittn(at)gmail(dot)com>
To:	Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Steve Singer <steve(at)ssinger(dot)info>, Kevin Grittner <kgrittn(at)ymail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: snapshot too old, configured by time
Date:	2016-04-07 22:33:06
Message-ID:	CACjxUsPRELHMFBDp4hCaSQtGkRKDjhDMLt-jknyhvtAvg4BroA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Sun, Apr 3, 2016 at 4:09 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> On Wed, Mar 30, 2016 at 12:34 PM, Kevin Grittner <kgrittn(at)gmail(dot)com> wrote:
>> On Sat, Mar 19, 2016 at 1:27 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:

>>> I set the value to 1min.
>>>
>>> I set up a test like this:
>>>
>>> pgbench -i
>>>
>>> pgbench -c4 -j4 -T 3600 &
>>>
>>> ### watch the size of branches table
>>> while (true) ; do psql -c "\dt+" | fgrep _branches; sleep 10; done &
>>>
>>> ### set up a long lived snapshot.
>>> psql -c 'begin; set transaction isolation level repeatable read;
>>> select sum(bbalance) from pgbench_branches; select pg_sleep(300);
>>> select sum(bbalance) from pgbench_branches;'
>>>
>>> As this runs, I can see the size of the pgbench_branches bloating once
>>> the snapshot is taken, and continues bloating at a linear rate for the
>>> full 300 seconds.

I'm not seeing that on my i7 box.

>>> Once the 300 second pg_sleep is up, the long-lived snapshot holder
>>> receives an error once it tries to access the table again, and then
>>> the bloat stops increasing. But shouldn't the bloat have stopped
>>> increasing as soon as the snapshot became doomed, which would be after
>>> a minute or so?

It will, limited by how well your autovacuum can keep up with the
workload on your system. See attached graph. I ran 5 times each
in 3 configurations and graphed the median and average of be each.

master: development checkout from today, no config changes

patch: patch with no config changes except:
old_snapshot_threshold = '1min'

patch + av: patch with these config changes:
old_snapshot_threshold = '1min'
autovacuum_max_workers = 8
autovacuum_vacuum_cost_limit = 2000
autovacuum_naptime = '10s'
autovacuum_work_mem = '1GB'

As expected, differences are minimal at first, then the patch
starts to win, and wins even better with more aggressive
autovacuum.

> I can verify that a manual vacuum does stop the bloat from continuing
> to increase. But I don't see why autovacuum is not already stopping
> the bloat. It is running often enough that it really ought to do so
> (as verified by setting log_autovacuum_min_duration = 0 and looking in
> the log files to see that it is vacuuming the table once per nap-time,
> although it is not accomplishing much by doing so as no tuples can be
> removed.)

Perhaps the CPUs I have or the way I have my machine tuned allows
autovacuum to be more effective in the face of the pgbench load
than on yours?

> Also, HOT-cleanup should stop the bloat increase once the snapshot
> crosses the old_snapshot_threshold without even needing to wait until
> the next autovac runs.

It should help some, but you really need a vacuum in there to take
care things thoroughly.

> Does the code intentionally only work for manual vacuums? If so, that
> seems quite surprising. Or perhaps I am missing something else here.

Perhaps it is that VACUUM tries harder to get the work done, while
autovacuum steps out of the way when it detects that it is blocking
something. This is a pretty small table (it never gets to 1MB even
when bloating) with multiple clients pounding on it. It might just
be that your system doesn't allow much autovacuum activity before
it find itself blocking a pgbench process.

FWIW, our customer's 30-day test runs were on databases of hundreds
of GB and showed similar benefits -- linear growth indefinitely
without the patch, settling in to a pretty steady state after a few
hours with the patch.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment	Content-Type	Size
	image/png	47.9 KB

In response to

Re: snapshot too old, configured by time at 2016-04-03 21:09:11 from Jeff Janes

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Alvaro Herrera	2016-04-07 22:55:17	Re: [patch] Proposal for \crosstabview in psql
Previous Message	Peter Geoghegan	2016-04-07 22:23:23	Re: Using quicksort for every external sort run