Re: snapshot too old, configured by time

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Kevin Grittner <kgrittn(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Steve Singer <steve(at)ssinger(dot)info>, Kevin Grittner <kgrittn(at)ymail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: snapshot too old, configured by time
Date: 2016-04-03 21:09:11
Message-ID: CAMkU=1xOYSuYSwB2QBq94Apx2GJL7LhEjCK0qVJg6MDQcOmsPw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 30, 2016 at 12:34 PM, Kevin Grittner <kgrittn(at)gmail(dot)com> wrote:
> On Sat, Mar 19, 2016 at 1:27 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>
>> I'm not sure if this is operating as expected.
>>
>> I set the value to 1min.
>>
>> I set up a test like this:
>>
>> pgbench -i
>>
>> pgbench -c4 -j4 -T 3600 &
>>
>> ### watch the size of branches table
>> while (true) ; do psql -c "\dt+" | fgrep _branches; sleep 10; done &
>>
>> ### set up a long lived snapshot.
>> psql -c 'begin; set transaction isolation level repeatable read;
>> select sum(bbalance) from pgbench_branches; select pg_sleep(300);
>> select sum(bbalance) from pgbench_branches;'
>>
>> As this runs, I can see the size of the pgbench_branches bloating once
>> the snapshot is taken, and continues bloating at a linear rate for the
>> full 300 seconds.
>>
>> Once the 300 second pg_sleep is up, the long-lived snapshot holder
>> receives an error once it tries to access the table again, and then
>> the bloat stops increasing. But shouldn't the bloat have stopped
>> increasing as soon as the snapshot became doomed, which would be after
>> a minute or so?
>
> This is actually operating as intended, not a bug. Try running a
> manual VACUUM command about two minutes after the snapshot is taken
> and you should get a handle on what's going on. The old tuples
> become eligible for vacuuming after one minute, but that doesn't
> necessarily mean that autovacuum jumps in and that the space starts
> getting reused.

I can verify that a manual vacuum does stop the bloat from continuing
to increase. But I don't see why autovacuum is not already stopping
the bloat. It is running often enough that it really ought to do so:
with log_autovacuum_min_duration = 0, the log files show it vacuuming
the table once per nap-time, although it accomplishes little because
no tuples can be removed.
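
For anyone reproducing this, the two settings mentioned in this thread
would look roughly like the fragment below. This is only a sketch of a
test configuration: the placement in postgresql.conf is an assumption,
and nothing else here is meant to represent the actual setup used.

```
# postgresql.conf (test-only sketch; values taken from this thread)

# Threshold after which a snapshot may fail with "snapshot too old".
# Assumption: set in postgresql.conf, followed by a server restart.
old_snapshot_threshold = 1min

# Log every autovacuum run, so the per-nap-time vacuums of
# pgbench_branches show up in the server log.
log_autovacuum_min_duration = 0
```

With the second setting in place, each autovacuum pass over the table
should produce a log entry showing how many tuples it removed.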

Also, HOT cleanup should stop the bloat from increasing once the
snapshot's age crosses old_snapshot_threshold, without needing to wait
for the next autovacuum run.

Does the code intentionally only work for manual vacuums? If so, that
seems quite surprising. Or perhaps I am missing something else here.

Thanks,

Jeff
