Re: Allow "snapshot too old" error, to prevent bloat

From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Allow "snapshot too old" error, to prevent bloat
Date: 2015-02-15 18:59:05
Message-ID: 54E0EC79.3060009@BlueTreble.com
Lists: pgsql-hackers

On 2/15/15 10:36 AM, Tom Lane wrote:
> Kevin Grittner <kgrittn(at)ymail(dot)com> writes:
>> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Kevin Grittner <kgrittn(at)ymail(dot)com> writes:
>>>> What this patch does is add a GUC called old_snapshot_threshold. It
>>>> defaults to -1, which leaves behavior matching unpatched code. At any
>>>> higher setting, it allows tuples to be vacuumed away after the number
>>>> of transaction IDs specified by the GUC has been consumed.
>
>>> TBH, I'm not sure why we'd wish to emulate Oracle's single worst
>>> operational feature.
>
>> I've run into cases where people have suffered horribly bloated
>> databases because of one ill-behaved connection. Some companies
>> don't want to be vulnerable to that and the disruption that
>> recovery from that bloat causes.
>
> No doubt, preventing bloat is a good thing, but that doesn't mean this
> is the best API we could create for the issue. The proposition this
> patch offers to DBAs is: "You can turn this knob to reduce bloat by some
> hard-to-quantify factor. The cost is that some long-running transactions
> might fail. You won't know which ones are at risk, the failures won't be
> the same from time to time, and you won't be able to do anything to spare
> high-value transactions from that fate except by turning that knob back
> again globally (which requires a database restart)." Maybe refugees from
> Oracle will think that sounds good, but nobody else will.
>
> I wonder if we couldn't achieve largely the same positive effects through
> adding a simple transaction-level timeout option. That would be far
> easier for users to understand and manage, it would be trivial to allow
> specific high-value transactions to run with a laxer setting, it does not
> create any implementation-detail-dependent behavior that we'd be having to
> explain to users forevermore, and (not incidentally) it would be a lot
> simpler and more trustworthy to implement. There's no well-defined
> correlation between your setting and the net effect on database bloat,
> but that's true with the "snapshot too old" approach as well.
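
For reference, that knob would presumably be set like any other GUC; the
value here is purely illustrative:

    # postgresql.conf, per the patch description: -1 (the default)
    # preserves today's behavior; a positive value lets vacuum remove
    # tuples once that many transaction IDs have been consumed.
    old_snapshot_threshold = 5000000

And the closest existing analogue to the transaction-level timeout you
describe is statement_timeout, which can already be scoped to a single
transaction (though it caps each statement, not the transaction as a
whole):

    BEGIN;
    SET LOCAL statement_timeout = '30min';  -- existing per-statement cap
    -- ... queries ...
    COMMIT;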

A common use case is long-running reports hitting relatively stable data
in a database that also has tables with a high churn rate (i.e., queue
tables). In that scenario your only options right now are to suffer huge
amounts of bloat in the high-churn tables or to forgo reporting entirely.
A simple transaction timeout only "solves" this by denying you the
reporting queries.
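
To make that concrete (table names invented), the pattern is roughly:

    -- Reporting session: one snapshot held for hours.
    BEGIN ISOLATION LEVEL REPEATABLE READ;
    SELECT region, sum(amount) FROM sales_history GROUP BY region;
    -- ... more queries against stable, rarely-updated tables ...
    COMMIT;

    -- Meanwhile, other sessions churn a queue table the report never reads:
    --   INSERT INTO job_queue (payload) VALUES ('...');
    --   DELETE FROM job_queue WHERE id = 42;
    -- Vacuum must honor the report's snapshot for every table, so it
    -- cannot remove the dead job_queue rows until the report finishes.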

An idea I've had on this is some way to "lock down" the set of tables a
long-running transaction is allowed to access. Vacuum could then ignore
that transaction's snapshot for any table outside that set. Unlike the
threshold approach, that behavior would be deterministic.
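
As a strawman (all of this syntax is invented), it might look like:

    BEGIN;
    -- Hypothetical declaration: name the only tables this transaction
    -- may read, so vacuum could disregard its snapshot everywhere else.
    SET TRANSACTION READ TABLES (sales_history, customer_dim);
    SELECT region, sum(amount) FROM sales_history GROUP BY region;
    -- Touching any undeclared table would raise an error rather than
    -- silently extending the snapshot's reach.
    COMMIT;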
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
