From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Allow "snapshot too old" error, to prevent bloat
Date: 2015-02-15 18:59:05
Message-ID: 54E0EC79.3060009@BlueTreble.com
Lists: pgsql-hackers
On 2/15/15 10:36 AM, Tom Lane wrote:
> Kevin Grittner <kgrittn(at)ymail(dot)com> writes:
>> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Kevin Grittner <kgrittn(at)ymail(dot)com> writes:
>>>> What this patch does is add a GUC called old_snapshot_threshold. It
>>>> defaults to -1, which leaves behavior matching unpatched code.
>>>> Above that it allows tuples to be vacuumed away after the number of
>>>> transaction IDs specified by the GUC have been consumed.
>
>>> TBH, I'm not sure why we'd wish to emulate Oracle's single worst
>>> operational feature.
>
>> I've run into cases where people have suffered horribly bloated
>> databases because of one ill-behaved connection. Some companies
>> don't want to be vulnerable to that and the disruption that
>> recovery from that bloat causes.
>
> No doubt, preventing bloat is a good thing, but that doesn't mean this
> is the best API we could create for the issue. The proposition this
> patch offers to DBAs is: "You can turn this knob to reduce bloat by some
> hard-to-quantify factor. The cost is that some long-running transactions
> might fail. You won't know which ones are at risk, the failures won't be
> the same from time to time, and you won't be able to do anything to spare
> high-value transactions from that fate except by turning that knob back
> again globally (which requires a database restart)." Maybe refugees from
> Oracle will think that sounds good, but nobody else will.
>
> I wonder if we couldn't achieve largely the same positive effects through
> adding a simple transaction-level timeout option. That would be far
> easier for users to understand and manage, it would be trivial to allow
> specific high-value transactions to run with a laxer setting, it does not
> create any implementation-detail-dependent behavior that we'd have to
> explain to users forevermore, and (not incidentally) it would be a lot
> simpler and more trustworthy to implement. There's no well-defined
> correlation between your setting and the net effect on database bloat,
> but that's true with the "snapshot too old" approach as well.
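
To make the quoted proposal concrete: a minimal sketch of how the knob would
be set, assuming the semantics Kevin describes above (the GUC name is from
the patch description; the threshold counts consumed transaction IDs):

    # postgresql.conf (sketch; per the discussion above, changing this
    # requires a database restart)
    #
    # -1 (the default) keeps unpatched behavior; a positive value lets
    # vacuum remove tuples once that many transaction IDs have been
    # consumed, after which a transaction still holding an older
    # snapshot would fail with something like:
    #   ERROR:  snapshot too old
    old_snapshot_threshold = 1000000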
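
The transaction-level timeout Tom suggests doesn't exist as a GUC at this
point; the closest existing knob is statement_timeout, which bounds a single
statement rather than the whole transaction. A rough sketch of how a
high-value transaction could run with a laxer setting (the table name is
hypothetical):

    -- Sketch only: statement_timeout is an existing GUC, but it only
    -- approximates the proposed transaction-level timeout (it limits
    -- each statement, not the transaction as a whole).
    BEGIN;
    SET LOCAL statement_timeout = '30min';  -- laxer limit for this job only
    SELECT count(*) FROM monthly_sales;     -- long-running report query
    COMMIT;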
A common use-case is long-running reports hitting relatively stable data
in a database that also has tables with a high churn rate (i.e., queue
tables). In those scenarios your only options right now are to suffer
huge amounts of bloat in the high-churn tables or to forgo the reporting
entirely. A simple transaction timeout only "solves" this by denying you
the reporting queries.
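
For illustration, one way to watch that bloat develop is via the dead-tuple
counters in pg_stat_user_tables: a queue table pinned by a long-running
reporting snapshot will show n_dead_tup climbing even while autovacuum runs.

    -- Dead tuples that vacuum cannot yet remove, worst offenders first.
    SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
      FROM pg_stat_user_tables
     ORDER BY n_dead_tup DESC
     LIMIT 10;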
An idea I've had on this would be some way to "lock down" the set of
tables that a long-running transaction could access. That would allow
vacuum to ignore that transaction's snapshots for any tables it wasn't
accessing. Unlike the threshold approach, that behavior would be
deterministic.
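
Purely hypothetical syntax, just to make the idea concrete; nothing like
this exists in core:

    -- Hypothetical (invented syntax): declare up front the only tables
    -- this transaction may touch, so vacuum can disregard its snapshot
    -- everywhere else. Any access outside the declared set would error.
    BEGIN;
    SET TRANSACTION ACCESS TABLES (monthly_sales, customers);
    SELECT count(*) FROM monthly_sales;
    COMMIT;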
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com