Re: Allow "snapshot too old" error, to prevent bloat

From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Allow "snapshot too old" error, to prevent bloat
Date: 2015-02-16 05:19:11
Message-ID: 1183720670.3967956.1424063951728.JavaMail.yahoo@mail.yahoo.com
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> writes:
>> On 2/15/15 10:36 AM, Tom Lane wrote:
>>> I wonder if we couldn't achieve largely the same positive effects through
>>> adding a simple transaction-level timeout option.

We suggested this to our customer and got out of the meeting with
it looking like it *might* fly. In the next meeting, however, they
said they had run it by others and reviewed the code and it was
completely unacceptable -- they would not consider pg with this as
the solution.

>> A common use-case is long-running reports hitting relatively stable data
>> in a database that also has tables with a high churn rate (ie: queue
>> tables). In those scenarios your only options right now are to suffer
>> huge amounts of bloat in the high-churn or not do your reporting. A
>> simple transaction timeout only "solves" this by denying you reporting
>> queries.
>
> Agreed, but Kevin's proposal has exactly the same problem only worse,
> because (a) the reporting transactions might or might not fail (per
> Murphy, they'll fail consistently only when you're on deadline), and
> (b) if they do fail, there's nothing you can do short of increasing the
> slack db-wide.

These they were comfortable with, and did *not* consider to be
unpredictable or beyond their ability to manage.
I really don't feel I can say more than that, though, without
disclosing more than I should.

>> An idea that I've had on this would be some way to "lock down" the
>> tables that a long-running transaction could access. That would allow
>> vacuum to ignore any snapshots that transaction had for tables it wasn't
>> accessing. That's something that would be deterministic.

While this option was not specifically suggested, based on the
reasons they gave for rejecting the numerous other options we
suggested, I feel sure that this would not be acceptable either.

I think Tom hit the nail on the head when he said "Maybe refugees
from Oracle will think that sounds good..." It is precisely those
with very large code bases which have been modified over long
periods of time to work with Oracle that would find this solution
attractive, or perhaps *necessary* to consider a move to Postgres.
That's a potential market we don't care to write off.

> There might be something in that, but again it's not much like this patch.
> The key point I think we're both making is that nondeterministic failures
> are bad, especially when you're talking about long-running, expensive-to-
> retry transactions.

What the customer most wants to be deterministic is that the
error is generated only when the snapshot has been used to read a
page which has been modified since the snapshot was taken. If
tables or materialized views are set up before a query
and then not subsequently modified during the execution of the
query, that query must not be canceled even if it runs for days,
but it must not cause unconstrained bloat during the run. So far I
don't see any way to avoid including the LSN with the snapshot or
modifying the index AMs. Let's be clear on the footprint for that;
for the btree implementation it is:

src/backend/access/nbtree/nbtinsert.c | 7 ++++---
src/backend/access/nbtree/nbtpage.c | 2 +-
src/backend/access/nbtree/nbtsearch.c | 43 ++++++++++++++++++++++++++++++++++---------
src/include/access/nbtree.h | 7 ++++---
4 files changed, 43 insertions(+), 16 deletions(-)
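
To make that concrete, here is a minimal sketch (not taken from the
patch; the function name CheckForOldSnapshot, the snapshot->lsn
field, the old_snapshot_threshold GUC, the SnapshotIsTooOld helper,
and the error code are all illustrative) of the kind of check the
index AM read path would perform:

#include "postgres.h"

#include "storage/bufpage.h"
#include "utils/snapshot.h"

/* illustrative GUC: limit in minutes, -1 = disabled (historical behavior) */
extern int  old_snapshot_threshold;

/*
 * Sketch only: error out if an "old" snapshot is used to read a page
 * that has been modified since the snapshot was taken.  The page LSN
 * tells us whether the page has changed since snapshot->lsn was
 * recorded, so unmodified pages stay readable no matter how old the
 * snapshot gets.
 */
static void
CheckForOldSnapshot(Page page, Snapshot snapshot)
{
    if (old_snapshot_threshold < 0)
        return;                 /* feature disabled */

    if (PageGetLSN(page) > snapshot->lsn &&
        SnapshotIsTooOld(snapshot))     /* age vs. the threshold */
        ereport(ERROR,
                (errcode(ERRCODE_SNAPSHOT_TOO_OLD),     /* illustrative errcode */
                 errmsg("snapshot too old")));
}

The important property is that the page LSN comparison gates the
error, so a snapshot can run arbitrarily long against data that is
not changing.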

What this discussion has made me reconsider is the metric for
considering a transaction "too old". The number of transaction IDs
consumed seems inferior to LSN or time as the unit of measure for
that.

It looks to me to be pretty trivial (on the order of maybe 30 lines
of code) to specify this GUC in minutes rather than transaction
IDs. At first glance this seems like it would be vulnerable to the
usual complaints about mismanaged clocks, but those are easily
answered by using a cached time in shared memory that we populate
in such a way that it can never move backward. Done right, this
could even allow the GUC to be changeable on reload rather than
only at restart. A badly mismanaged system clock could not cause a
query to generate incorrect results; the worst that could happen is
that this feature would fail to control bloat as much as expected
or reads of modified data could generate the "snapshot too old"
error around the time of the clock adjustment.
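
To illustrate the clock handling, a minimal sketch under the
assumption of a small shared-memory struct (the names
OldSnapshotTimeState and GetOldSnapshotCurrentTimestamp are
placeholders):

#include "postgres.h"

#include "storage/spin.h"
#include "utils/timestamp.h"

/* illustrative: lives in shared memory, initialized at postmaster start */
typedef struct OldSnapshotTimeState
{
    slock_t     mutex;          /* protects current_ts */
    TimestampTz current_ts;     /* cached time; only ever advances */
} OldSnapshotTimeState;

static OldSnapshotTimeState *oldSnapshotTime;

/*
 * Return the cached timestamp, advancing it to the current system
 * time only when that is later than what we already have.  A system
 * clock that steps backward therefore cannot make snapshots appear
 * older than they are; at worst, bloat control is delayed until the
 * clock catches back up.
 */
static TimestampTz
GetOldSnapshotCurrentTimestamp(void)
{
    TimestampTz now = GetCurrentTimestamp();
    TimestampTz result;

    SpinLockAcquire(&oldSnapshotTime->mutex);
    if (now > oldSnapshotTime->current_ts)
        oldSnapshotTime->current_ts = now;
    result = oldSnapshotTime->current_ts;
    SpinLockRelease(&oldSnapshotTime->mutex);

    return result;
}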

As before, this would default to a magic value to mean that you
want the historical PostgreSQL behavior.
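
For illustration only, the GUC entry might look roughly like this
(the name old_snapshot_threshold, its group, and its limits are
placeholders; PGC_SIGHUP is what would allow changing it on reload):

/* illustrative guc.c ConfigureNamesInt[] entry */
{
    {"old_snapshot_threshold", PGC_SIGHUP, RESOURCES_ASYNCHRONOUS,
        gettext_noop("Time before a snapshot is too old to read pages changed after the snapshot was taken."),
        gettext_noop("A value of -1 disables this feature."),
        GUC_UNIT_MIN
    },
    &old_snapshot_threshold,
    -1, -1, INT_MAX / 60,       /* default -1 = historical behavior */
    NULL, NULL, NULL
},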

If that makes the idea more palatable to anyone, I can submit a
patch to that effect within the next 24 hours.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
