Re: Question about behavior of snapshot too old feature

From: Kevin Grittner <kgrittn(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Question about behavior of snapshot too old feature
Date: 2016-10-17 13:04:43
Message-ID: CACjxUsPX2LPAVDoUNEnhc4UNvVtuDxEM-vspYJX8=CbDOPjkoQ@mail.gmail.com
Lists: pgsql-hackers

On Sun, Oct 16, 2016 at 9:26 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:

> When I set old_snapshot_threshold = 0 I got an error at step #3, which
> means that the error occurred without table pruning.

The "snapshot too old" error can happen without pruning, but only
because there is no way to tell the difference between a page that
has been pruned since the snapshot was taken and a page which has
had some other kind of modification since the snapshot was taken.

Ignoring false positives for a moment (where the page is updated by
something other than pruning), what is required is that the snapshot
has expired (which, due to "rounding" and avoidance of locking,
could easily take a minute or two longer than the
old_snapshot_threshold setting), and that the page is then pruned
early, either by a vacuum or by HOT pruning during a page read. At
some point after that, a read which is part of returning data to
the user (e.g., not just positioning for index modification) can
see that the snapshot is too old and that the LSN for the page is
past the snapshot LSN. That is when you get the error.
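
As a rough illustration of the conditions described above, here is a
minimal sketch in C. It is a simplified model, not the PostgreSQL
source: the types ModelSnapshot and ModelPage, the function names, and
the minute-based clock are all hypothetical, and the sketch ignores
the "rounding" slack mentioned above. Note that the check looks only
at the page LSN, so it cannot tell pruning apart from any other page
modification, which is where the false positives come from.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not PostgreSQL's actual structs. */
typedef uint64_t lsn_t;

typedef struct
{
    lsn_t   lsn;          /* WAL position when the snapshot was taken */
    int64_t taken_at_min; /* time the snapshot was taken, in minutes */
} ModelSnapshot;

typedef struct
{
    lsn_t lsn; /* LSN of the last change to the page, of any kind */
} ModelPage;

/*
 * Has the snapshot aged past the threshold?  A negative threshold
 * disables the feature; zero expires every snapshot immediately.
 * (Assumption: no rounding or locking-avoidance slack is modeled.)
 */
static bool
snapshot_expired(const ModelSnapshot *snap, int64_t now_min,
                 int threshold_min)
{
    if (threshold_min < 0)
        return false;
    return now_min - snap->taken_at_min >= threshold_min;
}

/*
 * Would this read raise "snapshot too old"?  Only reads that return
 * data to the user are checked, and only if the page has changed
 * since the snapshot was taken and the snapshot has expired.
 */
static bool
read_raises_snapshot_too_old(const ModelSnapshot *snap,
                             const ModelPage *page,
                             int64_t now_min, int threshold_min,
                             bool returns_data_to_user)
{
    if (!returns_data_to_user)
        return false;           /* e.g. positioning for index modification */
    if (page->lsn <= snap->lsn)
        return false;           /* page untouched since the snapshot */
    return snapshot_expired(snap, now_min, threshold_min);
}
```

In this model, a threshold of 0 with any page modification after the
snapshot makes the read fail at once, pruned or not, which matches the
behavior reported at step #3.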

> We have a regression test for this feature, but it sets
> old_snapshot_threshold = 0, so I doubt that we can test it properly.
> Am I missing something?

This is a hard feature to test properly, and certainly hard to test
without the test running for a long time. The zero setting is
really not intended to be used in production, but only to allow
some half-way decent testing that doesn't take extreme lengths of
time. If you add some delays of a few minutes each at key points
in a test, you should be able to get a test that works with a
setting of 1min. It is not impossible that we might need to add a
memory barrier to one or two places to get such tests to behave
consistently, but I have not been able to spot where, if anywhere,
that would be.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
