Re: Question about behavior of snapshot too old feature

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Kevin Grittner <kgrittn(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Question about behavior of snapshot too old feature
Date: 2016-10-21 10:06:28
Message-ID: CAD21AoD2r6HRg5Go4NGeo3mcERRbW2b+_bL35dzqC5k05VLp8A@mail.gmail.com
Lists: pgsql-hackers

On Mon, Oct 17, 2016 at 10:04 PM, Kevin Grittner <kgrittn(at)gmail(dot)com> wrote:
> On Sun, Oct 16, 2016 at 9:26 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
>> When I set old_snapshot_threshold = 0 I got an error at step #3, which
>> means that the error occurred without table pruning.
>
> The "snapshot too old" error can happen without pruning, but only
> because there is no way to tell the difference between a page that
> has been pruned since the snapshot was taken and a page which has
> had some other kind of modification since the snapshot was taken.
>
> Ignoring false positives for a moment (where the page is updated by
> something other than pruning), what is required for early pruning
> is that the snapshot has expired (which due to "rounding" and
> avoidance of locking could easily take up to a minute or two more
> than the old_snapshot_threshold setting) and then there is page
> pruning due to a vacuum or just HOT pruning from a page read. At
> some point after that, a read which is part of returning data to
> the user (e.g., not just positioning for index modification) can
> see that the snapshot is too old and that the LSN for the page is
> past the snapshot LSN. That is when you get the error.
>
>> We have a regression test for this feature, but it sets
>> old_snapshot_threshold = 0, so I doubt that we can test it properly.
>> Am I missing something?
>
> This is a hard feature to test properly, and certainly hard to test
> without the test running for a long time. The zero setting is
> really not intended to be used in production, but only to allow
> some half-way decent testing that doesn't take extreme lengths of
> time. If you add some delays of a few minutes each at key points
> in a test, you should be able to get a test that works with a
> setting of 1min. It is not impossible that we might need to add a
> memory barrier to one or two places to get such tests to behave
> consistently, but I have not been able to spot where, if anywhere,
> that would be.

Thank you for the explanation! I understand now.
When old_snapshot_threshold = 0, the server skips allocating the shared
memory area for the xid array and skips some of the logic in order to
avoid using that shared memory, so I was a little concerned about that.
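
The conditions Kevin describes above can be sketched roughly as follows.
This is only a simplified model, not the actual PostgreSQL code: the names
sketch_snapshot and sketch_snapshot_too_old are hypothetical, time is
counted in whole minutes, and the "rounding" and avoidance of locking
that Kevin mentions are ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not the real PostgreSQL types. */
typedef uint64_t lsn_t;

typedef struct
{
    lsn_t   lsn;            /* WAL position when the snapshot was taken */
    int64_t taken_at_min;   /* time the snapshot was taken, in minutes */
} sketch_snapshot;

/*
 * Return true when a read through "snap" of a page whose LSN is
 * "page_lsn" would raise "snapshot too old" in this model.
 * A negative threshold_min means the feature is disabled.
 */
static bool
sketch_snapshot_too_old(const sketch_snapshot *snap, lsn_t page_lsn,
                        int64_t now_min, int threshold_min)
{
    if (threshold_min < 0)
        return false;       /* feature disabled */

    /* Page not modified since the snapshot was taken: always safe. */
    if (page_lsn <= snap->lsn)
        return false;

    /* Page was modified, but the snapshot has not expired yet. */
    if (now_min - snap->taken_at_min < threshold_min)
        return false;

    /*
     * Snapshot expired and the page changed afterwards. The change may be
     * pruning or any other modification; the two cannot be told apart,
     * which is the source of the false positives.
     */
    return true;
}
```

In this model a threshold of 0 makes every snapshot expire immediately,
so any page modification after the snapshot triggers the error, with or
without pruning. That matches what I saw at step #3.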

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
