Re: Question about behavior of snapshot too old feature

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Kevin Grittner <kgrittn(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Question about behavior of snapshot too old feature
Date: 2016-10-17 02:26:46
Message-ID: CAD21AoCrQDwUjYk5gSLhH-Xw_nVLDqe8daEgxJnoEC_vMdzarQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Oct 14, 2016 at 11:29 PM, Kevin Grittner <kgrittn(at)gmail(dot)com> wrote:
> On Fri, Oct 14, 2016 at 8:53 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> On Fri, Oct 14, 2016 at 1:40 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
>>> For example, I set old_snapshot_threshold = 1min and prepared a table
>>> and two terminals.
>>> Then I did the following steps.
>>>
>>> 1. [Terminal 1] Begin transaction and get snapshot data and wait.
>>> BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
>>> SELECT * FROM test;
>>>
>>> 2. [Terminal 2] Another session updates the test table in order to make
>>> the snapshot dirty.
>>> BEGIN;
>>> UPDATE test SET c = c + 100;
>>> COMMIT;
>>>
>>> 3. [Terminal 1] One minute later, read the test table again in the same
>>> transaction opened at #1. I got no error.
>>> SELECT * FROM test;
>>>
>>> 4. [Terminal 2] Another session reads the test table.
>>> BEGIN;
>>> SELECT * FROM test;
>>> COMMIT;
>>>
>>> 5. [Terminal 1] One minute later, read the test table again, and got
>>> the "snapshot too old" error.
>>> SELECT * FROM test;
>>>
>>> Since #2 makes the snapshot I got at #1 dirty, I expected to get the
>>> "snapshot too old" error at #3, where I read the test table again after
>>> enough time. But I could never get the "snapshot too old" error at #3.
>>>
>>
>> Here, the basic idea is that this error won't occur until the
>> corresponding page has been pruned or the table has been vacuumed.
>> So, I think what is happening here is that the table was pruned during
>> step #3 or step #4, after which you started getting the error.
>
> The pruning might be one factor. Another possible issue is that
> effectively it doesn't start timing that 1 minute until the clock
> hits the start of the next minute (i.e., 0 seconds after the next
> minute). The old_snapshot_threshold does not attempt to guarantee
> that the snapshot too old error will happen at the earliest
> opportunity, but that the error will *not* happen until the
> snapshot is *at least* that old. Keep in mind that the expected
> useful values for this parameter are from a small number of hours
> to a day or two, depending on the workload. The emphasis was on
> minimizing overhead, even when it meant the cleanup might not be
> quite as "eager" as it could otherwise be.
>
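
To make the timing point above concrete, here is a rough sketch in Python
(not PostgreSQL code). It only models the description given above: the
threshold is assumed to start counting at the start of the next minute
after the snapshot is taken. The function name earliest_error_time is
hypothetical, and treating a snapshot taken exactly on a minute boundary
as waiting for the following boundary is a simplification of mine.

```python
from datetime import datetime, timedelta


def earliest_error_time(snapshot_taken: datetime, threshold: timedelta) -> datetime:
    """Hypothetical model: earliest moment the error could be raised.

    Assumption (from the explanation above): timing of the threshold
    effectively starts at the start of the next minute.
    """
    # Truncate to the current minute, then step to the next minute boundary.
    next_minute = snapshot_taken.replace(second=0, microsecond=0) + timedelta(minutes=1)
    # The snapshot must be at least `threshold` old, counted from that boundary.
    return next_minute + threshold


# Snapshot taken 20 seconds into a minute, with old_snapshot_threshold = 1min.
taken = datetime(2016, 10, 14, 13, 40, 20)
print(earliest_error_time(taken, timedelta(minutes=1)))
```

Under this model the error could appear no sooner than 100 seconds after
the snapshot in this example, rather than 60, and even then only once the
page has been pruned or the table vacuumed.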

Thanks! I understand now.
I tested with autovacuum = off, so the table must have been pruned at step #4.

When I set old_snapshot_threshold = 0, I got the error at step #3, which
means that the error occurred without table pruning.
We have a regression test for this feature, but it sets
old_snapshot_threshold = 0, so I doubt that it tests the feature properly.
Am I missing something?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
