Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <

From: Kevin Grittner <kgrittn(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Ants Aasma <ants(dot)aasma(at)eesti(dot)ee>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Kevin Grittner <kgrittn(at)postgresql(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "pgsql-committers(at)postgresql(dot)org" <pgsql-committers(at)postgresql(dot)org>
Subject: Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
Date: 2016-06-11 15:29:54
Message-ID: CACjxUsMAFcCC8gWttOfHF=-1xC98fXehThwykvsJT9Jh2=GULg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers pgsql-hackers

On Tue, May 24, 2016 at 4:10 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, May 24, 2016 at 3:48 PM, Kevin Grittner <kgrittn(at)gmail(dot)com> wrote:
>> On Tue, May 24, 2016 at 12:00 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>>> On 2016-05-24 11:24:44 -0500, Kevin Grittner wrote:
>>>> On Fri, May 6, 2016 at 8:28 PM, Kevin Grittner <kgrittn(at)gmail(dot)com> wrote:
>>>>> On Fri, May 6, 2016 at 7:48 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>>>>
>>>>>> That comment reminds me of a question I had: Did you consider the effect
>>>>>> of this patch on analyze? It uses a snapshot, and by memory you've not
>>>>>> built in a defense against analyze being cancelled.
>>
>> The primary defense is not considering a cancellation except in 30
>> of the 500 places a page is used. None of those 30 are, as far as
>> I can see (upon review in response to your question), used in the
>> analyze process.
>
> It's not obvious to me how this is supposed to work. If ANALYZE's
> snapshot is subject to being ignored for xmin purposes because of
> snapshot_too_old, then I would think it would need to consider
> cancelling itself if it reads a page with possibly-removed data, just
> like in any other case. It seems that we might not actually need a
> snapshot set for ANALYZE in all cases, because the comments say:
>
> /* functions in indexes may want a snapshot set */
> PushActiveSnapshot(GetTransactionSnapshot());
>
> If we can get away with it, it would be a rather large win to only set
> a snapshot when the table has an expression index. For purposes of
> "snapshot too old", though, it will be important that a function in an
> index which tries to read data from some other table which has been
> pruned cancels itself when necessary.

I have reviewed the code and run tests to try to find something
here which could be considered a bug, without finding any problem.
When reading pages for the random sample for ANALYZE (or
auto-analyze) there is not an age check; so ANALYZE completes
without error, keeping statistics up-to-date.

There really is no difference in behavior except in the case that:

(1) old_snapshot_threshold >= 0 to enable the "snapshot too old"
feature, and
(2) there were tuples that were dead as the result of completed
transactions, and
(3) those tuples became older than the threshold, and
(4) those tuples were pruned or vacuumed away, and
(5) an ANALYZE process would have read those dead tuples had they
not been removed.

In such a case the irrevocably dead, permanently removed tuples are
not counted in the statistics. I have trouble seeing a better
outcome than that. Among my tests, I specifically checked for an
ANALYZE of a table having an index on an expression, using an old
snapshot:

-- connection 1
drop table if exists t1;
create table t1 (c1 int not null);
drop table if exists t2;
create table t2 (c1 int not null);
insert into t1 select generate_series(1, 10000);
drop function mysq(i int);
create function mysq(i int)
returns int
language plpgsql
immutable
as $mysq$
begin
return (i * i);
end
$mysq$;
create index t1_c1sq on t1 ((mysq(c1)));
begin transaction isolation level repeatable read;
select 1;

-- connection 2
vacuum analyze verbose t1;
delete from t1 where c1 between 1000 and 1999;
delete from t1 where c1 = 8000;
insert into t2 values (1);
select pg_sleep_for('2min');
vacuum verbose t1; -- repeat if necessary to see the dead rows
disappear

-- connection 1
analyze verbose t1;

This runs to completion, as I would want and expect.

I am closing this item on the "PostgreSQL 9.6 Open Items" page. If
anyone feels that I've missed something, please provide a test to
show the problem, or a clear description of the problem and how you
feel behavior should be different.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-committers by date

  From Date Subject
Next Message Peter Eisentraut 2016-06-11 23:28:21 pgsql: PL/Python: Rename new keyword arguments of plpy.error() etc.
Previous Message Amit Kapila 2016-06-11 11:27:44 Re: [COMMITTERS] pgsql: Don't generate parallel paths for rels with parallel-restricted

Browse pgsql-hackers by date

  From Date Subject
Next Message David Steele 2016-06-11 15:46:45 Re: Confusing recovery message when target not hit
Previous Message Michael Paquier 2016-06-11 12:22:27 Re: Confusing recovery message when target not hit