Re: [HACKERS] Re: pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <

From: Kevin Grittner <kgrittn(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Ants Aasma <ants(dot)aasma(at)eesti(dot)ee>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Kevin Grittner <kgrittn(at)postgresql(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "pgsql-committers(at)postgresql(dot)org" <pgsql-committers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Re: pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
Date: 2016-06-15 17:44:34
Message-ID: CACjxUsPUY1j9u+i74G35WWrBscv8S87xx8v0-jE0=ZStRWpzDA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers pgsql-hackers

On Wed, Jun 15, 2016 at 10:46 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Sat, Jun 11, 2016 at 11:29 AM, Kevin Grittner <kgrittn(at)gmail(dot)com> wrote:
>> I have reviewed the code and run tests to try to find something
>> here which could be considered a bug, without finding any problem.
>> When reading pages for the random sample for ANALYZE (or
>> auto-analyze) there is not an age check; so ANALYZE completes
>> without error, keeping statistics up-to-date.
>>
>> There really is no difference in behavior except in the case that:
>>
>> (1) old_snapshot_threshold >= 0 to enable the "snapshot too old"
>> feature, and
>> (2) there were tuples that were dead as the result of completed
>> transactions, and
>> (3) those tuples became older than the threshold, and
>> (4) those tuples were pruned or vacuumed away, and
>> (5) an ANALYZE process would have read those dead tuples had they
>> not been removed.
>>
>> In such a case the irrevocably dead, permanently removed tuples are
>> not counted in the statistics. I have trouble seeing a better
>> outcome than that. Among my tests, I specifically checked for an
>> ANALYZE of a table having an index on an expression, using an old
>> snapshot:
>>
>> -- connection 1
>> drop table if exists t1;
>> create table t1 (c1 int not null);
>> drop table if exists t2;
>> create table t2 (c1 int not null);
>> insert into t1 select generate_series(1, 10000);
>> drop function mysq(i int);
>> create function mysq(i int)
>> returns int
>> language plpgsql
>> immutable
>> as $mysq$
>> begin
>> return (i * i);
>> end
>> $mysq$;
>> create index t1_c1sq on t1 ((mysq(c1)));
>> begin transaction isolation level repeatable read;
>> select 1;
>>
>> -- connection 2
>> vacuum analyze verbose t1;
>> delete from t1 where c1 between 1000 and 1999;
>> delete from t1 where c1 = 8000;
>> insert into t2 values (1);
>> select pg_sleep_for('2min');
>> vacuum verbose t1; -- repeat if necessary to see the dead rows
>> disappear
>>
>> -- connection 1
>> analyze verbose t1;
>>
>> This runs to completion, as I would want and expect.
>>
>> I am closing this item on the "PostgreSQL 9.6 Open Items" page. If
>> anyone feels that I've missed something, please provide a test to
>> show the problem, or a clear description of the problem and how you
>> feel behavior should be different.
>
> So what happens in this scenario:
>
> 1. ANALYZE runs really slowly - maybe the user-defined function it's
> running for the expression index is extremely long-running.
> 2. Eventually, the snapshot for ANALYZE is older than the configured
> value of snapshot_too_old.
> 3. Then, ANALYZE selects a page with an LSN new enough that it might
> have been pruned.
>
> Presumably, the ANALYZE ought to error out in this scenario, just as
> it would in any other situation where an old snapshot sees a new page.
> No?

The test I showed creates a situation which (to ANALYZE) is
identical to what you describe -- ANALYZE sees a page with an LSN
recent enough that it could have been (and actually has been)
pruned. Why would it be better for the ANALYZE to fail than to
complete?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-committers by date

  From Date Subject
Next Message Robert Haas 2016-06-15 18:29:22 Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
Previous Message Robert Haas 2016-06-15 15:46:54 Re: [HACKERS] Re: pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <

Browse pgsql-hackers by date

  From Date Subject
Next Message Julien Rouhaud 2016-06-15 18:13:19 Re: Rename max_parallel_degree?
Previous Message Jimmy 2016-06-15 16:58:43 Re: pg_isready features