Re: snapshot too old, configured by time

From: Steve Singer <steve(at)ssinger(dot)info>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: snapshot too old, configured by time
Date: 2015-11-08 23:07:08
Message-ID: BLU437-SMTP193E7367B788D2EC2B2B81DC160@phx.gbl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 10/15/2015 05:47 PM, Kevin Grittner wrote:
> All other issues raised by Álvaro and Steve have been addressed,
> except for this one, which I will argue against:

I've been looking through the updated patch

In snapmgr.c

+ * XXX: If we can trust a read of an int64 value to be atomic, we can
skip the
+ * spinlock here.
+ */
+int64
+GetOldSnapshotThresholdTimestamp(void)

Was your intent with the XXX for it to be a TODO to only aquire the lock
on platforms without the atomic 64bit operations?

On a more general note:

I've tried various manual tests of this feature and it sometimes works
as expected and sometimes doesn't.
I'm getting the feeling that how I expect it to work isn't quite in sync
with how it does work.

I'd expect the following to be sufficient to generate the test

T1: Obtains a snapshot that can see some rows
T2: Waits 60 seconds and performs an update on those rows
T2: Performs a vacuum
T1: Waits 60 seconds, tries to select from the table. The snapshot
should be too old

For example it seems that in test 002 the select_ok on conn1 following
the vacuum but right before the final sleep is critical to the snapshot
too old error showing up (ie if I remove that select_ok but leave in the
sleep I don't get the error)

Is this intended and if so is there a better way we can explain how
things work?

Also is 0 intended to be an acceptable value for old_snapshot_threshold
and if so what should we expect to see then?

>> So if I understand correctly, every call to BufferGetPage needs to have
>> a TestForOldSnapshot immediately afterwards? It seems easy to later
>> introduce places that fail to test for old snapshots. What happens if
>> they do? Does vacuum remove tuples anyway and then the query returns
>> wrong results? That seems pretty dangerous. Maybe the snapshot could
>> be an argument of BufferGetPage?
> There are 486 occurences of BufferGetPage in the source code, and
> this patch follows 36 of them with TestForOldSnapshot. This only
> needs to be done when a page is going to be used for a scan which
> produces user-visible results. That is, it is *not* needed for
> positioning within indexes to add or vacuum away entries, for heap
> inserts or deletions (assuming the row to be deleted has already
> been found). It seems wrong to modify about 450 BufferGetPage
> references to add a NULL parameter; and if we do want to make that
> noop change as insurance, it seems like it should be a separate
> patch, since the substance of this patch would be buried under the
> volume of that.
>
> I will add this to the November CF.
>
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2015-11-08 23:35:57 Re: eXtensible Transaction Manager API
Previous Message Andres Freund 2015-11-08 23:04:40 Re: PATCH: 9.5 replication origins fix for logical decoding