Re: snapshot too old, configured by time

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Kevin Grittner <kgrittn(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Steve Singer <steve(at)ssinger(dot)info>, Kevin Grittner <kgrittn(at)ymail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: snapshot too old, configured by time
Date: 2016-03-30 19:22:23
Message-ID: 20160330192223.GA988171@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Kevin Grittner wrote:
> On Wed, Mar 30, 2016 at 11:37 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
> >> I think a safer proposition would be to replace all current
> >> BufferGetPage() calls (there are about 500) by adding the necessary
> >> arguments: buffer, snapshot, rel, and an integer "flags". All this
> >> without adding the feature. Then a subsequent commit would add the
> >> TestForOldSnapshot inside BufferGetPage, *except* when a
> >> BUFFER_NO_SNAPSHOT_TEST flag is passed. That way, new code always get
> >> the snapshot test by default.
> >
> > That seems awfully invasive,
>
> That's the argument I made, which Álvaro described as "dismissing"
> his suggestion. In this post from October of 2015, I pointed out
> that there are 36 calls where we need a snapshot and 450 where we
> don't.
>
> http://www.postgresql.org/message-id/56479263.1140984.1444945639606.JavaMail.yahoo@mail.yahoo.com

I understand the invasiveness argument, but to me the danger of
introducing new bugs trumps that. The problem is not the current code,
but future patches: it is just too easy to make the mistake of not
checking the snapshot in new additions of BufferGetPage. And you said
that the result of missing a check is silent wrong results from queries
that should instead be cancelled, which seems fairly bad to me. My
impression was that you were actually considering doing something about
that -- sorry for the lack of clarity.

We have made similarly invasive changes in the past -- the
SearchSysCache API for instance.

> > not to mention performance-killing if the expectation is that
> > most such calls are going to need a snapshot check.
>
> This patch is one which has allowed a customer where we could not
> meet their performance requirements to pass them. It is the
> opposite of performance-killing.

I think Tom misunderstood what I said and you misunderstood what Tom
said. Let me attempt to set things straight.

I said that we should change BufferGetPage into having the snapshot
check built-in, except in the cases where a flag is passed; and the flag
would be passed in all cases except those 30-something you identified.
In other words, the behavior in all the current callsites would be
identical to what's there today; we could have a macro do the first
check so that we don't introduce the overhead of a function call in the
450 cases where it's not needed.

Tom said that my proposal would be performance-killing, not that your
patch would be performance-killing. But as I argue above, with my
proposal performance would stay the same, so we're actually okay.

I don't think nobody disputes that your patch is good in general.
I would be happy with it in 9.6, but I have my reservations about the
aforementioned problem.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2016-03-30 19:29:40 Re: snapshot too old, configured by time
Previous Message Peter Geoghegan 2016-03-30 19:21:36 Re: snapshot too old, configured by time