Re: [sqlsmith] Crash in GetOldestSnapshot()

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andreas Seltenreich <seltenreich(at)gmx(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [sqlsmith] Crash in GetOldestSnapshot()
Date: 2016-08-07 18:04:10
Message-ID: CA+TgmoYiBFyDCNxaQCMorNRB7QtrUn-Lz0GnEb9Vcq-R2y=EXA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Aug 7, 2016 at 1:57 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> writes:
>> In a similar case in the past involving holdable cursors, the solution
>> was to detoast _before_ storing in the tuplestore (see
>> PersistHoldablePortal). I guess the question now is, under what
>> circumstances is it now allowable to detoast a datum with no active
>> snapshot? (Wouldn't it be necessary in such a case to know what the
>> oldest snapshot ever used in the transaction was?)
>
> After looking at this a bit, I think probably the appropriate solution
> is to register the snapshot that was used by the query and store it as
> a property of the Portal, releasing it when the Portal is destroyed.
> Essentially, this views possession of a relevant snapshot as a resource
> that is required to make toast dereferences safe.

Hmm, good idea.

> I think there has been a bug here for awhile. Consider a committed-dead
> row with some associated toast data, and suppose the query's snapshot
> was the last one that could see that row. Once we destroy the query's
> snapshot, there is nothing preventing a concurrent VACUUM from removing
> the dead row and the toast data.

Yeah: as you see, I came to the same conclusion.

> When the RETURNING code was originally
> written, I think this was safe enough, because the bookkeeping that
> determined when VACUUM could remove data was based on transactions'
> advertised xmins, and those did not move once set for the life of the
> transaction. So dereferencing a toast pointer you'd fetched was safe
> for the rest of the transaction. But when we changed over to
> oldest-snapshot-based xmin advertisement, and made it so that a
> transaction holding no snapshots advertised no xmin, we created a hazard
> for data held in portals.

But I missed this aspect of it.

> In this view of things, flattening toast pointers in "held" tuplestores
> is still necessary, but it's because their protective snapshot is going
> away not because the transaction as a whole is going away. But as long
> as we hold onto the relevant snapshot, we don't have to do that for
> portals used intra-transaction.
>
> It's interesting to think about whether we could let snapshots outlive
> transactions and thereby not need to flatten "held" tuplestores either.
> I kinda doubt it's a good idea because of the potential bad effects
> for vacuum not being able to remove dead rows for a long time; but
> it seems at least possible to do it, in this world-view.

EnterpriseDB has had complaints from customers about the cost of
flattening TOAST pointers when tuplestores are held, so I think
providing an option to skip the flattening (at the risk of increased
bloat) would be a good idea even if we chose not to change the default
behavior. It's worth noting that the ability to set
old_snapshot_threshold serves as a way of limiting the damage that can
be caused by the open snapshot, so that optional behavior might be
more appealing now than heretofore.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2016-08-07 18:05:05 Re: [sqlsmith] Crash in GetOldestSnapshot()
Previous Message Tom Lane 2016-08-07 18:03:48 Re: [sqlsmith] Crash in GetOldestSnapshot()