Re: MVCC catalog access

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MVCC catalog access
Date: 2013-07-02 13:52:23
Message-ID: 20130702135223.GB19837@alap2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2013-07-02 09:31:23 -0400, Robert Haas wrote:
> On Tue, Jul 2, 2013 at 9:02 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > Some things that might be worth changing when committing:
> > * Could you add a Assert(!RelationHasSysCache(relid)) to
> > RelationInvalidatesSnapshotsOnly? It's not unlikely that it will be
> > missed by the next person adding a syscache and that seems like it
> > could have ugly and hard to detect consequences.
>
> There's a cross-check in InitCatalogCache() for that very issue.

Great.

> > * maybe use bsearch(3) instead of open coding the binary search? We
> > already use it in the backend.
>
> I found comments elsewhere indicating that bsearch() was slower than
> open-coding it, so I copied the logic used for ScanKeywordLookup().

Hm. Ok.

> > * possibly paranoid, but I'd add a Assert to heap_beginscan_catalog or
> > GetCatalogSnapshot ensuring we're dealing with a catalog relation. The
> > consistency mechanisms in GetCatalogSnapshot() only work for those, so
> > there doesn't seem to be a valid usecase for non-catalog relations.
>
> It'll actually work find as things stand; it'll just take a new
> snapshot every time.

Ok. Doesn't really change my opinion that it's a crappy idea to use it
otherwise ;)

> - In genam.c and execUtils.c, we treat SnapshotNow as a kind of
> default snapshot. That seems like a crappy idea. I propose that we
> either set that pointer to NULL and let the server core dump if the
> snapshot doesn't get set or (maybe better) add a new special snapshot
> called SnapshotError which just errors out if you try to use it for
> anything, and initialize to that.

I vote for SnapshotError.

> - I'm not quite sure what to do about get_actual_variable_range().
> Taking a new MVCC snapshot there seems like it might be pricey on some
> workloads. However, I wonder if we could use SnapshotDirty.
> Presumably, even uncommitted tuples affect the amount of
> index-scanning we have to do, so that approach seems to have some
> theoretical justification. But I'm worried there could be unfortunate
> consequences to looking at uncommitted data, either now or in the
> future. SnapshotSelf seems less likely to have that problem, but
> feels wrong somehow.

I don't like using SnapshotDirty either. Can't we just use the currently
active snapshot? Unless I miss something this always will be called
while we have one since when we plan we've done an explicit
PushActiveSnapshot() and if we need to replan stuff during execution
PortalRunSelect() will have pushed one.

> - currtid_byreloid() and currtid_byrelname() use SnapshotNow as an
> argument to heap_get_latest_tid(). I don't know what these functions
> are supposed to be good for, but taking a new snapshot for every
> function call seems to guarantee that someone will find a way to use
> these functions as a poster child for how to brutalize PGXACT, so I
> don't particularly want to do that.

Heikki mentioned that at some point they were added for the odbc
driver. I am not particularly inclined to worry about taking too many
snapshots here.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2013-07-02 14:17:08 Re: New regression test time
Previous Message Nikhil Sontakke 2013-07-02 13:32:46 Custom gucs visibility