Re: MVCC catalog access

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MVCC catalog access
Date: 2013-07-02 13:31:23
Message-ID: CA+TgmoY3ALC3GERRC=-A6mWOh7ZYKUdGN9VhA20B_-2oj3nt2w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jul 2, 2013 at 9:02 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> Some things that might be worth changing when committing:
> * Could you add a Assert(!RelationHasSysCache(relid)) to
> RelationInvalidatesSnapshotsOnly? It's not unlikely that it will be
> missed by the next person adding a syscache and that seems like it
> could have ugly and hard to detect consequences.

There's a cross-check in InitCatalogCache() for that very issue.

> * maybe use bsearch(3) instead of open coding the binary search? We
> already use it in the backend.

I found comments elsewhere indicating that bsearch() was slower than
open-coding it, so I copied the logic used for ScanKeywordLookup().

> * possibly paranoid, but I'd add a Assert to heap_beginscan_catalog or
> GetCatalogSnapshot ensuring we're dealing with a catalog relation. The
> consistency mechanisms in GetCatalogSnapshot() only work for those, so
> there doesn't seem to be a valid usecase for non-catalog relations.

It'll actually work find as things stand; it'll just take a new
snapshot every time.

I have a few ideas for getting rid of the remaining uses of
SnapshotNow that I'd like to throw out there:

- In pgrowlocks and pgstattuple, I think it would be fine to use
SnapshotSelf instead of SnapshotNow. The only difference is that it
includes changes made by the current command that wouldn't otherwise
be visible until CommandCounterIncrement(). That, however, is not
really a problem for their usage.

- In genam.c and execUtils.c, we treat SnapshotNow as a kind of
default snapshot. That seems like a crappy idea. I propose that we
either set that pointer to NULL and let the server core dump if the
snapshot doesn't get set or (maybe better) add a new special snapshot
called SnapshotError which just errors out if you try to use it for
anything, and initialize to that.

- I'm not quite sure what to do about get_actual_variable_range().
Taking a new MVCC snapshot there seems like it might be pricey on some
workloads. However, I wonder if we could use SnapshotDirty.
Presumably, even uncommitted tuples affect the amount of
index-scanning we have to do, so that approach seems to have some
theoretical justification. But I'm worried there could be unfortunate
consequences to looking at uncommitted data, either now or in the
future. SnapshotSelf seems less likely to have that problem, but
feels wrong somehow.

- currtid_byreloid() and currtid_byrelname() use SnapshotNow as an
argument to heap_get_latest_tid(). I don't know what these functions
are supposed to be good for, but taking a new snapshot for every
function call seems to guarantee that someone will find a way to use
these functions as a poster child for how to brutalize PGXACT, so I
don't particularly want to do that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nikhil Sontakke 2013-07-02 13:32:46 Custom gucs visibility
Previous Message Amit Kapila 2013-07-02 13:19:36 Re: Patch for fail-back without fresh backup