Re: Something is rotten in the state of Denmark...

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Something is rotten in the state of Denmark...
Date: 2015-04-02 16:54:02
Message-ID: 14837.1427993642@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Wed, Apr 1, 2015 at 7:05 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I've not fully tracked it down, but I think that the blame falls on the
>> MVCC-snapshots-for-catalog-scans patch; it appears that it's trying to
>> read pg_am's pg_class entry with a snapshot that's too old, possibly
>> because it assumes that sinval signaling is alive which I think ain't so.

> Hmm, sorry for the bug. It looks to me like sinval signaling gets
> started up at the beginning of InitPostgres(). Perhaps something like
> this?

Yeah, it does, so you'd think this would work. I've now tracked down
exactly what's happening, and it's this: while we're reading pg_authid
during PerformAuthentication, we establish a catalog snapshot. Later on,
we read pg_database to find out what DB we're connecting to. If any
sinval messages for unshared system catalogs arrive at that point, they'll
effectively be ignored, *because our MyDatabaseId is still zero and so
doesn't match the incoming message*. That's okay as far as the catcaches
are concerned, cause they're empty, and the relcache only knows about
shared catalogs so far. But InvalidateCatalogSnapshot doesn't get called
by LocalExecuteInvalidationMessage, and so we think we can plow ahead with
the old snapshot.

It looks to me like an appropriate fix would be as attached; thoughts?
(I've tested this with a delay during relation lock acquisition and
concurrent whole-database VACUUM FULLs, and so far been unable to break
it.)

regards, tom lane

Attachment Content-Type Size
startup-catalog-snapshot-fix.patch text/x-diff 903 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2015-04-02 17:10:04 Re: vac truncation scan problems
Previous Message Alvaro Herrera 2015-04-02 16:42:03 Re: POLA violation with \c service=