Oh, this is embarrassing: init file logic is still broken

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Oh, this is embarrassing: init file logic is still broken
Date: 2015-06-23 23:44:36
Message-ID: 15290.1435103076@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Chasing a problem identified by my Salesforce colleagues led me to the
conclusion that my commit f3b5565dd ("Use a safer method for determining
whether relcache init file is stale") is rather borked. It causes
pg_trigger_tgrelid_tgname_index to be omitted from the relcache init file,
because that index is not used by any syscache. I had been aware of that
actually, but considered it a minor issue. It's not so minor though,
because RelationCacheInitializePhase3 marks that index as nailed for
performance reasons, and includes it in NUM_CRITICAL_LOCAL_INDEXES.
That means that load_relcache_init_file *always* decides that the init
file is busted and silently(!) ignores it. So we're taking a nontrivial
hit in backend startup speed as of the last set of minor releases.

Clearly, my shortcut of defining the init file contents as exactly the set
of syscache-supporting rels was too short. I think the cleanest fix is
to create a wrapper function in relcache.c that encapsulates the knowledge
that pg_trigger_tgrelid_tgname_index is also to be included in the init
file. But to prevent this type of problem from recurring, we need some
defenses against the wrapper getting out of sync with reality elsewhere.
I suggest that:

1. write_relcache_init_file ought to bitch if it concludes that a nailed
relation is not to be written to the init file. Probably an Assert is
enough.

2. load_relcache_init_file ought to complain if it takes the "wrong number
of nailed relations" exit path. Here, it seems like there might be a
chance of the path being taken validly, for example if a minor release
were to decide to nail a rel that wasn't nailed before, or vice versa.
So what I'm thinking about here is an elog(WARNING) complaint, which would
make logic bugs like this pretty visible in development builds. If we
ever did make a change like that in a minor release, people might get a
one-time WARNING when upgrading, but I think that would be acceptable.

Thoughts, better ideas?

regards, tom lane

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kouhei Kaigai 2015-06-24 00:42:23 Re: upper planner path-ification
Previous Message Merlin Moncure 2015-06-23 21:50:07 Re: proposal: row_to_array function