I've spent a few hours pouring over the source code with
coarse-toothed comb, trying to figure out just exactly what might
break if we changed MyDatabaseId after backend startup time, or in
other words, allowed a backend to unbind from the database to which it
was originally bound and rebind to a new one. This is related to the
periodic conversations we've had about a built-in connection pooler,
and/or maintaining a pool of worker threads that could be used to
service parallel query, replication sets, etc. What follows is not
meant to be a concrete design proposal; it's basic research that may
lead to a proposal at some time in the future. Still, comments are
For the use cases that seem interesting, it is sufficient to think
about changing MyDatabaseId when there is no transaction in progress,
or perhaps, when there is a transaction in progress which does that
and nothing else. This means that code that simply uses MyDatabaseId
in the course of going about its duties is not going to blow up.
However, anything that stores the current database ID someplace is a
hazard, as is any code that stores information which was obtained
using the current database ID. Looking through the code, it appears
to me that we'd need to do the following (not necessarily in this
1. Blow away all of our catalog caches as if we've received a sinval
reset. Since most of our caches are hooked into the catalog cache via
invalidation hooks, this should trigger cascading invalidations of
other caches that need it, at least in most cases. (Further looking
is probably needed to identify any cases where a
sinval-reset-equivalent is insufficient.)
2. Blow away absolutely everything from the relcache. It appears to
me that we'd need to repeat the phase 2 and 3 initialization steps.
We can't even keep the entries for shared catalogs, because the
descriptors that formrdesc() coughs up may not exactly match what we
read from the pg_class entry. It's tempting to propose adding enough
additional shared catalogs to make the relcache entries for shared
catalogs independent of any particular database's contents, but right
now they are not.
3. Reinitialize the relmapper.
4. Release all locks on objects in the old database; and our shared
lock on the database itself. Take a new shared lock on the new
5. Flush out any backend-local statistics that have been gathered but
not yet sent to the statistics collector.
6. Update our entry in the PgBackendStatus array so pg_stat_activity
sees the new database assignment.
7. Change MyProc->databaseId while holding ProcArrayLock in LW_EXCLUSIVE mode.
8. Update MyDatabaseTableSpace.
I haven't benchmarked this (so perhaps I should STFU) but #2 and maybe
#1 and #3 sounds like the most painful part of this activity. Still,
there's not much help for it that I can see. The best you can hope
for is to keep around entries to the shared catalogs, and that's not
going to get you terribly far, and at least ATM even that is unsafe.
I think maybe the thing to do is try to quantify how much time is
being spent in each part of the backend startup process - like connect
to the database lots and lots of times with individual connections,
run a trivial SELECT against a table, and then disconnect and repeat.
The problem is that (unless someone knows how to do magic tricks with
oprofile or dtrace) you're only going to find out the amount of time
actually spent executing each portion of the code. It won't tell you
how much overhead you have from fork() itself, let alone how much
faster the initialization steps would have run if they had been
running on a warmed-up process address space rather than one that
might well need to fault in a lot of page-table entries in its first
few moments of life. But it might still provide some useful data.
The Enterprise PostgreSQL Company
pgsql-hackers by date
|Next:||From: Tom Lane||Date: 2010-11-15 05:26:48|
|Subject: Re: Refactoring the Type System |
|Previous:||From: Itagaki Takahiro||Date: 2010-11-15 03:56:59|
|Subject: Re: SQL/MED estimated time of arrival?|