WIP: getting rid of the pg_database flat file

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: WIP: getting rid of the pg_database flat file
Date: 2009-08-11 23:11:20
Message-ID: 2993.1250032280@sss.pgh.pa.us
Lists: pgsql-hackers

In the discussion of bug #4919 I wrote:
> In some sense this is a bootstrap problem: what does it take to get to
> the point of being able to read pg_database and its indexes? That is
> necessarily not dependent on the particular database we want to join.
> Maybe we could solve it by having the relcache write a "global" cache
> file containing only entries for the global tables, and load that before
> we have identified the database we want to join (after which, we'll load
> another cache file for the local entries). It would doubtless take some
> rearrangement of the backend startup sequence, but it doesn't seem
> obviously impossible.

Attached is a proof-of-concept patch which shows that this idea makes it
possible to start backends without the pg_database flat file, and that the
required search of pg_database can be done with an index as long as we
have the shared relcache cache file available (which should always be true
except for the first backend start after postmaster bootup or crash
recovery). There are a few loose ends yet to fix, but on the whole it
was easier than I expected. The main costs of doing it this way are:

* pg_database has to become a nailed-in-cache relation, as does its
index on datname. (Its index on OID will have to be nailed too, unless
we can get rid of the kluge that lets autovacuum give InitPostgres a
database OID instead of database name. I have not looked at autovacuum
yet.) This doesn't really cost anything except a few more bytes in the
relcache ... and in reality I suspect pg_database is always in that
cache anyway.

* We have to have a Schema_pg_database macro in pg_attribute.h. This
means a little more hand maintenance (unless we accept Robert Haas'
patch to autogenerate all that stuff); but it's still not a big problem.

I think this is clearly worth cleaning up and committing, since even
without any further progress it eliminates number-of-databases as a
significant factor in backend startup time. Does anyone have any
objection to the above side-effects?

To actually get rid of the pg_database flat file, we'd need to take the
further step of teaching the AV launcher to read pg_database for itself,
or else refactor things so that the AV workers can do that for it.
(Alvaro, any comments about the best way to proceed there?)

I'd also like to look into getting rid of the pg_auth flat file.
As previously noted, that means postponing client auth to later in the
startup sequence. If we were willing to eliminate role membership as an
available piece of information for auth method selection, we could still
do much of the auth work before initializing the backend proper; in
particular we could issue a password challenge and wait for a response,
which would be good in terms of reducing our exposure to lightweight DDoS
attacks. I'm not sure if anyone will think that's a good tradeoff though,
since any attacker who can connect to the postmaster port can probably
DDoS the postmaster just fine anyway.

Comments?

regards, tom lane

Attachment: look-ma-no-flat-file-1.patch.gz (application/octet-stream, 9.9 KB)
