Cutting initdb's runtime (Perl question embedded)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Cutting initdb's runtime (Perl question embedded)
Date: 2017-04-12 14:12:47
Message-ID: 30896.1492006367@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Andres mentioned, and I've confirmed locally, that a large chunk of
initdb's runtime goes into regprocin's brute-force lookups of function
OIDs from function names. The recent discussion about cutting TAP test
time prompted me to look into that question again. We had had some
grand plans for getting genbki.pl to perform the name-to-OID conversion
as part of a big rewrite, but since that project is showing few signs
of life, I'm thinking that a more localized performance fix would be
a good thing to look into. There seem to be a couple of plausible
routes to a fix:

1. The best thing would still be to make genbki.pl do the conversion,
and write numeric OIDs into postgres.bki. The core stumbling block
here seems to be that for most catalogs, Catalog.pm and genbki.pl
never really break down a DATA line into fields --- and we certainly
have got to do that, if we're going to replace the values of regproc
fields. The places that do need to do that approximate it like this:

# To construct fmgroids.h and fmgrtab.c, we need to inspect some
# of the individual data fields. Just splitting on whitespace
# won't work, because some quoted fields might contain internal
# whitespace. We handle this by folding them all to a simple
# "xxx". Fortunately, this script doesn't need to look at any
# fields that might need quoting, so this simple hack is
# sufficient.
$row->{bki_values} =~ s/"[^"]*"/"xxx"/g;
@{$row}{(at)attnames} = split /\s+/, $row->{bki_values};

We would need a bullet-proof, non-hack, preferably not too slow way to
split DATA lines into fields properly. I'm one of the world's worst
Perl programmers, but surely there's a way?

2. Alternatively, we could teach bootstrap mode to build a hashtable
mapping proname to OID while it reads pg_proc data from postgres.bki,
and then make regprocin's bootstrap path consult the hashtable instead
of looking directly at the pg_proc file. That I'm quite sure is do-able,
but it seems like it's leaving money on the table compared to doing
the transformation earlier.

Thoughts?

regards, tom lane

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2017-04-12 14:24:30 Re: index-only count(*) for indexes supporting bitmap scans
Previous Message Alexander Kuzmenkov 2017-04-12 14:01:39 Re: index-only count(*) for indexes supporting bitmap scans