Re: Cutting initdb's runtime (Perl question embedded)

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Cutting initdb's runtime (Perl question embedded)
Date: 2017-04-12 17:34:37
Lists: pgsql-hackers

On 2017-04-12 10:12:47 -0400, Tom Lane wrote:
> Andres mentioned, and I've confirmed locally, that a large chunk of
> initdb's runtime goes into regprocin's brute-force lookups of function
> OIDs from function names. The recent discussion about cutting TAP test
> time prompted me to look into that question again. We had had some
> grand plans for getting to perform the name-to-OID conversion
> as part of a big rewrite, but since that project is showing few signs
> of life, I'm thinking that a more localized performance fix would be
> a good thing to look into. There seem to be a couple of plausible
> routes to a fix:
> 1. The best thing would still be to make do the conversion,
> and write numeric OIDs into postgres.bki. The core stumbling block
> here seems to be that for most catalogs, and
> never really break down a DATA line into fields --- and we certainly
> have got to do that, if we're going to replace the values of regproc
> fields. The places that do need to do that approximate it like this:
> # To construct fmgroids.h and fmgrtab.c, we need to inspect some
> # of the individual data fields. Just splitting on whitespace
> # won't work, because some quoted fields might contain internal
> # whitespace. We handle this by folding them all to a simple
> # "xxx". Fortunately, this script doesn't need to look at any
> # fields that might need quoting, so this simple hack is
> # sufficient.
> $row->{bki_values} =~ s/"[^"]*"/"xxx"/g;
> @{$row}{@attnames} = split /\s+/, $row->{bki_values};
> We would need a bullet-proof, non-hack, preferably not too slow way to
> split DATA lines into fields properly. I'm one of the world's worst
> Perl programmers, but surely there's a way?

I've done something like 1) before:

I don't think the speed matters all that much, because we'll only do it
when generating the .bki file - a couple of ms more or less won't matter.
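
For illustration only, here is a minimal sketch of a quote-aware splitter
for the values part of a DATA line. It is not the code referred to above.
The name split_bki_values and the sample input are hypothetical, and it
assumes that quoted fields never contain an embedded double quote (the
same assumption the "xxx" hack in the quoted text makes).

```perl
use strict;
use warnings;

# Hypothetical helper: split a DATA-style value string into fields.
# A field is either a double-quoted string (which may contain whitespace)
# or a run of characters that are neither whitespace nor double quotes.
# Assumption: quoted fields contain no embedded double quotes.
sub split_bki_values
{
    my ($values) = @_;
    my @fields;

    # \G anchors each match where the previous one ended; /c keeps that
    # position when the match finally fails, so it can be checked below.
    # The lookahead requires every field to end at whitespace or at the
    # end of the string.
    while ($values =~ /\G\s*(?:"([^"]*)"|([^\s"]+))(?=\s|\z)/gc)
    {
        # $1 is defined only when the quoted alternative matched.
        push @fields, defined $1 ? $1 : $2;
    }

    # Anything other than trailing whitespace here is malformed input,
    # for example an unterminated quote.
    $values =~ /\G\s*\z/gc
      or die "could not parse DATA values: $values\n";

    return @fields;
}

# Made-up input, not a real catalog row.
my @fields = split_bki_values('1242 boolin 11 "two words" f');
print join('|', @fields), "\n";
```

Note that this returns quoted fields without their quotes, so a caller
that rewrites a single field and re-emits the line would have to re-quote
any field that is empty or contains whitespace.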

IIRC, I spent some more time to also load the data files from a different
although that's presumably heavily outdated now.

- Andres
