Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: ilmari(at)ilmari(dot)org (Dagfinn Ilmari Mannsåker)
Cc: John Naylor <jcnaylor(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Andreas Karlsson <andreas(at)proxel(dot)se>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)
Date: 2018-05-08 20:48:48
Message-ID: 7408.1525812528@sss.pgh.pa.us
Lists: pgsql-hackers

Resurrecting this old thread ...

I decided it'd be interesting to re-examine where initdb's runtime
is going, seeing that we just got done with a lot of bootstrap data
restructuring. I stuck some timing code into initdb, and got
results like this:

creating directory /home/postgres/testversion/data ... ok
elapsed = 0.256 msec
creating subdirectories ... ok
elapsed = 2.385 msec
selecting default max_connections ... 100
elapsed = 13.528 msec
selecting default shared_buffers ... 128MB
elapsed = 13.699 msec
selecting dynamic shared memory implementation ... posix
elapsed = 0.129 msec
elapsed = 281.335 msec in select_default_timezone
creating configuration files ... ok
elapsed = 1.319 msec
running bootstrap script ... ok
elapsed = 162.143 msec
performing post-bootstrap initialization ... ok
elapsed = 832.569 msec

Sync to disk skipped.

real 0m1.316s
user 0m0.941s
sys 0m0.395s

(I'm using "initdb -N" because the cost of the sync step is so
platform-dependent, and it's not interesting anyway for buildfarm
or make check-world testing. Also, I rearranged the code slightly
so that select_default_timezone could be timed separately from the
rest of the "creating configuration files" step.)

In trying to break down the "post-bootstrap initialization" step
a bit further, I soon realized that trying to time the sub-steps from
initdb is useless. initdb is just shoving bytes down the pipe as
fast as the kernel will let it; it has no idea how long the backend
takes to run any one query or group of queries. So I ended up adding
"-c log_statement=all -c log_min_duration_statement=0" to the
backend_options, and digging query durations out of the log output.
I got these totals for the major steps in the post-boot run:

pg_authid setup: 0.909 ms
pg_depend setup: 64.980 ms
system views: 106.221 ms
pg_description: 39.665 ms
pg_collation: 65.162 ms
conversions: 72.024 ms
text search: 29.454 ms
init-acl hacking: 14.339 ms
information schema: 188.497 ms
plpgsql: 2.531 ms
analyze/vacuum/additional db creation: 171.762 ms
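Totals like these can be obtained by summing the durations the backend
reports in its log. The following is only a sketch of that idea, not the
procedure actually used: it assumes each timed statement produces a log
line containing "duration: <number> ms", and the function names are
hypothetical. It produces one grand total; splitting the total per step
would additionally require matching the statement text.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: if the line contains "duration: <n> ms",
 * store <n> in *ms and return 1; otherwise return 0.
 */
static int
parse_duration_ms(const char *line, double *ms)
{
    const char *p = strstr(line, "duration: ");
    char        unit[3];

    if (p == NULL)
        return 0;
    if (sscanf(p, "duration: %lf %2s", ms, unit) != 2)
        return 0;
    return strcmp(unit, "ms") == 0;
}

/*
 * Hypothetical helper: sum all durations found in a log stream.
 * Lines longer than the buffer are read in pieces, which is good
 * enough for a rough total but not for exact accounting.
 */
static double
sum_durations(FILE *in)
{
    char        line[8192];
    double      total = 0.0;
    double      ms;

    while (fgets(line, sizeof(line), in) != NULL)
    {
        if (parse_duration_ms(line, &ms))
            total += ms;
    }
    return total;
}
```

Calling sum_durations(stdin) with the captured log redirected to standard
input would give the overall time spent in logged statements.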

So the conversions don't look nearly as interesting as Andreas
suggested upthread. Pushing them into .bki format would at best
save ~ 70 ms out of 1300. Which is not nothing, but it's not
going to change the world either.

Really the only thing here that jumps out as being unduly expensive for
what it's doing is select_default_timezone. That is, and always has been,
a brute-force algorithm; I wonder if there's a way to do better? We can
probably guess that every non-Windows platform is using the IANA timezone
data these days. If there were some way to extract the name of the active
timezone setting directly, we wouldn't have to try to reverse-engineer it.
But I don't know of any portable way :-(
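To illustrate what "extracting the name directly" could mean on some
systems, here is a non-portable sketch. It assumes that /etc/localtime is
a symbolic link pointing into a directory named "zoneinfo", which is
common on Linux but by no means guaranteed; the function names are
hypothetical, and any name obtained this way would still need to be
validated against the timezone database before being trusted.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: given a symlink target such as
 * "/usr/share/zoneinfo/America/New_York", return a pointer to the part
 * after the last "zoneinfo/" component, or NULL if there is none.
 */
static const char *
zone_name_from_link_target(const char *target)
{
    static const char marker[] = "zoneinfo/";
    const char *p = strstr(target, marker);
    const char *next;

    if (p == NULL)
        return NULL;

    /* Prefer the last occurrence in case the path contains several. */
    while ((next = strstr(p + 1, marker)) != NULL)
        p = next;

    p += sizeof(marker) - 1;
    return (*p != '\0') ? p : NULL;
}

/*
 * Hypothetical helper: try to read the zone name from /etc/localtime.
 * Returns NULL if the file is not a symlink or does not point into a
 * zoneinfo directory, in which case a caller would have to fall back
 * to some other method.
 */
static const char *
guess_system_timezone(char *buf, size_t buflen)
{
    ssize_t     len;

    if (buflen < 2)
        return NULL;

    len = readlink("/etc/localtime", buf, buflen - 1);
    if (len <= 0)
        return NULL;

    buf[len] = '\0';            /* readlink does not null-terminate */
    return zone_name_from_link_target(buf);
}
```

Because the shortcut can fail, it could only ever be a fast path in front
of the existing brute-force search, not a replacement for it.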

regards, tom lane
