initdb and share/postgresql.conf.sample

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: initdb and share/postgresql.conf.sample
Date: 2012-12-23 23:11:21
Message-ID: CAMkU=1yuZDgA8iyJCGPSeXhs7VyUGeX0EJktJ28FPxGN-dsWoA@mail.gmail.com
Lists: pgsql-hackers

In some of my git branches I have
editorialized src/backend/utils/misc/postgresql.conf.sample to contain my
configuration preferences for whatever it is that that branch is for
testing. Then this gets copied to share/postgresql.conf.sample during
install and from there to data/postgresql.conf during initdb, and I don't
need to remember to go make the necessary changes.

Am I insane to be doing this? Is there a better way to handle these
branch-specific configuration needs?

Anyway, I was recently astonished to discover that the contents
of share/postgresql.conf.sample during the initdb affected the performance
of the server, even when the conf file was replaced with something else
before the server was started up. To make a very long story short,
if share/postgresql.conf.sample is set up for archiving, then somewhere in
the initdb process some bootstrap process pre-creates a bunch of extra xlog
files.
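For reference, the archiving-related settings in my edited sample file look
roughly like this (the values here are illustrative, not the exact ones
from any particular branch):

```
# in src/backend/utils/misc/postgresql.conf.sample (illustrative values)
wal_level = archive
archive_mode = on
archive_command = '/bin/true'
checkpoint_segments = 60
```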

Is this alarming? It looks like initdb takes some pains, in at least one
place, to make an empty config file rather than using
postgresql.conf.sample, but it seems like a sub-process is not doing that.

Those extra log files then give the newly started server a boost (whether
it is started in archive mode or not) because it doesn't have to create
them itself. It isn't so much a boost as the absence of a new-server
penalty. I want to remove that penalty to get better numbers from
benchmarking. What I am doing now is this, between the initdb and the
pg_ctl start:

for g in `perl -e 'printf("0000000100000000000000%02X\n",$_) foreach 2..120'`
do
  cp -i /tmp/data/pg_xlog/000000010000000000000001 /tmp/data/pg_xlog/$g < /dev/null
done

The "120" comes from 2 * checkpoint_segments. That's mighty ugly; is there
a better trick?
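The same pre-copying can be sketched without the perl one-liner, using a
plain bash loop and printf. This is just a sketch, not what I actually run:
the sandbox directory and stand-in seed file below are there only so the
snippet is runnable anywhere; in real use the directory would be the
cluster's pg_xlog and the seed file the segment initdb already created.

```shell
# Sandbox setup (assumption for illustration): a throwaway dir and an
# empty stand-in for the first WAL segment that initdb creates.
xlogdir=$(mktemp -d)/pg_xlog
mkdir -p "$xlogdir"
: > "$xlogdir/000000010000000000000001"

# Pre-create segments 2..120 (2 * checkpoint_segments, with
# checkpoint_segments = 60) by copying the seed segment.  WAL file
# names are 24 hex digits, so %02X pads the segment number.
for n in $(seq 2 120); do
  cp "$xlogdir/000000010000000000000001" \
     "$(printf '%s/0000000100000000000000%02X' "$xlogdir" "$n")"
done
```

With the seed file counted, the directory ends up holding 120 segment
files, matching the 2 * checkpoint_segments target.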

You could say that benchmarks should run long enough to average out such
changes, but needing to run a benchmark that long can make some kinds of
work (like git bisect) unrealistic rather than merely tedious.

Cheers,

Jeff
