Re: Oh, this is embarrassing: init file logic is still broken

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Oh, this is embarrassing: init file logic is still broken
Date: 2015-06-24 21:52:48
Message-ID: 558B26B0.1070704@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 06/23/2015 04:44 PM, Tom Lane wrote:
> Chasing a problem identified by my Salesforce colleagues led me to the
> conclusion that my commit f3b5565dd ("Use a safer method for determining
> whether relcache init file is stale") is rather borked. It causes
> pg_trigger_tgrelid_tgname_index to be omitted from the relcache init file,
> because that index is not used by any syscache. I had been aware of that
> actually, but considered it a minor issue. It's not so minor though,
> because RelationCacheInitializePhase3 marks that index as nailed for
> performance reasons, and includes it in NUM_CRITICAL_LOCAL_INDEXES.
> That means that load_relcache_init_file *always* decides that the init
> file is busted and silently(!) ignores it. So we're taking a nontrivial
> hit in backend startup speed as of the last set of minor releases.

OK, this is pretty bad in its real performance effects. On a workload
which is dominated by new connection creation, we've lost about 17%
throughput.

To test it, I ran pgbench -s 100 -j 2 -c 6 -r -C -S -T 1200 against a
database which fits in shared_buffers on two different m3.large
instances on AWS (across the network, not on unix sockets). A typical
run on 9.3.6 looks like this:

scaling factor: 100
query mode: simple
number of clients: 6
number of threads: 2
duration: 1200 s
number of transactions actually processed: 252322
tps = 210.267219 (including connections establishing)
tps = 31958.233736 (excluding connections establishing)
statement latencies in milliseconds:
0.002515 \set naccounts 100000 * :scale
0.000963 \setrandom aid 1 :naccounts
19.042859 SELECT abalance FROM pgbench_accounts WHERE aid
= :aid;

Whereas a typical run on 9.3.9 looks like this:

scaling factor: 100
query mode: simple
number of clients: 6
number of threads: 2
duration: 1200 s
number of transactions actually processed: 208180
tps = 173.482259 (including connections establishing)
tps = 31092.866153 (excluding connections establishing)
statement latencies in milliseconds:
0.002518 \set naccounts 100000 * :scale
0.000988 \setrandom aid 1 :naccounts
23.076961 SELECT abalance FROM pgbench_accounts WHERE aid
= :aid;

Numbers are pretty consistent on four runs each on two different
instances (+/- 4%), so I don't think this is Amazon variability we're
seeing. I think the syscache invalidation is really costing us 17%. :-(

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2015-06-24 21:53:18 Are we sufficiently clear that jsonb containment is nested?
Previous Message Robert Haas 2015-06-24 21:20:31 Re: Should we back-patch SSL renegotiation fixes?