Re: profiling connection overhead

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Subject: Re: profiling connection overhead
Date: 2010-11-28 23:23:09
Message-ID: AANLkTinkK-X5mCmW+MJzxGiJ9MO8EEc5GgmYBPHnOUJJ@mail.gmail.com
Lists: pgsql-hackers

On Sun, Nov 28, 2010 at 3:53 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> The more general issue here is what to do about our
>> high backend startup costs.  Beyond trying to recycle backends for new
>> connections, as I've previously proposed and with all the problems it
>> entails, the only thing that looks promising here is to try to somehow
>> cut down on the cost of populating the catcache and relcache, not that
>> I have a very clear idea how to do that.
>
> One comment to make here is that it would be a serious error to focus on
> the costs of just starting and stopping a backend; you have to think
> about cases where the backend does at least some useful work in between,
> and that means actually *populating* those caches (to some extent) not
> just initializing them.  Maybe your wording above was chosen with that
> in mind, but I think onlookers might easily overlook the point.

I did have that in mind, but I agree the point is worth mentioning.
So, for example, we wouldn't gain anything meaningful by postponing
catcache initialization until someone executes a query. It would
improve the synthetic benchmark, but that's it.

> FWIW, today I've been looking into getting rid of the silliness in
> build_index_pathkeys whereby it reconstructs pathkey opfamily OIDs
> from sortops instead of just using the index opfamilies directly.
> It turns out that once you fix that, there is no need at all for
> relcache to cache per-index operator data (the rd_operator arrays)
> because that's the only code that uses 'em.  I don't see any particular
> change in the runtime of the regression tests from ripping out that
> part of the cached data, but it ought to have at least some beneficial
> effect on real startup time.

Wow, that's great. The fact that it simplifies the code is probably
the main point, but obviously whatever cycles we can save during
startup (and ongoing operation) are all to the good.

One possible way to get a real speedup here would be to look for ways
to trim the number of catcaches. But I'm not too convinced there's
much water to squeeze out of that rock. After our recent conversation
about KNNGIST, it occurred to me to wonder whether there's really any
point in pretending that a user can usefully add an AM, both due to
hard-wired planner knowledge and due to lack of any sort of extensible
XLOG support. If not, we could potentially turn pg_am into a
hardcoded lookup table rather than a modifiable catalog, which would
also likely be faster; and perhaps reference AMs elsewhere with
characters rather than OIDs. But even if this were judged a sensible
thing to do, I'm not at all sure that even a purpose-built synthetic
benchmark would be able to measure the speedup.
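
To make the idea concrete, here is a minimal sketch of what a hardcoded
AM lookup table keyed by a single character might look like. Everything
in it (the AmEntry struct, the am_table array, the am_lookup function,
the character codes, and the choice of fields) is hypothetical and
invented for illustration; it is not PostgreSQL code and does not
reflect the actual pg_am columns.

```c
#include <stddef.h>

/*
 * Hypothetical descriptor for a built-in access method.  The fields are
 * illustrative only and are NOT the real pg_am column set.
 */
typedef struct AmEntry
{
    char        code;       /* assumed one-character AM identifier */
    const char *name;       /* human-readable AM name */
    int         can_order;  /* example capability flag */
} AmEntry;

/* Compile-time table standing in for a modifiable catalog (assumption). */
static const AmEntry am_table[] = {
    {'b', "btree", 1},
    {'h', "hash", 0},
    {'g', "gist", 0},
    {'i', "gin", 0},
};

/*
 * Look up an AM by its character code with a linear scan over the
 * static table.  Returns NULL when the code is unknown.
 */
static const AmEntry *
am_lookup(char code)
{
    size_t      i;

    for (i = 0; i < sizeof(am_table) / sizeof(am_table[0]); i++)
    {
        if (am_table[i].code == code)
            return &am_table[i];
    }
    return NULL;
}
```

A caller would write something like am_lookup('b') and test the result
for NULL. With only a handful of entries a linear scan is likely cheap,
which is in line with the doubt above that the saving over a catalog
lookup would be measurable at all.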

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
