Re: Plan time Improvement - 64bit bitmapset

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Andres Freund <andres(at)anarazel(dot)de>, Gregory Stark <stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Plan time Improvement - 64bit bitmapset
Date: 2009-06-04 01:07:38
Message-ID: 603c8f070906031807u73d6c29an53f86a172e3e3312@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jun 3, 2009 at 5:24 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Wed, Jun 3, 2009 at 3:18 PM, Kevin Grittner
>> <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>>> I'm finding that even the ones that had a plan time in the range of
>>> 260 ms go down to 15 ms to 85 ms once the statistics are cached.
>
>> I had some performance results back when we were last looking at
>> default_statistics_target that indicated that the time to repeatedly
>> decompress a toasted statistics array contributed significantly to the
>> total planning time, but my suggestion to disable compression for
>> pg_statistic was summarily pooh-poohed for reasons that still aren't
>> quite clear to me.
>
> Well, smaller is better.  Kevin's example demonstrates that it's good
> to have the stats sucked into cache.  If they were uncompressed then
> less of them would fit in however much cache space you have, and
> whatever CPU savings you get would be lost to more I/O to read in
> stats entries.

Smaller-but-compressed isn't better if the limiting factor is CPU
rather than I/O, which I suspect is nearly always the case for
pg_statistic. I would be very surprised to find out that anyone has a
database where query planning regularly results in disk access. Sure,
there are cold-cache effects at system startup, and maybe for rarely
used tables whose pg_statistic entries have been paged out to make
room for more heavily used data, but it doesn't make any sense to me
to optimize for those cases at the expense of the common case where
you are constantly executing new and different queries. In that case,
fast access to the statistics tuples means "I don't need to decompress
this", not "I don't need to read this in from disk".

It wouldn't be so bad if memory-resident tuples, once decompressed,
stayed decompressed at least for the duration of the current query
planning cycle. But when I looked at this a while back, it appeared
that every call to eqjoinsel() and the various other functions that
consult the statistics decompresses the tuple all over again. For the
example Greg Stark constructed, that leads to a huge slowdown in query
planning starting right around a statistics target of 90.

http://archives.postgresql.org/pgsql-hackers/2008-12/msg00386.php

The exact point where you hit the slowdown will presumably vary with
the size of the relevant columns, but it's surely going to affect a
lot more people now that we've made the default target 10x larger
(from 10 to 100 in 8.4).
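
For anyone wanting to reproduce the effect, something along these
lines shows the shape of it (the real test case is in the linked
thread; the table and column here are made up): build a column whose
MCV array is wide enough to get compressed, then time repeated
EXPLAINs of a join on it.

-- Hypothetical repro: ~200 distinct 640-byte values, so the MCV list
-- for "padding" is large enough to be toasted/compressed.
CREATE TABLE stats_test (id int, padding text);
INSERT INTO stats_test
    SELECT i, repeat(md5((i % 200)::text), 20)
    FROM generate_series(1, 100000) AS i;
ALTER TABLE stats_test ALTER COLUMN padding SET STATISTICS 100;
ANALYZE stats_test;

-- With \timing on in psql, each EXPLAIN pays the cost of fetching
-- (and decompressing) the padding statistics again.
EXPLAIN SELECT *
FROM stats_test a JOIN stats_test b USING (padding);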

> Of course, this all depends on total database size vs total RAM,
> but that's how I'd interpret the observation.  PG is still mostly
> optimized for databases bigger than RAM, so this decision still
> makes sense.
>
> (I think you could try marking the columns of pg_statistic as "don't
> compress" if you have a DB you want to optimize for all-in-memory
> behavior.)

If you set allow_system_table_mods...
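
(For the record, with allow_system_table_mods turned on, which in this
release has to be set at server start if I'm remembering right, Tom's
suggestion would look roughly like this. EXTERNAL keeps the arrays out
of line but uncompressed; stavalues1 through stavalues4 are the
anyarray slots in the current catalog:)

-- Sketch only: switch the large statistics arrays to uncompressed storage.
ALTER TABLE pg_statistic ALTER COLUMN stavalues1 SET STORAGE EXTERNAL;
ALTER TABLE pg_statistic ALTER COLUMN stavalues2 SET STORAGE EXTERNAL;
ALTER TABLE pg_statistic ALTER COLUMN stavalues3 SET STORAGE EXTERNAL;
ALTER TABLE pg_statistic ALTER COLUMN stavalues4 SET STORAGE EXTERNAL;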

...Robert
