Re: bitmask index

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: bitmask index
Date: 2011-07-05 17:43:26
Message-ID: 4E134D3E.8030809@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 07/05/2011 06:15 AM, Marcus Engene wrote:
> Though partial index solved another problem. Usually I'm a little bit
> annoyed with the optimizer and the developers religious "fix the
> planner instead of index hints". I must say that I'm willing to
> reconsider my usual stance to that.
>
> We have a large table of products where status=20 is a rare
> intermediate status. I added a...
>
> CREATE INDEX pond_item_common_x8 ON pond_item_common(pond_user, status)
> WHERE status = 20;
>
> ...and a slow 5s select with users who had existing status=20 items
> became very fast. Planner, I guess, saw the 10000 status 20 clips (out
> of millions of items) instead of like 5 different values of status and
> thus ignoring the index. Super!
>
> To my great amazement, the planner also managed to use the index when
> counting how many status=20 items there are in total:

I'm glad we got you to make a jump toward common ground with the
database's intended use. There are many neat advanced ways to solve the
sorts of problems people try to hammer with hints available in
PostgreSQL, some of which don't even exist in other databases. It's
kind of interesting to me how similarly one transition tends to happen
to people who learn a lot about those options, enough that they can talk
fully informed about things like how hints would have to work in
PostgreSQL--for example: they'd have to consider all all these partial
index possibilities. Once you go through all that, suddenly a lot of
the people who do it realize that maybe hints aren't as important as
good design and indexing--when you take advantages of all the features
available to you--after all.

To help explain what happened to you here a little better, the planner
tracks Most Common Values in the database, and it uses those statistics
to make good decisions about the ones it finds. But when a value is
really rare, it's never going to make it to that list, and therefore the
planner is going to make a guess about how likely it is--likely a wrong
one. By creating a partial index on that item, it's essentially adding
that information--just how many rows are going to match a query looking
for that value--so that it can be utilized the same way MCVs are.
Adding partial indexes on sparse columns that are critical to a common
report allow what I'm going to coin a new acronym for: those are part
of the Most Important Values in that column. The MIV set is the MCV
information plus information about the rare but critical columns. And
the easiest way to expose that data to the planner is with a partial index.

I smell a blog post coming on this topic.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
Comprehensive and Customized PostgreSQL Training Classes:
http://www.2ndquadrant.us/postgresql-training/

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Matthias Howell 2011-07-05 20:50:22 Query in 9.0.2 not using index in 9.0.0 works fine
Previous Message Greg Smith 2011-07-05 17:41:01 Re: Postgres bulkload without transaction logs