Re: bitmask index

From: Marcus Engene <mengpg2(at)engene(dot)se>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: bitmask index
Date: 2011-07-05 10:15:30
Message-ID: 4E12E442.9020202@engene.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 6/22/11 11:42 , Greg Smith wrote:
> On 06/22/2011 05:27 PM, Marcus Engene wrote:
>> I have some tables with bitmask integers. Set bits are the
>> interesting ones. Usually they are sparse.
>
> If it's sparse, create a partial index that just includes rows where
> the bit is set:
> http://www.postgresql.org/docs/current/static/indexes-partial.html
>
> You need to be careful the query uses the exact syntax as the one that
> created the index for it to be used. But if you do that, it should be
> able to pull the rows that match out quickly.
>
I ended up having a separate table with an index on.

Though partial index solved another problem. Usually I'm a little bit
annoyed with the optimizer and the developers religious "fix the planner
instead of index hints". I must say that I'm willing to reconsider my
usual stance to that.

We have a large table of products where status=20 is a rare intermediate
status. I added a...

CREATE INDEX pond_item_common_x8 ON pond_item_common(pond_user, status)
WHERE status = 20;

...and a slow 5s select with users who had existing status=20 items
became very fast. Planner, I guess, saw the 10000 status 20 clips (out
of millions of items) instead of like 5 different values of status and
thus ignoring the index. Super!

To my great amazement, the planner also managed to use the index when
counting how many status=20 items there are in total:

pond90=> explain analyze select
pond90-> coalesce(sum(tt.antal),0) as nbr_in_queue
pond90-> from
pond90-> (
pond90(> select
pond90(> pu.username
pond90(> ,t.antal
pond90(> from
pond90(> (
pond90(> select
pond90(> sum(1) as antal
pond90(> ,pond_user
pond90(> from
pond90(> pond_item_common
pond90(> where
pond90(> status = 20
pond90(> group by pond_user
pond90(> ) as t
pond90(> ,pond_user pu
pond90(> where
pond90(> pu.objectid = t.pond_user
pond90(> order by t.antal desc
pond90(> ) as tt;
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=38079.45..38079.46 rows=1 width=8) (actual
time=166.439..166.440 rows=1 loops=1)
-> Sort (cost=38079.13..38079.18 rows=21 width=18) (actual
time=166.009..166.085 rows=648 loops=1)
Sort Key: (sum(1))
Sort Method: quicksort Memory: 67kB
-> Nested Loop (cost=37903.66..38078.67 rows=21 width=18)
(actual time=157.545..165.561 rows=648 loops=1)
-> HashAggregate (cost=37903.66..37903.92 rows=21
width=4) (actual time=157.493..157.720 rows=648 loops=1)
-> Bitmap Heap Scan on pond_item_common
(cost=451.43..37853.37 rows=10057 width=4) (actual time=9.061..151.511
rows=12352 loops=1)
Recheck Cond: (status = 20)
-> Bitmap Index Scan on
pond_item_common_x8 (cost=0.00..448.91 rows=10057 width=0) (actual
time=5.654..5.654 rows=20051 loops=1)
Index Cond: (status = 20)
-> Index Scan using pond_user_pkey on pond_user pu
(cost=0.00..8.30 rows=1 width=14) (actual time=0.011..0.012 rows=1
loops=648)
Index Cond: (pu.objectid = pond_item_common.pond_user)
Total runtime: 166.709 ms
(13 rows)

My hat's off to the dev gang. Impressive!

Best,
Marcus

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Shaun Thomas 2011-07-05 13:35:38 Re: Infinite Cache
Previous Message Greg Smith 2011-07-03 18:18:16 Re: Infinite Cache