Re: Using bitmap index scans-more efficient

From: Kyle Bateman <kyle(at)actarg(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-sql(at)postgresql(dot)org, Florian Weimer <fweimer(at)bfk(dot)de>
Subject: Re: Using bitmap index scans-more efficient
Date: 2006-08-15 21:58:39
Message-ID: 44E2438F.8000105@actarg.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Tom Lane wrote:

>Kyle Bateman <kyle(at)actarg(dot)com> writes:
>
>
>>But I'm assuming that using an interval-encoded project tree, I would
>>have to do something like the following to get a progency group:
>>
>>
>
>
>
>>select * from ledger l, proj p where p.proj_id = l.proj and p.left >
>>1234 and p.right < 2345;
>>
>>
>
>btree has no idea about the constraint (that I imagine exists) that left
><= right. If you're just doing a simple index on (left, right) then the
>above query requires scanning all index entries with left > 1234. It
>would probably help to say
>
>select * from ledger l, proj p where p.proj_id = l.proj and
> p.left > 1234 and p.left < 2345 and p.right < 2345;
>
>so that you can constrain the range of "left" values scanned.
>
>
Thanks for the replies, Tom and Florian.

My problem is not that it is difficult (or costly) to determine the
progeny of a given project. I can determine this in about 90 msec
regardless of whether I use an adjacency model, interval-encoding, or
materialized path (current implementation). The problem is, when I try
to extract the ledger entries belonging to that progeny from a set of a
million records, it seems to need to process all million records rather
than being able to index right into the ones I want.

I'm not very good at reading explain output, but I tried to set up the
query Tom suggests by creating an interval-encoded project table
(proj_int) and then joining it to my ledger like so:

select l.* from ledger l, proj_int i where l.proj = i.proj_id and i.lft
>= 5283 and i.lft < 5300 and i.rgt <= 5300;

On my mini-test-ledger of 100,000 entries, this takes the longest time
(5 seconds) with the following explain output:

Hash Join (cost=19018.46..23411.52 rows=14 width=85)
Hash Cond: ("outer".proj = "inner".proj_id)
-> Nested Loop Left Join (cost=18994.38..23378.41 rows=1700 width=85)
-> Hash Join (cost=18990.84..23340.87 rows=1700 width=81)
Hash Cond: ("outer".vendid = "inner".org_id)
-> Merge Join (cost=18935.35..23255.64 rows=1700 width=63)
Merge Cond: (("outer".vendid = "inner".vendid) AND
(("outer".invnum)::text = "inner"."?column10?"))
-> Index Scan using apinv_hdr_pkey on apinv_hdr h
(cost=0.00..3148.16 rows=51016 width=21)
-> Sort (cost=18935.35..19235.45 rows=120041
width=55)
Sort Key: i.vendid, (i.invnum)::text
-> Seq Scan on apinv_items i
(cost=0.00..4152.99 rows=120041 width=55)
Filter: ((status = 'en'::bpchar) OR
(status = 'cl'::bpchar) OR (status = 'pd'::bpchar))
-> Hash (cost=50.99..50.99 rows=1799 width=26)
-> Seq Scan on vend_org v (cost=0.00..50.99
rows=1799 width=26)
-> Materialize (cost=3.54..3.55 rows=1 width=4)
-> Seq Scan on acct a (cost=0.00..3.54 rows=1 width=4)
Filter: ((code)::text = 'ap'::text)
-> Hash (cost=24.06..24.06 rows=10 width=4)
-> Bitmap Heap Scan on proj_int i (cost=2.26..24.06 rows=10
width=4)
Recheck Cond: ((lft >= 5283) AND (lft < 5300) AND (rgt <=
5300))
-> Bitmap Index Scan on i_proj_int_lft_rgt
(cost=0.00..2.26 rows=10 width=0)
Index Cond: ((lft >= 5283) AND (lft < 5300) AND
(rgt <= 5300))

That is roughly equivalent to my materialized path method:

select l.* from ledger l where l.proj in (select proj_id from proj_v
where 4737 = any(ppath));

And is quite slow compared to 150 msec when enumerating the progeny
projects like so:

select l.* from ledger l where l.proj in
(4737,4789,4892,4893,4894,4895,4933,4934,4935);

Nested Loop Left Join (cost=19.73..4164.10 rows=7 width=85)
-> Nested Loop (cost=19.73..4139.08 rows=7 width=81)
-> Nested Loop (cost=19.73..4100.07 rows=7 width=63)
-> Bitmap Heap Scan on apinv_items i
(cost=19.73..1185.71 rows=487 width=55)
Recheck Cond: ((proj = 4737) OR (proj = 4789) OR
(proj = 4892) OR (proj = 4893) OR (proj = 4894) OR (proj = 4895) OR
(proj = 4933) OR (proj = 4934) OR (proj = 4935))
Filter: ((status = 'en'::bpchar) OR (status =
'cl'::bpchar) OR (status = 'pd'::bpchar))
-> BitmapOr (cost=19.73..19.73 rows=495 width=0)
-> Bitmap Index Scan on i_apinv_items_proj
(cost=0.00..2.19 rows=55 width=0)
Index Cond: (proj = 4737)
-> Bitmap Index Scan on i_apinv_items_proj
(cost=0.00..2.19 rows=55 width=0)
Index Cond: (proj = 4789)
-> Bitmap Index Scan on i_apinv_items_proj
(cost=0.00..2.19 rows=55 width=0)
Index Cond: (proj = 4892)
-> Bitmap Index Scan on i_apinv_items_proj
(cost=0.00..2.19 rows=55 width=0)
Index Cond: (proj = 4893)
-> Bitmap Index Scan on i_apinv_items_proj
(cost=0.00..2.19 rows=55 width=0)
Index Cond: (proj = 4894)
-> Bitmap Index Scan on i_apinv_items_proj
(cost=0.00..2.19 rows=55 width=0)
Index Cond: (proj = 4895)
-> Bitmap Index Scan on i_apinv_items_proj
(cost=0.00..2.19 rows=55 width=0)
Index Cond: (proj = 4933)
-> Bitmap Index Scan on i_apinv_items_proj
(cost=0.00..2.19 rows=55 width=0)
Index Cond: (proj = 4934)
-> Bitmap Index Scan on i_apinv_items_proj
(cost=0.00..2.19 rows=55 width=0)
Index Cond: (proj = 4935)
-> Index Scan using apinv_hdr_pkey on apinv_hdr h
(cost=0.00..5.97 rows=1 width=21)
Index Cond: (("outer".vendid = h.vendid) AND
(("outer".invnum)::text = (h.invnum)::text))
-> Index Scan using vend_org_pkey on vend_org v
(cost=0.00..5.56 rows=1 width=26)
Index Cond: (v.org_id = "outer".vendid) Nested Loop Left
Join (cost=19.73..4164.10 rows=7 width=85)
-> Nested Loop (cost=19.73..4139.08 rows=7 width=81)
-> Nested Loop (cost=19.73..4100.07 rows=7 width=63)
-> Bitmap Heap Scan on apinv_items i
(cost=19.73..1185.71 rows=487 width=55)
Recheck Cond: ((proj = 4737) OR (proj = 4789) OR
(proj = 4892) OR (proj = 4893) OR (proj = 4894) OR (proj = 4895) OR
(proj = 4933) OR (proj = 4934) OR (proj = 4935))
Filter: ((status = 'en'::bpchar) OR (status =
'cl'::bpchar) OR (status = 'pd'::bpchar))
-> BitmapOr (cost=19.73..19.73 rows=495 width=0)
-> Bitmap Index Scan on i_apinv_items_proj
(cost=0.00..2.19 rows=55 width=0)
Index Cond: (proj = 4737)
-> Bitmap Index Scan on i_apinv_items_proj
(cost=0.00..2.19 rows=55 width=0)
Index Cond: (proj = 4789)
-> Bitmap Index Scan on i_apinv_items_proj
(cost=0.00..2.19 rows=55 width=0)
Index Cond: (proj = 4892)
-> Bitmap Index Scan on i_apinv_items_proj
(cost=0.00..2.19 rows=55 width=0)
Index Cond: (proj = 4893)
-> Bitmap Index Scan on i_apinv_items_proj
(cost=0.00..2.19 rows=55 width=0)
Index Cond: (proj = 4894)
-> Bitmap Index Scan on i_apinv_items_proj
(cost=0.00..2.19 rows=55 width=0)
Index Cond: (proj = 4895)
-> Bitmap Index Scan on i_apinv_items_proj
(cost=0.00..2.19 rows=55 width=0)
Index Cond: (proj = 4933)
-> Bitmap Index Scan on i_apinv_items_proj
(cost=0.00..2.19 rows=55 width=0)
Index Cond: (proj = 4934)
-> Bitmap Index Scan on i_apinv_items_proj
(cost=0.00..2.19 rows=55 width=0)
Index Cond: (proj = 4935)
-> Index Scan using apinv_hdr_pkey on apinv_hdr h
(cost=0.00..5.97 rows=1 width=21)
Index Cond: (("outer".vendid = h.vendid) AND
(("outer".invnum)::text = (h.invnum)::text))
-> Index Scan using vend_org_pkey on vend_org v
(cost=0.00..5.56 rows=1 width=26)
Index Cond: (v.org_id = "outer".vendid)
-> Seq Scan on acct a (cost=0.00..3.54 rows=1 width=4)
Filter: ((code)::text = 'ap'::text)
-> Seq Scan on acct a (cost=0.00..3.54 rows=1 width=4)
Filter: ((code)::text = 'ap'::text)

I'm still stumped...

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Kyle Bateman 2006-08-15 23:15:37 Re: Using bitmap index scans-more efficient
Previous Message Aaron Bono 2006-08-15 19:18:13 Re: The Right Way to manage schemas in SCM systems