Re: Recheck condition

From: "Josh Harrison" <joshques(at)gmail(dot)com>
To: "Alvaro Herrera" <alvherre(at)alvh(dot)no-ip(dot)org>, "Martijn van Oosterhout" <kleptog(at)svana(dot)org>, "Postgresql General" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Recheck condition
Date: 2007-11-30 12:51:31
Message-ID: 8d89ea1d0711300451h219be3b9y67cd7c84e96e0817@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Nov 29, 2007 8:15 AM, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:

> Josh Harrison escribió:
> > >
> > > > For example if I have a table Person with 3 fields
> (name,city_id,age).
> > > And
> > > > the table contains 1000 rows. The table has 2 indexes city_id and
> age
> > > > If I have a query :
> > > > SELECT * FROM PERSON WHERE city_id=5 AND AGE=30
> >
> > Okay....So If I have a query like the above and the query plan shows a
> > 'recheck condition' and bitmap scan, then does that mean it scans the
> > indexes first to get the intermediate results and goto the heap only for
> the
> > final data?
>
> Yes.
>
> If the table actually contains 1000 rows, the most likely outcome is
> that the bitmaps would not be lossy and therefore no rechecking is
> needed at all. (Tuple bitmaps become lossy only if they have to store a
> lot of tuples, in which case they forget the idea of storing each tuple,
> and instead "compress" the representation to storing only the page
> numbers where matching tuples are to be found).
>
> Note however, that even if the bitmaps are not lossy, the visit to the
> heap is still required, because the need to check for visibility.
>
Thanks...
I have 1 more question in the same line...

*Query1*
SELECT person_id FROM person WHERE (column1=1 AND column2='62')
INTERSECT
SELECT person_id FROM person WHERE (column1=1 AND column2='189')

There is an index created as person_idx(column1,column2)

QUERY
PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

SetOp Intersect (cost=1750719.48..1769378.35 rows=373177 width=4) (actual
time=42913.626..47247.650 rows=6352
loops=1)
-> Sort (cost=1750719.48..1760048.92 rows=3731774 width=4) (actual
time=42913.537..45838.472 rows=3756726
loops=1)
Sort Key: "*SELECT*
1".patient_id

*Sort method: external merge Disk:73432kB *

-> Append (cost=17886.42..1209431.67 rows=3731774 width=4)
(actual time=1474.995..32215.493 rows=3756726
loops=1)
-> Subquery Scan "*SELECT* 1"
(cost=17886.42..496377.90rows=381993 width=4) (actual time=
1474.993..4936.240 rows=327498 loops=1)
-> Bitmap Heap Scan on person (cost=
17886.42..492557.97 rows=381993 width=4) (actual
time=1474.990..4735.972rows=327498 loops=1)
Recheck Cond: ((column1 = 1) AND ((column2)::text
= '62'::text))
-> Bitmap Index Scan on person_idx (cost=
0.00..17790.92 rows=381993 width=0) (actual
time=1469.508..1469.508rows=327498 loops=1)
Index Cond: ((column1 = 1) AND
((column2)::text =
'62'::text))
-> Subquery Scan "*SELECT* 2"
(cost=156754.24..713053.77rows=3349781 width=4) (actual time=
4142.577..25518.305 rows=3429228 loops=1)
-> Bitmap Heap Scan on person (cost=
156754.24..679555.96 rows=3349781 width=4) (actual
time=4142.573..23493.596rows=3429228 loops=1)
Recheck Cond: ((column1 = 1) AND ((column2)::text
= '189'::text))
-> Bitmap Index Scan on person_idx (cost=
0.00..155916.80 rows=3349781 width=0) (actual
time=4136.948..4136.948rows=3429228 loops=1)
Index Cond: ((column1 = 1) AND
((column2)::text =
'189'::text))
Total runtime: 47250.501 ms

**
Question:
In this query Intersection is used. How does postgres handle this? The steps
in the above query are
1.find all tuples that match column1=1 AND column2='62'
2. find all tuples that match column1=1 AND column2='189'
3. Find the intersection of the above 2
Does it go to the heap even to get the intermediate results (1 & 2) ?
or
Does it do the first 2 steps using index and go to the heap for the final
data?

Also what does *Sort method: external merge Disk:73432kB *mean? Should I
have to modify this to make this query run faster? Postgres takes 4 times
slower than Oracle to return this query. Is there a way to make this
faster?

Thanks
jo

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Alvaro Herrera 2007-11-30 12:55:42 Re: Recheck condition
Previous Message Wolfgang Keller 2007-11-30 11:48:54 Re: Linux v.s. Mac OS-X Performance