Re: Bitmap Heap Scan anomaly

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Jeff Davis <pgsql(at)j-davis(dot)com>
Cc:	jaba the mobzy <makaronaforna(at)yahoo(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Bitmap Heap Scan anomaly
Date:	2007-05-04 03:42:32
Message-ID:	9867.1178250152@sss.pgh.pa.us
Lists:	pgsql-hackers

Jeff Davis <pgsql(at)j-davis(dot)com> writes:
> On Thu, 2007-05-03 at 14:33 -0700, jaba the mobzy wrote:
>> mycorr_100 took 11.4 s to run although it had to fetch 100000 rows from
>> the base table.
>> mycorr_10 took 24.4 s to run although it only had to fetch 10563 rows from
>> the base table.

> This is because the physical distribution of data is different. The
> mycorr_10 table has tuples in which a and b are > 15.9M spread all
> throughout. mycorr_100 has them all collected together at the end of the
> physical file. Less disk seeking.

If the OP had generated the data randomly, as claimed, the rows
shouldn't be particularly more clumped in one table than the other.
But I sure agree that it sounds like a nonrandom distribution in the
mycorr_100 table. FWIW I tried to duplicate the behavior, and could
not, using tables made up like this:

create table src as
select int4(16*1024*1024*random()) as key,
int4(16*1024*1024*random()) as a,
int4(16*1024*1024*random()) as b
from generate_series(1,16*1024*1024);

create table mycorr_10 as
select key, a,
case when random() < 0.1 then a else b end as b
from src;

create table mycorr_100 as
select key, a, a as b
from src;

create index mycorr_10i on mycorr_10(a,b);

create index mycorr_100i on mycorr_100(a,b);

vacuum analyze mycorr_10;

vacuum analyze mycorr_100;
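One way to test Jeff's explanation on either data set is to count the heap blocks that actually hold the matching rows, since a bitmap heap scan reads each such block once, and to compare the planner's per-column correlation statistics. This is a minimal sketch, assuming the slow query filters on a and b both exceeding "15.9M" as Jeff describes; the literal 15900000 is an assumed reading of that figure, not the OP's actual query. The ctid-to-point conversion relies on the text I/O casts available in reasonably recent PostgreSQL releases.

```sql
-- Assumption: 15900000 stands in for Jeff's "15.9M"; adjust it to the
-- predicate of the real query.

-- 1. Matching rows vs. distinct heap blocks holding them.
--    (ctid::text::point)[0] extracts the block number from a tuple's ctid.
--    Fewer blocks means fewer page reads for the bitmap heap scan.
SELECT 'mycorr_10' AS tbl,
       count(*) AS matching_rows,
       count(DISTINCT (ctid::text::point)[0]) AS heap_blocks
FROM mycorr_10
WHERE a > 15900000 AND b > 15900000
UNION ALL
SELECT 'mycorr_100',
       count(*),
       count(DISTINCT (ctid::text::point)[0])
FROM mycorr_100
WHERE a > 15900000 AND b > 15900000;

-- 2. Planner statistics (populated by the VACUUM ANALYZE above):
--    correlation near +1 or -1 means physical order follows the column's
--    sort order; near 0 means values are scattered across the table.
SELECT tablename, attname, correlation
FROM pg_stats
WHERE tablename IN ('mycorr_10', 'mycorr_100')
  AND attname IN ('a', 'b')
ORDER BY tablename, attname;
```

If mycorr_100's matching rows sit in far fewer blocks than mycorr_10's, that would support the disk-seek explanation; similar block counts would suggest looking elsewhere. Note that correlation is computed per column, so it can only hint at clumping on the combined (a, b) predicate.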

regards, tom lane

In response to: Re: Bitmap Heap Scan anomaly at 2007-05-03 23:50:42 from Jeff Davis