Quick Links

Re: Performance hit on large row counts

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	David Scott <davids(at)apptechsys(dot)com>
Cc:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: Performance hit on large row counts
Date:	2005-12-26 21:36:30
Message-ID:	12356.1135632990@sss.pgh.pa.us
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

David Scott <davids(at)apptechsys(dot)com> writes:
> We are trying to ascertain if we are up against the limits of what
> postgres can accomplish without having the tables clustered. ...

> We are aware that there is a minimum time that is required to resolve
> the index values against the table to ascertain that they are live rows,
> and we believe we are circumventing that time to some extent by taking
> advantage of the rows being in physical order with the cluster. So does
> this lead us to the conclusion that the differences in the query times
> is how long is takes us to check on disk whether or not these rows are live?

Both of your initial examples are bitmap scans, which should be pretty
insensitive to index correlation effects --- certainly the planner
assumes so. What I'd want to know about is why the planner is picking
different indexes for the queries. The CLUSTER may be affecting things
in some other way, like by squeezing out dead tuples causing a
reduction in the total table and index sizes.

The join examples use plain indexscans, which *would* be affected by
correlation ... but again, why are you getting a different scan plan
for "stuff" than in the non-join case?

It's not helping you that the rowcount estimates are so far off.
I think the different plans might be explained by the noise in the
rowcount estimates.

You should try increasing the statistics targets on the columns you use
in the WHERE conditions.

I'm not at all sure I believe your premise that querying for a different
key value excludes cache effects, btw. On modern hardware it's likely
that CLUSTER would leave the *whole* of these tables sitting in kernel
disk cache.

regards, tom lane

In response to

Performance hit on large row counts at 2005-12-26 21:03:34 from David Scott

Responses

Re: Performance hit on large row counts at 2005-12-26 23:07:23 from David Scott

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Tom Lane	2005-12-26 21:57:11	Re: Bitmap indexes etc.
Previous Message	Ivan Voras	2005-12-26 21:32:13	Bitmap indexes etc.