Quick Links

Re: new correlation metric

From:	Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
To:	Jeff Davis <pgsql(at)j-davis(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org, npboley(at)gmail(dot)com
Subject:	Re: new correlation metric
Date:	2008-10-27 07:27:44
Message-ID:	49056D70.7030802@cheapcomplexdevices.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Jeff Davis wrote:
> Currently, we use correlation to estimate the I/O costs of an index
> scan. However, this has some problems:

It certainly helps some cases.

Without the patch, the little test script below ends up picking the
third fastest plan (a seq-scan) instead of a faster bitmapscan, or
an even faster-than-that indexscan for the query below.
With the patch, it finds the fastest index scan.

Without Patch Estimated_cost Actual_Time
Index Scan 39638.36 331ms
Bitmap Scan 22218.43 415ms
Seq Scan 20125.83 595ms

With Patch Estimated_cost Actual_Time
Index Scan 17684.18 333ms
Bitmap Scan 22110.60 400ms
Seq Scan 20117.51 573ms

I was somewhat surprised that the bitmap cost estimates didn't
also change much. Wouldn't the estimated # of data blocks
read for the bitmap be roughly the same as for the index?

And yes, I know that table's a contrived one that is almost
ideal for this patch - but I have some large clustered-by-zip
address tables where I can find queries that show similar results.

Back in 8.0 I cared a lot since I had a number of real-world
queries picking Seq-Scans instead of Index-Scans. With 8.3,
though, AFAICT the vast majority of my similar real-world queries
pick the bitmap scans which in practice are pretty close in speed
to the index scans.

======================================================================
-- [1] Test script variation from this 2005 thread:
-- http://archives.postgresql.org/pgsql-hackers/2005-02/msg00298.php

create temporary table tmp1mil as
select * from
(select generate_series as a from generate_series(0,9)) as a,
(select generate_series as b from generate_series(0,9)) as b,
(select generate_series as c from generate_series(0,9)) as c,
(select generate_series as d from generate_series(0,9)) as d,
(select generate_series as e from generate_series(0,9)) as e,
(select generate_series as f from generate_series(0,9)) as f
order by a,b,c,d,e,f;
create index tmp1mil__c on tmp1mil(c);
vacuum analyze tmp1mil;
select * from pg_stats where tablename='tmp1mil';

\timing
explain select count(*) from tmp1mil where c<5;
select count(*) from tmp1mil where c<5;
select count(*) from tmp1mil where c<5; -- 615 ms seqscan

set enable_seqscan = false;
explain select count(*) from tmp1mil where c<5;
select count(*) from tmp1mil where c<5;
select count(*) from tmp1mil where c<5; -- 425 ms bitmapscan

set enable_bitmapscan to false;
explain select count(*) from tmp1mil where c<5;
select count(*) from tmp1mil where c<5;
select count(*) from tmp1mil where c<5; -- 342 ms indexscan

In response to

new correlation metric at 2008-10-26 08:38:02 from Jeff Davis

Responses

Re: new correlation metric at 2008-10-27 12:18:07 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	ITAGAKI Takahiro	2008-10-27 08:00:39	Re: contrib/pg_stat_statements
Previous Message	Martijn van Oosterhout	2008-10-27 07:25:10	Re: WIP patch: convert SQL-language functions to return tuplestores