Quick Links

Re: Index Scan Costs versus Sort

From:	Charlie Savage <cfis(at)interserv(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: Index Scan Costs versus Sort
Date:	2005-11-10 19:00:51
Message-ID:	437398E3.2020900@interserv.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Hi Tom,

From pg_stats:

schema = "tiger";
tablename = "completechain";
attname = "tlid";
null_frac = 0;
avg_width = 4;
n_distinct = -1;
most_common_vals = ;
most_common_freqs = ;
correlation = 0.155914;

Note that I have default_statistics_target set to 100. Here is the
first few values from histogram_bounds:

"{102450,2202250,4571797,6365754,8444936,10541593,12485818,14545727,16745594,18421868,20300549,22498643,24114709,26301001,28280632,30370123,32253657,33943046,35898115,37499478,39469054,41868498,43992143,45907830,47826340,49843926,52051798,54409298,56447416,

The tlid column is a US Census bureau ID assigned to each chain in the
US - where a chain is a road segment, river segment, railroad segment,
etc. The data is loaded on state-by-state basis, and then a
county-by-county basis. There is no overall ordering to TLIDs, although
perhaps there is some local ordering at the county level (but from a
quick look at the data I don't see any, and the correlation factor
indicates there isn't any if I am interpreting it correctly).

Any other info that would be helpful to see?

Charlie

Tom Lane wrote:
> Charlie Savage <cfis(at)interserv(dot)com> writes:
>> 1. Postgresql estimates the index scan will be 50 times more costly
>> than the seq scan (112870376 vs 2229858) yet in fact it only takes 3
>> times longer to execute (2312426 s vs. 768403 s). My understanding is
>> that postgresql assumes, via the random_page_cost parameter, that an
>> index scan will take 4 times longer than a sequential scan. So why is
>> the analyzer estimating it is 50 times slower?
>
> The other factors that are likely to affect this are index correlation
> and effective cache size. It's fairly remarkable that a full-table
> index scan only takes 3 times longer than a seqscan; you must have both
> a high correlation and a reasonably large cache. You showed us your
> effective_cache_size setting, but what's the pg_stats entry for
> completechain.tlid contain? Can you quantify what the physical
> ordering of tlid values is likely to be?
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>

In response to

Re: Index Scan Costs versus Sort at 2005-11-10 18:04:41 from Tom Lane

Responses

Re: Index Scan Costs versus Sort at 2005-11-11 00:11:48 from Charlie Savage

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Kelly Burkhart	2005-11-10 20:46:48	Re: 8.x index insert performance
Previous Message	Ashok Agrawal	2005-11-10 18:17:33	Re: Outer Join performance in PostgreSQL