Re: Improving N-Distinct estimation by ANALYZE

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Greg Stark <gsstark(at)MIT(dot)EDU>, "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improving N-Distinct estimation by ANALYZE
Date: 2006-01-10 22:00:02
Message-ID: 87hd8bzqu5.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Simon Riggs <simon(at)2ndquadrant(dot)com> writes:

> On Mon, 2006-01-09 at 22:08 -0500, Greg Stark wrote:
>
> > So it's not the 8k block reading that's fooling Linux into reading ahead 32k.
> > It seems 32k readahead is the default for Linux, or perhaps it's the
> > sequential access pattern that's triggering it.
>
> Nah, Linux 2.6 uses flexible readahead logic. It increases slowly when
> you read sequentially, but halves the readahead if you do another access
> type. Can't see that would give an average readahead size of 32k.

I've actually read this code at one point in the past. IIRC the readahead is
capped at 32k, which I find interesting given the results. Since this is
testing sequential access patterns perhaps what's happening is the test for
readahead is too liberal.

All the numbers I'm getting are consistent with a 32k readahead. Even I run my
program with a 4k block size I get performance equivalent to a full table scan
very quickly. If I use a 32k block size then the breakeven point is just over
20%.

I suppose what I really ought to do is make some pretty graphs.

--
greg

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message tmorelli 2006-01-10 22:52:52 A question about index internals
Previous Message Josh Berkus 2006-01-10 21:14:31 Re: Question about Postgresql time fields(possible bug)