Re: More thoughts about planner's cost estimates

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Greg Stark <gsstark(at)MIT(dot)EDU>
Cc: Hannu Krosing <hannu(at)skype(dot)net>, Greg Stark <gsstark(at)mit(dot)edu>, josh(at)agliodbs(dot)com, David Fetter <david(at)fetter(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: More thoughts about planner's cost estimates
Date: 2006-06-04 06:38:25
Message-ID: 87y7wdbf32.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Greg Stark <gsstark(at)MIT(dot)EDU> writes:

> Perhaps what this indicates is that the real meat is in track sampling, not
> block sampling.

Fwiw, I've done a little benchmarking and I'm starting to think this isn't a
bad idea. I see a dramatic speed improvement for samples of 1-10% as the block
size increases. Presumably this is as Hannu said, reducing the number of
tracks necessary to cover the sample.

I see improvements up to around 256M blocks or so, but my data is pretty
questionable since I'm busy watching tv in Mythtv in another window. It's on
another drive but it still seems to be making the numbers jump around a bit.

I expect there's a trade-off between keeping enough blocks for the sample of
blocks to be representative on the one hand and large blocks being much faster
to read in on the other.

I would suggest something like setting the block size in the block sampling
algorithm to something like max(8k,sqrt(table size)). That gives 8k blocks for
anything up to 255M but takes better advantage of the speed increase available
from sequential i/o for larger tables, from my experiments about a 50%
increase in speed.

Actually maybe even something even more aggressive would be better, maybe
(table size)^.75 So it kicks in sooner than on 256M tables and gets to larger
block sizes on reasonable sized tables.

Note, this doesn't mean anything like changing page sizes, just selecting more
blocks that hopefully lie on the same track when possible.

--
greg

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Stark 2006-06-04 07:09:30 Re: More thoughts about planner's cost estimates
Previous Message Sergey E. Koposov 2006-06-04 04:57:12 Re: Re [HACKERS]: Still not happy with psql's multiline history