From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Greg Stark <stark(at)mit(dot)edu> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: COUNT(*) and index-only scans |
Date: | 2011-10-12 14:29:39 |
Message-ID: | 13161.1318429779@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Greg Stark <stark(at)mit(dot)edu> writes:
> Assuming you're in a steady-state situation the amount of all-visible
> blocks will fluctuate from a high just after vacuum to a low just
> before the next vacuum. There are other ways a block can be marked
> all-visible but for the most part I would expect the fraction to go
> steadily down until vacuum comes along and cleans things up.
> So if vacuum tracked the fraction of blocks marked all-visible
> *before* it processed them and the fraction it marked all-visible
> after processing we have an upper and lower bound. If we knew how long
> it's been since vacuum we could interpolate between those, or we could
> just take the mean, or we could take the lower bound as a conservative
> estimate.
I thought of that too, but we don't do the comparable thing for dead
tuple counts, and I am not convinced that we should do it for
visibility. I'd rather have a simple rule that "it's right immediately
after VACUUM", so that at least trivial cases like read-only tables
work correctly.
>> What I suggest as a first cut for that is: simply derate the visibility fraction as the fraction
>> of the table expected to be scanned gets smaller.
> I think there's a statistically more rigorous way of accomplishing the
> same thing. If you treat the pages we estimate we're going to read as
> a random sample of the population of pages then your expected value is
> the fraction of the overall population that is all-visible but your
> 95th percentile confidence interval will be, uh, a simple formula we
> can compute but I don't recall off-hand.
The problem is precisely that the pages a query is going to read are
likely to *not* be a random sample, but to be correlated with
recently-dirtied pages.
> ... It currently uses all expected values but in many
> cases it would be valuable if the planner knew what the standard
> deviation of those estimates was.
No doubt, but I'm not volunteering to fix that before we can have a
non-toy estimate for index-only scans.
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Andrew Dunstan | 2011-10-12 14:31:26 | Re: [BUGS] *.sql contrib files contain unresolvable MODULE_PATHNAME |
Previous Message | Robert Haas | 2011-10-12 14:18:16 | Re: COUNT(*) and index-only scans |