Re: ANALYZE sampling is too good

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ANALYZE sampling is too good
Date: 2013-12-09 18:03:50
Message-ID: 52A60606.9030409@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Greg,

> I really don't believe the 5% thing. It's not enough for n_distinct
> and it's *far* too high a value for linear properties like histograms
> or nullfrac etc.

Actually, it is enough for n_distinct, or more properly, 5% is as good
as you can get for n_distinct unless you're going to jump to scanning
50% or more.

It's also applicable for the other stats; histogram buckets constructed
from a 5% sample are more likely to be accurate than those constructed
from a 0.1% sample. Same with nullfrac. The degree of improved
accuracy, would, of course, require some math to determine.

> From a computer point of view it's too high to be
> worth bothering. If we have to read 5% of the table we might as well
> do a full scan anyways, it'll be marginally slower but much better
> quality results.

Reading 5% of a 200GB table is going to be considerably faster than
reading the whole thing, if that 5% is being scanned in a way that the
FS understands.

Also, we can optimize this significantly by using the VM, as Robert (I
think) suggested.

In the advanced approaches section, there's also the idea of collecting
analyze data from table pages while they're in memory anyway for other
reasons.

You do seem kind of hostile to the idea of full-page-sampling, going
pretty far beyond the "I'd need to see the math". Why?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2013-12-09 18:27:01 Re: BUG #8673: Could not open file "pg_multixact/members/xxxx" on slave during hot_standby
Previous Message Jeff Davis 2013-12-09 17:45:51 Re: Extension Templates S03E11