Re: ANALYZE sampling is too good

From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ANALYZE sampling is too good
Date: 2013-12-12 18:33:49
Message-ID: CAGTBQpbFhAMnJcA8qOj1-0AgjVM6+L2d1nUjZRyYqMv5Sjrjtw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Dec 12, 2013 at 3:29 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
>> It would be relatively easy to fix this if we trusted the number of visible
>> rows in each block to be fairly constant. But without that assumption, I
>> don't see a way to fix the sample selection process without reading the
>> entire table.
>
> Yeah, varying tuple density is the weak spot in every algorithm we've
> looked at. The current code is better than what was there before, but as
> you say, not perfect. You might be entertained to look at the threads
> referenced by the patch that created the current sampling method:
> http://www.postgresql.org/message-id/1tkva0h547jhomsasujt2qs7gcgg0gtvrp@email.aon.at
>
> particularly
> http://www.postgresql.org/message-id/flat/ri5u70du80gnnt326k2hhuei5nlnimonbs(at)email(dot)aon(dot)at#ri5u70du80gnnt326k2hhuei5nlnimonbs@email.aon.at
>
>
> However ... where this thread started was not about trying to reduce
> the remaining statistical imperfections in our existing sampling method.
> It was about whether we could reduce the number of pages read for an
> acceptable cost in increased statistical imperfection.

Well, why not take a supersample containing all visible tuples from N
selected blocks, and do bootstrapping over it, with subsamples of M
independent rows each?

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2013-12-12 18:52:34 Re: Changeset Extraction Interfaces
Previous Message Tom Lane 2013-12-12 18:29:40 Re: ANALYZE sampling is too good