Re: ANALYZE sampling is too good

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ANALYZE sampling is too good
Date: 2013-12-06 08:49:33
Message-ID: CAA4eK1K1R011==4-xuYe9WYFqWQiT=Hayp-Aa4J=gc0Xy9=2xA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Dec 6, 2013 at 7:22 AM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> On Thu, Dec 5, 2013 at 3:50 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>> There are fairly well researched algorithms for block-based sampling
>> which estimate for the skew introduced by looking at consecutive rows in
>> a block. In general, a minimum sample size of 5% is required, and the
>> error is no worse than our current system. However, the idea was shot
>> down at the time, partly because I think other hackers didn't get the math.
>
> I think that this certainly warrants revisiting. The benefits would be
> considerable.
>
> Has anyone ever thought about opportunistic ANALYZE piggy-backing on
> other full-table scans? That doesn't really help Greg, because his
> complaint is mostly that a fresh ANALYZE is too expensive, but it
> could be an interesting, albeit risky approach.

Is only fresh ANALYZE costly or consecutive one's are also equally costly?

Doing it in some background operation might not be a bad idea, but doing it
in backend query execution (seq scan) might add overhead for query response time
especially if part or most of data for table is in RAM, so here
overhead due to actual read
might not be very high but the calculation for analyse (like sort)
will make it costly.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2013-12-06 09:21:14 Re: ANALYZE sampling is too good
Previous Message Andrew Gierth 2013-12-06 08:13:12 Re: WITHIN GROUP patch