Re: auto_explain WAS: RFC: Timing Events

From: Greg Stark <stark(at)mit(dot)edu>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jim Nasby <jim(at)nasby(dot)net>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Greg Smith <greg(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: auto_explain WAS: RFC: Timing Events
Date: 2013-02-26 03:22:45
Message-ID: CAM-w4HPUS_kAk1HSWFHA+EcTuS7COg8FyH8_PB-ndbE5+1ny3Q@mail.gmail.com
Lists: pgsql-hackers

On Mon, Feb 25, 2013 at 8:26 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Sun, Feb 24, 2013 at 7:27 PM, Jim Nasby <jim(at)nasby(dot)net> wrote:
>> We actually do that in our application and have discovered that random
>> sampling can end up significantly skewing your data.
>
> /me blinks.
>
> How so?

Sampling is a pretty big area of statistics. There are dozens of
sampling methods to deal with various problems that occur with
different types of data distributions.

One problem is that if you have some very rare events, random sampling
can produce odd results: the rare events will drop out entirely
unless your sample is very large, whereas less rare events are
represented proportionally. There are sampling methods that ensure
that x% of the rare events are included even if those rare events are
less than x% of your total data set. One of those might be appropriate
to use for profiling data when you're looking for rare slow queries
amongst many faster queries.
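
As a rough illustration, here is a minimal sketch of one such method,
stratified sampling, where slow and fast queries are sampled at
different rates. Everything in it is an assumption made up for the
example (the function name, the threshold, the rates); it is not
something auto_explain provides. Each kept duration carries a weight of
1/rate so that totals can still be estimated without the skew.

```python
import random

def stratified_sample(durations_ms, slow_threshold_ms, fast_rate, slow_rate, rng):
    """Sample query durations with one rate per stratum (fast vs. slow).

    Hypothetical helper for illustration only. Returns a list of
    (duration_ms, weight) pairs, where weight = 1 / sampling rate, so a
    weighted count estimates how many queries each kept one stands for.
    """
    sample = []
    for d in durations_ms:
        # Pick the sampling rate from the stratum this query falls in.
        rate = slow_rate if d >= slow_threshold_ms else fast_rate
        # rng.random() is in [0, 1), so rate=1.0 keeps every event
        # and rate=0.0 keeps none.
        if rng.random() < rate:
            sample.append((d, 1.0 / rate))
    return sample

# 10,000 fast queries and 3 rare slow ones (made-up numbers).
durations = [1] * 10000 + [5000] * 3
kept = stratified_sample(durations, slow_threshold_ms=1000,
                         fast_rate=0.01, slow_rate=1.0,
                         rng=random.Random(0))
slow_kept = [d for d, w in kept if d >= 1000]
print(len(kept), len(slow_kept))
```

With a uniform 1% sample the three slow queries would most likely all
be dropped; with a separate rate for the slow stratum they are always
kept, and the weights tell you how to scale the fast ones back up.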

--
greg
