Re: intermittent performance problem

From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Mike Charnoky <noky(at)nextbus(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)postgresql(dot)org
Subject: Re: intermittent performance problem
Date: 2009-03-10 03:35:56
Message-ID: dcc563d10903092035gf777ef4y893c2ba20fe8e9c8@mail.gmail.com
Lists: pgsql-general

On Mon, Mar 9, 2009 at 8:21 PM, Mike Charnoky <noky(at)nextbus(dot)com> wrote:
> The random sampling query is normally pretty snappy.  It usually takes on
> the order of 1 second to sample a few thousand rows of data out of a few
> million.  The sampling is consistently quick, too.  However, on some days,
> the sampling starts off quick, then when the process starts sampling from a
> different subset of data (different range of times for the same day), the
> sampling query takes a couple minutes.

Then definitely look at saving the EXPLAIN plans before execution so
you can compare fast runs against slow ones. So far this sounds like
occasionally bad query plans to me.
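A minimal sketch of what that instrumentation might look like, assuming the sampling program can use psycopg2 (the connection handling, log destination, and helper names here are placeholders, not part of the original program):

```python
# Sketch: capture the plan for the sampling query just before running
# it, so plans from slow days can be diffed against fast days.
# Assumes psycopg2; cursor and log objects are supplied by the caller.

def to_explain(sql):
    """Prefix a query with EXPLAIN so its plan can be captured."""
    return "EXPLAIN " + sql.strip()

SAMPLE_SQL = """
    SELECT item_name, timestmp
    FROM raw_data
    WHERE timestmp >= %s AND timestmp < %s
      AND item_name = %s
      AND some_data_field NOTNULL
    ORDER BY random()
    LIMIT %s
"""

def run_with_plan_logging(cur, params, log):
    # Log the plan first (psycopg2 interpolates params client-side,
    # so EXPLAIN sees the same literal values as the real query)...
    cur.execute(to_explain(SAMPLE_SQL), params)
    for (line,) in cur.fetchall():
        log.write(line + "\n")
    # ...then run the actual sampling query.
    cur.execute(SAMPLE_SQL, params)
    return cur.fetchall()
```

Diffing the logged plans between a fast run and a slow run should show immediately whether the planner switched strategies (e.g. a different index, or a seq scan) on the slow days.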

> Regarding the concurrent vacuuming, this is definitely not happening.  I
> always check pg_stat_activity whenever the sampling process starts to lag
> behind.  I have never seen a vacuum running during this time.

And if autovac is getting in the way, try adjusting the various
autovac options, specifically autovacuum_vacuum_cost_delay set to 10
or 20 (ms).
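For reference, a postgresql.conf fragment along those lines might look like this (the values are a starting point to experiment with, not a recommendation tuned for this workload):

```
# postgresql.conf -- throttle autovacuum I/O so a background vacuum
# interferes less with concurrent queries
autovacuum_vacuum_cost_delay = 20ms  # sleep between bursts of vacuum work
autovacuum_vacuum_cost_limit = 200   # cost accumulated before each sleep
```

A longer cost delay makes autovacuum gentler on concurrent queries at the price of vacuums taking longer to finish.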

>
> Interesting idea to issue the EXPLAIN first... I will see if I can
> instrument the sampling program to do this.
>
> Thanks for your help Tom.
>
>
> Mike
>
> Tom Lane wrote:
>>
>> Mike Charnoky <noky(at)nextbus(dot)com> writes:
>>>
>>> The sampling query which runs really slow on some days looks something
>>> like this:
>>
>>> INSERT INTO sampled_data
>>>   (item_name, timestmp, ... )
>>>   SELECT item_name, timestmp, ...
>>>   FROM raw_data
>>>   WHERE timestmp >= ? and timestmp < ?
>>>   AND item_name=?
>>>   AND some_data_field NOTNULL
>>>   ORDER BY random()
>>>   LIMIT ?;
>>
>> Hmph, I'd expect that that would run pretty slowly *all* the time :-(.
>> There's no good way to optimize "ORDER BY random()".  However, it seems
>> like the first thing you should do is modify the program so that it
>> issues an EXPLAIN for that right before actually doing the query, and
>> then you could see if the plan is different on the slow days.
>>
>>> We have done a great deal of PG tuning, including the autovacuum for the
>>> "raw_data" table.  Autovacuum kicks like clockwork every day on that
>>> table after the sampling process finishes (after one day's worth of data
>>> is deleted from "raw_data" table, a roughly 7% change in size).
>>
>> Also, are you sure you have ruled out the possibility that the problem
>> comes from autovac kicking in *while* the update is running?
>>
>>                        regards, tom lane
>>
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

--
When fascism comes to America, it will be the intolerant selling it as
diversity.
