Re: intermittent performance problem

From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Mike Charnoky <noky(at)nextbus(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)postgresql(dot)org
Subject: Re: intermittent performance problem
Date: 2009-03-10 03:35:56
Message-ID: dcc563d10903092035gf777ef4y893c2ba20fe8e9c8@mail.gmail.com
Lists: pgsql-general

On Mon, Mar 9, 2009 at 8:21 PM, Mike Charnoky <noky(at)nextbus(dot)com> wrote:
> The random sampling query is normally pretty snappy.  It usually takes on
> the order of 1 second to sample a few thousand rows of data out of a few
> million.  The sampling is consistently quick, too.  However, on some days,
> the sampling starts off quick, then when the process starts sampling from a
> different subset of data (different range of times for the same day), the
> sampling query takes a couple minutes.

Then definitely look at saving the EXPLAIN plans before execution so
you can compare fast runs against slow ones. So far this sounds like
occasionally bad query plans to me.
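A minimal sketch of what that instrumentation might look like, assuming the sampling program can use psycopg2 (the connection handling, log destination, and helper names here are placeholders, not part of the original program):

```python
# Sketch: capture the plan for the sampling query just before running
# it, so plans from slow days can be diffed against fast days.
# Assumes psycopg2; cursor and log objects are supplied by the caller.

def to_explain(sql):
    """Prefix a query with EXPLAIN so its plan can be captured."""
    return "EXPLAIN " + sql.strip()

SAMPLE_SQL = """
    SELECT item_name, timestmp
    FROM raw_data
    WHERE timestmp >= %s AND timestmp < %s
      AND item_name = %s
      AND some_data_field NOTNULL
    ORDER BY random()
    LIMIT %s
"""

def run_with_plan_logging(cur, params, log):
    # Log the plan first (psycopg2 interpolates params client-side,
    # so EXPLAIN sees the same literal values as the real query)...
    cur.execute(to_explain(SAMPLE_SQL), params)
    for (line,) in cur.fetchall():
        log.write(line + "\n")
    # ...then run the actual sampling query.
    cur.execute(SAMPLE_SQL, params)
    return cur.fetchall()
```

Diffing the logged plans between a fast run and a slow run should show immediately whether the planner switched strategies (e.g. a different index, or a seq scan) on the slow days.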

> Regarding the concurrent vacuuming, this is definitely not happening.  I
> always check pg_stat_activity whenever the sampling process starts to lag
> behind.  I have never seen a vacuum running during this time.

And if autovac is getting in the way, try adjusting the various
autovac options, specifically autovacuum_vacuum_cost_delay set to 10
or 20 (ms).
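For reference, a postgresql.conf fragment along those lines might look like this (the values are a starting point to experiment with, not a recommendation tuned for this workload):

```
# postgresql.conf -- throttle autovacuum I/O so a background vacuum
# interferes less with concurrent queries
autovacuum_vacuum_cost_delay = 20ms  # sleep between bursts of vacuum work
autovacuum_vacuum_cost_limit = 200   # cost accumulated before each sleep
```

A longer cost delay makes autovacuum gentler on concurrent queries at the price of vacuums taking longer to finish.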

>
> Interesting idea to issue the EXPLAIN first... I will see if I can
> instrument the sampling program to do this.
>
> Thanks for your help Tom.
>
>
> Mike
>
> Tom Lane wrote:
>>
>> Mike Charnoky <noky(at)nextbus(dot)com> writes:
>>>
>>> The sampling query which runs really slow on some days looks something
>>> like this:
>>
>>> INSERT INTO sampled_data
>>>   (item_name, timestmp, ... )
>>>   SELECT item_name, timestmp, ...
>>>   FROM raw_data
>>>   WHERE timestmp >= ? and timestmp < ?
>>>   AND item_name=?
>>>   AND some_data_field NOTNULL
>>>   ORDER BY random()
>>>   LIMIT ?;
>>
>> Hmph, I'd expect that that would run pretty slowly *all* the time :-(.
>> There's no good way to optimize "ORDER BY random()".  However, it seems
>> like the first thing you should do is modify the program so that it
>> issues an EXPLAIN for that right before actually doing the query, and
>> then you could see if the plan is different on the slow days.
>>
>>> We have done a great deal of PG tuning, including the autovacuum for the
>>> "raw_data" table.  Autovacuum kicks like clockwork every day on that
>>> table after the sampling process finishes (after one day's worth of data
>>> is deleted from "raw_data" table, a roughly 7% change in size).
>>
>> Also, are you sure you have ruled out the possibility that the problem
>> comes from autovac kicking in *while* the update is running?
>>
>>                        regards, tom lane
>>
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

--
When fascism comes to America, it will be the intolerant selling it as
diversity.
