Re: same query in high number of times

From: Scott Carey <scott(at)richrelevance(dot)com>
To: Peter Alban <peter(dot)alban2(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: same query in high number of times
Date: 2009-06-22 23:06:48
Message-ID: C6655C98.8778%scott@richrelevance.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 6/22/09 2:11 PM, "Peter Alban" <peter(dot)alban2(at)gmail(dot)com> wrote:

> hey folks !
>
> eventually the removing of the group by did improve but still my concern is
> why cant we take the result from memory given its same resultset .
> But I keep pusing for the developers to move to memcached so we overcome this
> limitation .
>
> cheers,
> Peter
>
Caching of that sort is better suited to client code or a middle tier
caching technology. It isn¹t that simple to resolve that the same query
will return the same result for most queries. Between two executions
another could have made a change.

But more importantly, the client is what knows what level of Ostaleness¹ is
appropriate for the data. A RDBMS will operate on a no tolerance policy for
returning stale data and with strict transactional visibility rules. If
your application only needs a result that is fresh within some window of
time, it should do the caching (or another component). This is far more
efficient. A RDBMS is a very poorly performing data cache < its built to
quickly resolve arbitrary relational queries with strict transactional
guarantees, not to cache a set of answers. Although given a hammer
everything looks like a nail . . .

> On Mon, Jun 22, 2009 at 5:23 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Sun, Jun 21, 2009 at 4:59 PM, Peter Alban<peter(dot)alban2(at)gmail(dot)com> wrote:
>>>
>>>
>>> On Sun, Jun 21, 2009 at 10:01 PM, Justin Graf <justin(at)emproshunts(dot)com>
>>> wrote:
>>>>
>>>> Peter Alban wrote:
>>>>
>>>> duration: 2533.734 ms  statement:
>>>>
>>>> SELECT news.url_text,news.title, comments.name <http://comments.name> ,
>>>> comments.createdate,
>>>> comments.user_id, comments.comment FROM news, comments WHERE
>>>> comments.cid=news.id <http://news.id>   AND comments.published='1' GROUP BY
>>>> news.url_text,news.title comments.name <http://comments.name> ,
>>>> comments.createdate,
>>>> comments.user_id, comments.comment ORDER BY comments.createdate DESC LIMIT
>>>> 3
>>>>
>>>>
>>>> And here is the query plan :
>>>>                                                               QUERY
>>>> PLAN
>>>>
>>>> ---------------------------------------------------------------------------
>>>> -------------------------------------------------------------
>>>>  Limit  (cost=4313.54..4313.55 rows=3 width=595) (actual
>>>> time=288.525..288.528 rows=3 loops=1)
>>>>    ->  Sort  (cost=4313.54..4347.26 rows=13486 width=595) (actual
>>>> time=288.523..288.523 rows=3 loops=1)
>>>>          Sort Key: comments.createdate
>>>>          ->  HashAggregate  (cost=3253.60..3388.46 rows=13486 width=595)
>>>> (actual time=137.521..148.132 rows=13415 loops=1)
>>>>                ->  Hash Join  (cost=1400.73..3051.31 rows=13486 width=595)
>>>> (actual time=14.298..51.049 rows=13578 loops=1)
>>>>                      Hash Cond: ("outer".cid = "inner".id)
>>>>                      ->  Seq Scan on comments  (cost=0.00..1178.72
>>>> rows=13480 width=522) (actual time=0.012..17.434 rows=13418 loops=1)
>>>>                            Filter: (published = 1)
>>>>                      ->  Hash  (cost=1391.18..1391.18 rows=3818 width=81)
>>>> (actual time=14.268..14.268 rows=3818 loops=1)
>>>>                            ->  Seq Scan on news  (cost=0.00..1391.18
>>>> rows=3818 width=81) (actual time=0.021..10.072 rows=3818 loops=1)
>>>>
>>>> The same is being requested from different sessions . So why is it not
>>>> being cached .
>>>>
>>>>
>>>> Because the query results are not cached only the RAW tables are.   The
>>>> query is rerun every time it is requested.
>>>>
>>>> What is the group by clause accomplishing???
>>>> The sorting and hash Aggregate is eating up all the time
>>>
>>> So this should mean that having say a 5 mb table in memory doing such query
>>> above takes 2 secs in memory ?
>>
>> Nope.  But as others have pointed out, you need to figure out why it's
>> taking 2.5 s but EXPLAIN ANALYZE is only saying 300 ms.
>>
>> There's other things you can do to optimize this query; for example:
>>
>> 1. Try creating an index on comments (createdate), and don't forget to
>> ANALYZE the table afterward, or
>>
>> 2. Modify the query to remove the probably-unnecessary GROUP BY.
>>
>> But figuring out the times may be the first thing.  My guess is that
>> the 2.5 s time is a time from your logs, maybe at a time when the
>> system was busy, and the 300 ms time was what you got it when you ran
>> it some other time.  But maybe there's some other explanation.  You
>> should try to figure it out.
>>
>> ...Robert
>
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Laurent Laborde 2009-06-23 08:52:08 Re: same query in high number of times
Previous Message Peter Alban 2009-06-22 21:11:06 Re: same query in high number of times