Re: Increasing GROUP BY CHAR columns speed

From: Scott Carey <scott(at)richrelevance(dot)com>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, Andrus <kobruleht2(at)hot(dot)ee>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Increasing GROUP BY CHAR columns speed
Date: 2008-11-28 19:51:07
Message-ID: BDFBB77C9E07BE4A984DAAE981D19F961ACA17D9CE@EXVMBX018-1.exch018.msoutlookonline.net
Lists: pgsql-performance

I'm not sure that Postgres allocates the whole of work_mem each time, and in any event the OS won't actually commit pages to RAM until they are touched. My experience is that an oversized work_mem doesn't hurt until it is actually used. Otherwise, my configuration with 1000 concurrent connections and work_mem = 1GB would have blown up; I don't have that much RAM + swap * overcommit. Of those 1000 connections, only 8 run queries that would ever need more than 2 or 3 MB of space to execute. Of course, you have to be certain what the client connections do before running work_mem that far oversized, so I would not recommend the above in general.
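To be concrete, the pattern I mean is roughly the following sketch (table and column names are placeholders, not from the query in this thread): keep the global setting modest and raise work_mem only in the few sessions that need it.

-- postgresql.conf: leave the cluster-wide default small, e.g. work_mem = 4MB

-- In the handful of connections that run big sorts/aggregates,
-- raise it for that session only (unit strings like '512MB' work on 8.2+;
-- on older servers give the value in kB instead):
SET work_mem = '512MB';

-- Or scope it to a single transaction so it reverts automatically:
BEGIN;
SET LOCAL work_mem = '512MB';
SELECT region, product, sum(amount)
FROM sales_fact                 -- placeholder table
GROUP BY region, product;
COMMIT;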

----------
Back to this query:

In the query case shown, the explain analyze shows:
"GroupAggregate (cost=43403.38..52521.63 rows=41923 width=838) (actual
time=8083.171..8620.908 rows=577 loops=1)"

Thus, the planner thought it would be aggregating ~42K rows of ~840 bytes each, hence an approximation of the required hash space is around 80MB. However, the query actually returns only 577 rows, so the hash space really needed is much smaller. This is a common problem I've seen: the query planner has very conservative (high) row-count estimates for the output of any non-trivial filter condition or aggregate, which leads to poor query plans.
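If the overestimate comes mostly from the GROUP BY columns, raising the statistics target on those columns sometimes gives the planner a better distinct-value estimate. No promises that it helps in this case, and the names below are placeholders:

ALTER TABLE sales_fact ALTER COLUMN region  SET STATISTICS 1000;
ALTER TABLE sales_fact ALTER COLUMN product SET STATISTICS 1000;
ANALYZE sales_fact;

-- Re-check whether the estimated row count on the GroupAggregate step
-- moves closer to the actual number of groups:
EXPLAIN ANALYZE
SELECT region, product, sum(amount)
FROM sales_fact
GROUP BY region, product;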

I'd be surprised if this query used more than 1MB of total work_mem in reality for that last step if it used a hash. As it stands, the sort will actually use much more.

I'm still not sure why the planner chose to sort rather than hash given the oversized work_mem (is there an implied ordering in the query results that I missed?). My guess is that this query can still get much faster if a hash aggregate is possible for the last step. It looks like the gain so far has more to do with the sort fitting entirely in memory, which reduced the number of comparisons required. But that is just a guess.
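One quick, session-local way to check whether a hash aggregate is even available for this query (again with placeholder names; not something to leave on in production) is to discourage the sort path and re-run EXPLAIN:

SET enable_sort = off;          -- planner hint, affects this session only

EXPLAIN ANALYZE
SELECT region, product, sum(amount)
FROM sales_fact
GROUP BY region, product;
-- If the plan now shows HashAggregate and runs faster, the sort was the
-- problem; if it still sorts, hashing isn't possible for this grouping.

RESET enable_sort;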

________________________________________
From: Scott Marlowe [scott(dot)marlowe(at)gmail(dot)com]
Sent: Friday, November 28, 2008 10:24 AM
To: Andrus
Cc: Scott Carey; pgsql-performance(at)postgresql(dot)org
Subject: Re: [PERFORM] Increasing GROUP BY CHAR columns speed

On Fri, Nov 28, 2008 at 10:58 AM, Andrus <kobruleht2(at)hot(dot)ee> wrote:
>
> SET work_mem = 2097151 (this is max allowed value) or SET work_mem = 97151
> decreases query time from 12 seconds to 9 seconds.

Setting work_mem so high that it allocates memory that isn't there
WILL slow down your queries, because the OS will then wind up swapping
out one part of memory to make room for another. There are plenty of
values between 100MB and 2GB. Run the query with work_mem stepped up
from 100 to 500 or so MB and see if that helps. Keep an eye on vmstat 1
or something similar to see if your machine starts swapping while
running the query. If it does, you've overallocated memory and things
will start to slow down a LOT.
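For example (placeholder query standing in for the real one), step it
like this in psql while watching vmstat 1 in another window:

SET work_mem = '100MB';
EXPLAIN ANALYZE SELECT region, product, sum(amount)
                FROM sales_fact GROUP BY region, product;

SET work_mem = '250MB';
EXPLAIN ANALYZE SELECT region, product, sum(amount)
                FROM sales_fact GROUP BY region, product;

SET work_mem = '500MB';
EXPLAIN ANALYZE SELECT region, product, sum(amount)
                FROM sales_fact GROUP BY region, product;
-- If timings stop improving, or vmstat shows swap activity (si/so)
-- climbing, back off to the last setting that did not swap.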
