Skip site navigation (1) Skip section navigation (2)

Re: PG performance in high volume environment (many INSERTs and lots of aggregation reporting)

From: Richard Huxton <dev(at)archonet(dot)com>
To: Phoenix Kiula <phoenix(dot)kiula(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: PG performance in high volume environment (many INSERTs and lots of aggregation reporting)
Date: 2009-01-28 14:25:41
Message-ID: 49806AE5.8020307@archonet.com (view raw or flat)
Thread:
Lists: pgsql-performance
Phoenix Kiula wrote:
> [Ppsted similar note to PG General but I suppose it's more appropriate
> in this list. Apologies for cross-posting.]
> 
> Hi. Further to my bafflement with the "count(*)" queries as described
> in this thread:
> 
> http://archives.postgresql.org/pgsql-general/2009-01/msg00804.php
> 
> It seems that whenever this question has come up, Postgresql comes up
> very short in terms of "count(*)" functions.

Sorry - I'm confused. That thread doesn't seem to contain a slow
count(*) query. You seem to be saying you're having problems with the
query taking 10-15 seconds, but the example takes less then half a
second. How have you identified the count() as being the problem here?

> The performance is always slow, because of the planner's need to guess
> and such. I don't fully understand how the statistics work (and the
> explanation on the PG website is way too geeky) but he columns I work
> with already have a stat level of 100. Not helping at all.

But your own email says it's slow sometimes:
  "My queries are fast in general *except* the first time"
I'm not sure how the planner comes into this.

> We are now considering a web based logging functionality for users of
> our website. This means the table could be heavily INSERTed into. We
> get about 10 million hits a day, and I'm guessing that we will have to
> keep this data around for a while.
> 
> My question: with that kind of volume and the underlying aggregation
> functions (by product id, dates, possibly IP addresses or at least
> countries of origin..) will PG ever be a good choice?

A good choice compared to what?

> Or should I be
> looking at some other kind of tools? I wonder if OLAP tools would be
> overkill for something that needs to look like a barebones version of
> google analytics limited to our site..

Typically you'd summarise the data by hour/day via triggers / a
scheduled script if you weren't going towards a pre-packaged OLAP
toolkit. Otherwise you're going to have to scan the hundreds of millions
of rows you've accumulated.

> Appreciate any thoughts. If possible I would prefer to tone down any
> requests for MySQL and such!

I'm not sure MySQL is going to help you here - if you were running lots
of small, simple queries it might make sense. If you want to aggregate
data by varying criteria I don't think there is any sensible
optimisation (other than pre-calculating summaries).

-- 
  Richard Huxton
  Archonet Ltd

In response to

pgsql-performance by date

Next:From: Merlin MoncureDate: 2009-01-28 17:55:16
Subject: Re: PG performance in high volume environment (many INSERTs and lots of aggregation reporting)
Previous:From: Robert HaasDate: 2009-01-28 14:18:52
Subject: Re: PG performance in high volume environment (many INSERTs and lots of aggregation reporting)

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group