Re: PG performance in high volume environment (many INSERTs and lots of aggregation reporting)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Phoenix Kiula <phoenix(dot)kiula(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: PG performance in high volume environment (many INSERTs and lots of aggregation reporting)
Date: 2009-01-28 14:18:52
Message-ID: 603c8f070901280618u3b20da3ey1edbd91e83d7bad2@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

> My question: with that kind of volume and the underlying aggregation
> functions (by product id, dates, possibly IP addresses or at least
> countries of origin..) will PG ever be a good choice? Or should I be
> looking at some other kind of tools? I wonder if OLAP tools would be
> overkill for something that needs to look like a barebones version of
> google analytics limited to our site..

Some other databases might have an optimization that makes this much
faster that it would ordinarily be.

select count(*) from table;

But I don't think anyone has an optimization that makes this fast:

select column, count(*) from table group by 1;

How do you expect the database to get this information other than be
reading the whole table and counting up the number of occurrences of
each value? I guess an OLAP cube might precompute all the answers for
you, but I don't think MySQL is going to do that.

One option is to write a script that runs in the background and
updates all your statistics every 10 minutes or so, dumping the
results into separate (and smaller) tables that you can query quickly.

Another option (which is probably what I would do for really high
volume logging of web traffic) is to write your log records to a flat
file and then postprocess them with perl or something and load the
summary statistics into your database later. PostgreSQL is really
fast, but nothing is as fast as writing to a flatfile.

...Robert

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Richard Huxton 2009-01-28 14:25:41 Re: PG performance in high volume environment (many INSERTs and lots of aggregation reporting)
Previous Message Robert Haas 2009-01-28 14:08:50 Re: LIKE Query performance