Skip site navigation (1) Skip section navigation (2)

Re: PG performance in high volume environment (many INSERTs and lots of aggregation reporting)

From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Chris Browne <cbbrowne(at)acm(dot)org>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: PG performance in high volume environment (many INSERTs and lots of aggregation reporting)
Date: 2009-01-30 07:53:37
Message-ID: dcc563d10901292353v5ba77f5al5f94f3d70b797d31@mail.gmail.com (view raw or flat)
Thread:
Lists: pgsql-performance
On Thu, Jan 29, 2009 at 1:56 PM, Chris Browne <cbbrowne(at)acm(dot)org> wrote:
>
> It is common for systems where it is necessary for aggregation
> reporting to be fast to do pre-computation of the aggregates, and that
> is in no way specific to PostgreSQL.
>
> If you need *really* fast aggregates, then it will be worthwhile to
> put together triggers or procedures or something of the sort to help
> pre-compute the aggregates.

Just to add to this, at me last employer in Chicago, we had a database
from a very large database company who's CEO makes more than all the
people on this mailing list combined that shall not be named for
reasons like I don't want to be sued.  This database had a large
statistical dataset we replicated over to pgsql on a by the minute
basis so we could run big ugly queries anytime we felt like it without
blowing out the production database.

At night, or by hand, I would run such queries as select count(*) from
reallyreallyreallybigstatstable on it and compare it to postgresql.
PostgreSQL would take about 4 or 5 minutes to run this on a local
server running a software RAID-10 4 disc set on a single core P-4 Dell
workstation, and the really really big server in production took about
15 to 20 seconds.

Our local test server that ran the same really big database that
cannot be named and had a 16 disk RAID-6 array with gigs of memory and
4 cpu cores, took about 45 seconds to a minute to run the same select
count(*) query.

All of the machines showed high CPU and moderate I/O usage while
running said query.

So, there's probably some room for improvement in pgsql's way of doing
things, but it's not like the other database software was providing
instantaneous answers.  Basically, the second that a database server
becomes fast at running lots of update / select queries in a mixed
environment, things like fast select count(*) get slower.

To the OP: Try running 100 transactional clients against mysql
(updates/inserts/deletes/selects) while running a select count(*) and
see how it behaves.  Single thread use cases are kind of uninteresting
compared to lotsa users.  But if single thread use cases are your
bread and butter, then pgsql is possibly a poor choice of db.

In response to

pgsql-performance by date

Next:From: Scott CareyDate: 2009-01-30 19:41:24
Subject: Re: Poor plan choice in prepared statement
Previous:From: Tom LaneDate: 2009-01-30 01:25:16
Subject: Re: Using multiple cores for index creation?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group