Re: Simple (hopefully) throughput question?

From: "Pierre C" <lists(at)peufeu(dot)com>
To: "Nick Matheson" <Nick(dot)D(dot)Matheson(at)noaa(dot)gov>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Simple (hopefully) throughput question?
Date: 2010-11-05 09:05:25
Message-ID: op.vlowvbdreorkce@apollo13
Lists: pgsql-performance

On Thu, 04 Nov 2010 15:42:08 +0100, Nick Matheson
<Nick(dot)D(dot)Matheson(at)noaa(dot)gov> wrote:
> I think your comments really get at what our working hypothesis was, but
> given that our experience is limited compared to you all here on the
> mailing lists we really wanted to make sure we weren't missing any
> alternatives. Also the writing of custom aggregators will likely
> leverage any improvements we make to our storage throughput.

Quick test: SELECT sum(x) FROM a table with 1 INT column, 3M rows, cached:
=> 244 MB/s
=> 6.7 M rows/s

Same test on MySQL (postgres repeated for comparison):

            size     SELECT sum(x) (cached)
postgres    107 MB   0.44 s
myisam       20 MB   0.42 s
innodb       88 MB   1.98 s

As you can see, even though myisam is much smaller (no transaction data to
store!), its aggregate performance is no better, and innodb's is much worse.

Even though pg's per-row header is large, seq scan / aggregate performance
is very good.

You can get performance in this ballpark by writing a custom aggregate in
C; it isn't very difficult, and the pg source code is clean and full of
insightful comments. A minimal sketch follows the pointers below:

- take a look at how contrib/intagg works
- http://www.postgresql.org/files/documentation/books/aw_pgsql/node168.html
- and the pg manual of course
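
For reference, here is roughly what the C side of such an aggregate can
look like. This is only a minimal sketch, not code from the thread: the
names (mysum, mysum_step) are invented for the example, it only handles an
int4 column with an int8 accumulator, and a real version would add overflow
checks. Build it as a regular C module (PGXS makes this easy).

    /*
     * mysum.c - minimal sketch of a custom sum(int4) aggregate in C.
     */
    #include "postgres.h"
    #include "fmgr.h"

    PG_MODULE_MAGIC;

    PG_FUNCTION_INFO_V1(mysum_step);

    /*
     * Transition function: called once per input row.  Adds the int4
     * column value to the running int8 state and returns the new state.
     * Declared STRICT in SQL, so rows with NULLs are skipped for us.
     */
    Datum
    mysum_step(PG_FUNCTION_ARGS)
    {
        int64   state = PG_GETARG_INT64(0);   /* running total so far */
        int32   value = PG_GETARG_INT32(1);   /* current row's value */

        PG_RETURN_INT64(state + (int64) value);
    }

    /*
     * SQL registration, run once after installing the module:
     *
     *   CREATE FUNCTION mysum_step(int8, int4) RETURNS int8
     *       AS 'mysum' LANGUAGE C STRICT;
     *
     *   CREATE AGGREGATE mysum(int4) (
     *       sfunc    = mysum_step,
     *       stype    = int8,
     *       initcond = '0'
     *   );
     *
     * After that, SELECT mysum(x) FROM t; runs the aggregation in C.
     */

Using int8 for the state keeps the transition function trivial while
avoiding overflow on large tables; from there it's a short step to more
specialised aggregates over your own data layout.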
