Benchmarking tools, methods

From: CSS <css(at)morefoo(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Benchmarking tools, methods
Date: 2011-11-18 09:55:54
Message-ID: 22E8DA4D-5D07-4B2B-920A-DA29650C1909@morefoo.com
Lists: pgsql-performance

Hello,

I'm going to be testing some new hardware (see http://archives.postgresql.org/pgsql-performance/2011-11/msg00230.php) and while I've done some very rudimentary before/after tests with pgbench, I'm looking to pull more info than I have in the past, and I'd really like to automate things further.

I'll be starting with basic disk benchmarks (bonnie++ and iozone) and then moving on to pgbench.
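
To make that concrete, the sort of invocations I have in mind look roughly like this (a sketch only; the mount point is a placeholder for my setup, and the 64g working set is ~2x RAM to defeat caching):

```
# bonnie++: throughput tests on a 64 GB file, skip the small-file tests
bonnie++ -d /mnt/test -s 64g -n 0 -u nobody

# iozone: sequential write (-i 0) and read (-i 1) with 128 kB records
iozone -i 0 -i 1 -r 128k -s 64g -f /mnt/test/iozone.tmp
```
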

I'm running FreeBSD and I'm interested in getting some baseline info on a UFS2 single disk (SATA 7200 RPM/WD RE4), gmirror, a zfs mirror, zfs raidz1, and a zfs stripe across two mirrors (i.e., two mirrored vdevs in one pool). Then I'll repeat that with the 4 Intel 320 SSDs, and, just to satisfy my curiosity, a zfs mirror with two of the SSDs mirrored as the ZIL.

Once that's narrowed down to a few practical choices, I'm moving on to pgbench. I've found some good info here regarding pgbench that is unfortunately a bit dated: http://www.westnet.com/~gsmith/content/postgresql/

A few questions:

-Any favorite automation or graphing tools beyond what's on Greg's site?
-Any detailed information on creating "custom" pgbench tests?
-Any other postgres benchmarking tools?
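
On the second question, to clarify what I mean by "custom": my understanding is that a custom test is just a file of commands fed to pgbench with -f, along the lines of the following (table names assume the standard pgbench schema; pgbench sets the :scale variable itself):

```
\set naccounts 100000 * :scale
\setrandom aid 1 :naccounts
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
```

run with something like "pgbench -n -f custom.sql -c 8 -T 60 testdb", where -n skips vacuuming the standard tables and -T runs for a fixed duration. What I'm after is more detailed guidance than that.
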

I'm also curious about benchmarking using my own data. I tried something long ago that at least gave the illusion of working, but the results didn't seem quite right to me. I enabled basic query logging on one of our busier servers, dumped the db, and let the logging run for 24 hours. That captured the normal random queries from users throughout the day as well as our batch jobs that run overnight. I had to grep out and reformat the actual queries from the logfile, but that was not difficult. I then loaded the dump into the test server, fed the saved queries into it, and timed the result. I also hacked together a script to sample cpu and disk stats every 2 seconds and feed them into an rrd database so I could see how "busy" things were.
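
For reference, the log-scraping step looked roughly like this (a sketch, not my actual script; the line format assumes the default stderr log prefix with log_statement = 'all'):

```python
import re

# Pull "statement:" lines out of a Postgres log. Assumes each logged
# query appears on one line as "... LOG:  statement: <query>".
STATEMENT_RE = re.compile(r"LOG:\s+statement:\s+(.*)$")

def extract_statements(lines):
    """Return the SQL text of each logged statement, in order."""
    out = []
    for line in lines:
        m = STATEMENT_RE.search(line)
        if m:
            out.append(m.group(1).strip())
    return out

sample = [
    "2011-11-18 09:55:54 EST LOG:  statement: SELECT * FROM accounts WHERE aid = 42;",
    "2011-11-18 09:55:55 EST LOG:  duration: 1.234 ms",
    "2011-11-18 09:55:56 EST LOG:  statement: UPDATE branches SET bbalance = bbalance + 10;",
]
print(extract_statements(sample))
```

The replay side was just feeding that list back through a client connection and timing each statement.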

In theory, this sounded good (to me), but I'm not sure I trust the results. Any suggestions on the general concept? Is it sound? Is there a better way to do it? I really like the idea of using (our) real data.

Lastly, any general suggestions on tools to collect system data during tests and graph it are more than welcome. I can homebrew, but I'm sure I'd be reinventing the wheel.

Oh, and if anyone wants any tests run that would not take an insane amount of time and would be valuable to those on this list, please let me know. Since SSDs have been a hot topic lately and not everyone has 4 SSDs lying around, I'd like to focus on anything that would shed some light on the whole SSD craze.

The box under test will ultimately have 32GB RAM, 2 quad-core 2.13GHz Xeon 5506 CPUs, and 4 Intel 320 160GB SSDs. I'm recycling some older boxes as well, so I have much more RAM on hand until those are finished.

Thanks,

Charles

ps - considering the new PostgreSQL Performance book that Packt has, any strong feelings about that one way or the other? Does it go very far beyond what's on the wiki?
