Re: Parallel Vacuum

From: Dimitri <dimitrik(dot)fr(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Cc: Michael Stone <mstone+postgres(at)mathom(dot)us>
Subject: Re: Parallel Vacuum
Date: 2007-03-23 15:37:32
Message-ID: 200703231637.33211.dimitrik.fr@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Thursday 22 March 2007 19:46, Michael Stone wrote:
> On Thu, Mar 22, 2007 at 07:24:38PM +0100, Dimitri wrote:
> >you're right until you're using a single disk :)
> >Now, imagine you have more disks
>
> I do have more disks. I maximize the I/O performance by dedicating
> different sets of disks to different tables. YMMV. I do suggest watching
> your I/O rates and wallclock time if you try this to see if your
> aggregate is actually substantially faster than the single case. (I
> assume that you haven't yet gotten far enough to actually do performance
> testing.) You may also want to look into tuning your sequential I/O
> performance.
>
> Mike Stone

Mike, specially for you :)

Parallel Vacuum Test
======================

- Database 'db_OBJ'
PgSQL 8.2.3
tables: object1, object2, ... object8 (all the same)
volume: 10.000.000 rows in each table, 22GB in total

- Script Mono Vacuum
$ cat vac_mono.sh
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object1
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object2
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object3
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object4
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object5
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object6
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object7
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object8
$

- Script Parallel Vacuum
$ cat vac_pll.sh
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object1 &
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object2 &
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object3 &
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object4 &
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object5 &
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object6 &
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object7 &
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object8 &
wait
$

Test 1: Cold Clean database (already previously vacuumed)
=========================================================
Scenario:
- stop database
- flush FS cache (umount/mount)
- start database
- execute vacuum script

$ time sh vac_mono.sh
real 4m24.23s
user 0m0.00s
sys 0m0.01s

$ time sh vac_pll.sh
real 1m9.36s
user 0m0.00s
sys 0m0.01s

Test 2: Hot Dirty database (modified and not vacuumed)
======================================================
Scenario:
- stop database
- flush FS cache (umount/mount)
- start database
- execute 200.000 updates against each from 8 object' tables
- execute vacuum script

$ time sh vac_mono.sh
real 9m36.90s
user 0m0.00s
sys 0m0.01s

$ time sh vac_pll.sh
real 2m10.41s
user 0m0.00s
sys 0m0.02s

Speed-up x4 is obtained just because single vacuum process reaching max
80MB/sec in throughput, while with 8 parallel vacuum processes I'm jumping to
360MB/sec... And speakink about Sequential I/O: while you're doing read -
file system may again prefetch incoming data in way once you reclaim next
read - your data will be already in FS cache. However, file system
cannot 'pre-write' data for you - so having more concurrent writers helps a
lot! (Of course in case you have a storage configured to keep concurrent
I/O :))

Well, why all this staff?...
Let's imagine once you need more performance, and you buy 10x times more
performant storage box, will you still able to kill it with a single-process
I/O activity? No... :) To scale well you need to be able to split your work
in several task executed in parallel. And personally, I'm very happy we can
do it with vacuum now - the one of the most critical part of PostgreSQL...

Best regards!
-Dimitri

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Joshua D. Drake 2007-03-23 15:51:25 Re: linux - server configuration for small database
Previous Message Tom Lane 2007-03-23 15:05:06 Re: linux - server configuration for small database