Skip site navigation (1) Skip section navigation (2)

Re: Parallel Vacuum

From: Dimitri <dimitrik(dot)fr(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Cc: Michael Stone <mstone+postgres(at)mathom(dot)us>
Subject: Re: Parallel Vacuum
Date: 2007-03-23 15:37:32
Message-ID: 200703231637.33211.dimitrik.fr@gmail.com (view raw or flat)
Thread:
Lists: pgsql-performance
On Thursday 22 March 2007 19:46, Michael Stone wrote:
> On Thu, Mar 22, 2007 at 07:24:38PM +0100, Dimitri wrote:
> >you're right until you're using a single disk :)
> >Now, imagine you have more disks
>
> I do have more disks. I maximize the I/O performance by dedicating
> different sets of disks to different tables. YMMV. I do suggest watching
> your I/O rates and wallclock time if you try this to see if your
> aggregate is actually substantially faster than the single case. (I
> assume that you haven't yet gotten far enough to actually do performance
> testing.) You may also want to look into tuning your sequential I/O
> performance.
>
> Mike Stone

Mike, specially for you :)

Parallel Vacuum Test
======================

- Database 'db_OBJ'
   PgSQL 8.2.3
   tables: object1, object2, ... object8 (all the same)
   volume: 10.000.000 rows in each table, 22GB in total

- Script Mono Vacuum
  $ cat vac_mono.sh
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object1
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object2
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object3
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object4
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object5
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object6
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object7
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object8
  $

- Script Parallel Vacuum
  $ cat vac_pll.sh
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object1 &
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object2 &
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object3 &
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object4 &
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object5 &
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object6 &
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object7 &
/usr/local/pgsqlFF/bin/vacuumdb -p 5464 -d db_OBJ -t object8 &
wait
  $


Test 1: Cold Clean database (already previously vacuumed)
=========================================================
 Scenario:
  - stop database
  - flush FS cache (umount/mount)
  - start database
  - execute vacuum script

$ time sh vac_mono.sh
real    4m24.23s
user    0m0.00s
sys     0m0.01s

$ time sh vac_pll.sh
real    1m9.36s
user    0m0.00s
sys     0m0.01s


Test 2: Hot Dirty database (modified and not vacuumed)
======================================================
 Scenario:
  - stop database
  - flush FS cache (umount/mount)
  - start database
  - execute 200.000 updates against each from 8 object' tables
  - execute vacuum script

$ time sh vac_mono.sh
real    9m36.90s
user    0m0.00s
sys     0m0.01s

$ time sh vac_pll.sh
real    2m10.41s
user    0m0.00s
sys     0m0.02s


Speed-up x4 is obtained just because single vacuum process reaching max 
80MB/sec in throughput, while with 8 parallel vacuum processes I'm jumping to 
360MB/sec... And speakink about Sequential I/O: while you're doing read - 
file system may again prefetch incoming data in way once you reclaim next 
read - your data will be already in FS cache. However, file system 
cannot 'pre-write' data for you - so having more concurrent writers helps a 
lot! (Of course in case you have a storage configured to keep concurrent 
I/O :))

Well, why all this staff?...
Let's imagine once you need more performance, and you buy 10x times more 
performant storage box, will you still able to kill it with a single-process 
I/O activity? No... :)  To scale well you need to be able to split your work 
in several task executed in parallel. And personally, I'm very happy we can 
do it with vacuum now - the one of the most critical part of PostgreSQL...

Best regards!
-Dimitri

In response to

Responses

pgsql-performance by date

Next:From: Joshua D. DrakeDate: 2007-03-23 15:51:25
Subject: Re: linux - server configuration for small database
Previous:From: Tom LaneDate: 2007-03-23 15:05:06
Subject: Re: linux - server configuration for small database

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group