Re: random observations while testing with a 1,8B row

From: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
To: "Stefan Kaltenbrunner" <stefan(at)kaltenbrunner(dot)cc>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: random observations while testing with a 1,8B row
Date: 2006-03-10 19:10:48
Message-ID: C0370F38.1EF4B%llonergan@greenplum.com
Lists: pgsql-hackers

Stefan,

On 3/10/06 9:40 AM, "Stefan Kaltenbrunner" <stefan(at)kaltenbrunner(dot)cc> wrote:

> I will summarize some of the results just in case somebody is interested:

I am!

> -> table used has 5 integer columns non-indexed during the loads
> -> hardware is a Dual Opteron 280 with 4 cores @ 2,4GHz and 16GB RAM, data
> is on a multipathed (busy) SAN with different (RAID 10) Arrays for WAL
> and data.

How many connections out of the machine? How many disks behind each LUN?

So, about 20 bytes of user data per row (5*4) unless those are int8, but on
disk it's 108GB / 1.8B rows = 60 bytes per row. I wonder what all that
overhead is?
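
A rough sketch of where those 60 bytes could come from, assuming an
8.1-era 64-bit build (pre-8.3 heap tuple header, MAXALIGN 8); the
constants below are my guesses, not values measured on your system:

# Estimate PostgreSQL's on-disk row size for a table of five int4
# columns, to see how 20 bytes of user data turns into ~60 bytes on disk.
BLOCK_SIZE   = 8192      # default page size
PAGE_HEADER  = 24        # PageHeaderData, roughly
TUPLE_HEADER = 27        # pre-8.3 heap tuple header, no null bitmap
LINE_POINTER = 4         # ItemIdData per tuple
USER_DATA    = 5 * 4     # five int4 columns
MAXALIGN     = 8         # alignment on a 64-bit build

def maxalign(n, a=MAXALIGN):
    return (n + a - 1) // a * a

tuple_len     = maxalign(maxalign(TUPLE_HEADER) + USER_DATA)
per_row       = tuple_len + LINE_POINTER
rows_per_page = (BLOCK_SIZE - PAGE_HEADER) // per_row
effective     = BLOCK_SIZE / rows_per_page  # page header + slack spread over rows

print(per_row, rows_per_page, round(effective, 1))
# -> 60 bytes per row, 136 rows/page, ~60.2 bytes effective, which lines
#    up with 108GB / 1.8B rows.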

> 1. data loading - I'm using COPY with batches of 300M rows:
>
> *) with one COPY running it takes about 20 minutes/batch to load the data
> (~250k rows/sec) and virtually no context switches.
>
> *) with two COPYs running concurrently it takes a bit less than 30
> minutes/batch and a steady 40k context switches/sec (~340k rows/sec overall)
>
> *) with three COPYs it takes about 40 min/batch at 140k context
> switches/sec (~380k rows/sec overall)

So, from 15 MB/s up to about 20 MB/s.

> while the amount of IO going on is quite a lot, it looks like we are
> still mostly CPU-bound for COPY.

That's what we see almost always. In this case, if your I/O configuration
is capable of about 3x the 20 MB/s max parsing rate, or 60 MB/s, you will
be CPU-limited.

The 3x is approximate and based on observations; the reasoning underneath
it is that Postgres writes the data several times, once to the WAL, then
from the WAL to the heap files.
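
To put numbers on that, here's a quick sketch of the arithmetic, reusing
the ~60 bytes/row from above; the 3x write-amplification factor is a rule
of thumb from observation, not something measured in this run:

# Derive MB/s parsed from Stefan's rows/sec figures and estimate the I/O
# bandwidth needed before the disks, rather than the CPUs, become the limit.
BYTES_PER_ROW = 60
WRITE_AMPLIFICATION = 3   # rough: WAL + heap + background writes

loads = {            # COPYs running -> aggregate rows/sec (Stefan's numbers)
    1: 250_000,
    2: 340_000,
    3: 380_000,
}

for copies, rows_per_sec in loads.items():
    parse_mb_s = rows_per_sec * BYTES_PER_ROW / 1e6
    io_needed  = parse_mb_s * WRITE_AMPLIFICATION
    print(f"{copies} COPY: ~{parse_mb_s:.0f} MB/s parsed, "
          f"~{io_needed:.0f} MB/s of I/O to stay CPU-bound")
# -> ~15, ~20, ~23 MB/s parsed; roughly 45-70 MB/s of I/O bandwidth needed.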

> 2. updating all of the rows in the table:
>
> I updated all of the rows in the table with a simple UPDATE testtable
> set a=a+1;
> this took about 2,5 hours (~200k rows/sec)

Ugh. This is where Bizgres MPP shines; I'll try to recreate your test and
post results. This scales linearly in Bizgres MPP with the number of disks
and CPUs available, but I would hope for much more than that.
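
For reference, a quick sanity check of the quoted UPDATE rate, reusing the
60 bytes/row figure from above (rough numbers, not measured here):

rows      = 1_800_000_000
seconds   = 2.5 * 3600
bytes_row = 60

rows_per_sec = rows / seconds
mb_per_sec   = rows_per_sec * bytes_row / 1e6
print(f"~{rows_per_sec:,.0f} rows/sec, ~{mb_per_sec:.0f} MB/s of new row versions")
# -> ~200,000 rows/sec, ~12 MB/s; every updated row is written as a new
#    version, so the table roughly doubles until it is vacuumed.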

> 3. vacuuming this table - it turned out that VACUUM FULL is completely
> unusable on a table of this size (which I actually expected before), not
> only due to the locking involved but also due to a gigantic memory
> requirement and unbelievable slowness.

Simple vacuum should be enough IMO.

- Luke
