Re: random observations while testing with a 1,8B row table

From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: Luke Lonergan <llonergan(at)greenplum(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: random observations while testing with a 1,8B row table
Date: 2006-03-10 19:48:23
Message-ID: 4411D807.1040903@kaltenbrunner.cc
Lists: pgsql-hackers

Luke Lonergan wrote:
> Stefan,
>
> On 3/10/06 9:40 AM, "Stefan Kaltenbrunner" <stefan(at)kaltenbrunner(dot)cc> wrote:
>
>
>>I will summarize some of the results just in case somebody is interested:
>
>
> I am!

heh - not surprised :-)

>
>
>>-> table used has 5 integer columns, non-indexed during the loads
>>-> hardware is a Dual Opteron 280 with 4 cores @ 2.4 GHz and 16GB RAM; data
>>is on a multipathed (busy) SAN with separate (RAID 10) arrays for WAL
>>and data.
>
>
> How many connections out of the machine? How many disks behind each LUN?

2 HBAs in the server, 2x2 possible paths to each LUN.
6 disks for the WAL and 12 disks for the data

>
> So - about 20 bytes per row (5*4) unless those are int8, but on disk it's
> 108GB/1.8B = 60 bytes per row. I wonder what all that overhead is?
>
>
>>1. data loading - I'm using COPY with batches of 300M rows:
>>
>>*) with one COPY running it takes about 20 minutes/batch to load the data
>>(~250k rows/sec) and virtually no context switches.
>>
>>*) with two COPYs running concurrently it takes a bit less than 30
>>minutes/batch and a steady 40k context switches/sec (~340k rows/sec overall)
>>
>>*) with three COPYs it takes about 40 min/batch at 140k context
>>switches/sec (380k rows/sec overall)
>
>
> So, from 15 MB/s up to about 20 MB/s.
>
>
>
>>while the amount of IO going on is quite a lot it looks like we are
>>still mostly CPU-bound for COPY.
>
>
> It's what we see almost always. In this case if your I/O configuration is
> capable of performing at about 3x the 20MB/s max parsing rate, or 60MB/s,
> you will be CPU limited.

the I/O system I use should be capable of doing that if pushed hard
enough :-)
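
For reference, a minimal sketch of the kind of load described above; the
table name matches the UPDATE further down, but the other column names and
the input path are made up for illustration:

    -- table with 5 non-indexed integer columns, as in the test setup
    CREATE TABLE testtable (a int, b int, c int, d int, e int);

    -- one batch of roughly 300M rows per COPY; the file path is hypothetical
    COPY testtable FROM '/data/batch_01.dat';

    -- rough arithmetic behind the rates quoted above:
    --   1 COPY : ~250k rows/sec * ~60 bytes/row on disk = ~15 MB/s
    --   2 COPYs: ~340k rows/sec overall * ~60 bytes/row = ~20 MB/s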

>
> The 3x is approximate and based on observations; the reasoning underneath
> it is that Postgres writes the data several times: once to the WAL, then
> from the WAL to the heap files.
>
>
>>2. updating all of the rows in the table:
>>
>>I updated all of the rows in the table with a simple UPDATE testtable
>>set a=a+1;
>>this took about 2.5 hours (~200k rows/sec)
>
>
> Ugh. This is where Bizgres MPP shines; I'll try to recreate your test and
> post results. This scales linearly in Bizgres MPP with the number of disks
> and CPUs available, but I would hope for much more than that.

interesting to know, but I'm still testing/playing with PostgreSQL here,
not Bizgres MPP ...
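
For reference, the statement in question from point 2 above; note that under
PostgreSQL's MVCC every updated row leaves a dead tuple behind, so a
full-table update like this roughly doubles the heap until it is vacuumed:

    -- rewrites every one of the 1.8B rows; the old row versions stay in
    -- the heap as dead tuples until VACUUM (or CLUSTER) reclaims them
    UPDATE testtable SET a = a + 1;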

>
>
>>3. vacuuming this table - it turned out that VACUUM FULL is completely
>>unusable on a table of this size (which I actually expected beforehand),
>>not only due to the locking involved but also due to a gigantic memory
>>requirement and unbelievable slowness.
>
>
> Simple vacuum should be enough IMO.

sure, that was mostly meant as an experiment; if I had to do this on a
production database I would most likely use CLUSTER to get the desired
effect (which in my case was purely getting back the disk space wasted by
dead tuples)
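
a rough sketch of the two options, using the CLUSTER syntax of that era; the
index name is made up, and CLUSTER does require an index, which this test
table does not have yet:

    -- plain VACUUM: marks dead tuples reusable, but generally does not
    -- shrink the files on disk
    VACUUM testtable;

    -- CLUSTER rewrites the table and returns the freed space to the OS
    CREATE INDEX testtable_a_idx ON testtable (a);
    CLUSTER testtable_a_idx ON testtable;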

Stefan
