Quick Links

TPC-H Scaling Factors X PostgreSQL Cluster Command

From:	"Nelson Kotowski" <nkotowski(at)gmail(dot)com>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	TPC-H Scaling Factors X PostgreSQL Cluster Command
Date:	2007-04-21 14:54:42
Message-ID:	d34b24380704210754q53aa86c8l3a63211d81febbf7@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Hello everyone,

This is my first post in here, i am in need of some help...

Wel, i am running PostgreSQL 8.2.4, in a single node, in this machine:

Dell PowerEdge 1950
Intel Xeon 2.33 (2 CPUs)
4GB RAM
Sata HD 160GB
Debian Distribution

Some relevant parameters:

In Linux:
-> SHMMAX: 32MB
In postgresql.conf:
-> shared_buffers: 24MB
-> maintenance_work_mem: 16MB

I need to: build, load and do some optimization procedures in a TPC-H
benchmark database.

So far, i need to do it in three different scale factors (1, 2 and 5GB
databases).

My build process comprehends creating the tables without any foreign keys,
indexes, etc. - Running OK!
Then, i load the data from the flat files generated through DBGEN software
into these tables. - Running OK!

Finally, i run a "optimize" script that does the following:

- Alter the tables to add the mandatory foreign keys;
- Create all mandatory indexes;
- Cluster the orders table by the orders table index;
- Cluster the lineitem table by the lineitem table index;
- Vacuum the database;
- Analyze statistics.

This is the step which is causing me some headaches, mainly related to the
5GB database. I identified that the cluster command over the lineitem table
(cluster idx_lineitem on lineitem) is the responsible. I got to this
conclusion because when i run it in the 1GB and 2GB database i am able to
complete this script in 10 and 30 minutes each. But when i run this command
over the 5GB database, it simply seems as it won't end. I watched it running
over 12 hours and nothing happened.

To investigate a bit, i tried to tune these parameters and these parameters
only, and re-run the script (rebooted the machine and restarted it all
over):

In Linux: SHMMAX -> Tuned it to 2GB via echo "..." > /proc/sys/kernel/shmmax

In postgresql.conf:

shared_buffers: 512MB
maintenance_work_mem: 800MB

I thought that this might improve the performance, but as a matter of fact,
that's what happened:

1 GB database - cluster command time remains the same (more or less 10
minutes)
2 GB database - cluster command now takes 3 hours instead of 30 minutes! BAD
5 GB database - still can't complete the command in over 12 hours.

To add some info, i did a top command on the machine, i saw that the
postmaster consumes all the "shared_buffers" configured in the physical
memory (13,3% of the RAM --> 512MB of 4GB) but the total free mem is 0% (all
the 4GB is used, but not by the postmaster), and no swap is in use, CPU is
around 1% busy (this 1% is for the postmaster), and this machine is
dedicate, for my personal use, and there's nothing else running but
PostgreSQL.

Does anyone have any clues?

Thanks in advance,
Nelson P Kotowski Filho.

Responses

Re: TPC-H Scaling Factors X PostgreSQL Cluster Command at 2007-04-23 09:46:45 from Heikki Linnakangas
Re: TPC-H Scaling Factors X PostgreSQL Cluster Command at 2007-04-24 03:39:42 from Greg Smith

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Colin McGuigan	2007-04-21 15:33:38	Odd problem with planner choosing seq scan
Previous Message	cluster	2007-04-21 10:43:27	Re: FK triggers misused?