TPC-H Scaling Factors X PostgreSQL Cluster Command

From: "Nelson Kotowski" <nkotowski(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: TPC-H Scaling Factors X PostgreSQL Cluster Command
Date: 2007-04-21 14:54:42
Message-ID: d34b24380704210754q53aa86c8l3a63211d81febbf7@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello everyone,

This is my first post in here, i am in need of some help...

Wel, i am running PostgreSQL 8.2.4, in a single node, in this machine:

Dell PowerEdge 1950
Intel Xeon 2.33 (2 CPUs)
4GB RAM
Sata HD 160GB
Debian Distribution

Some relevant parameters:

In Linux:
-> SHMMAX: 32MB
In postgresql.conf:
-> shared_buffers: 24MB
-> maintenance_work_mem: 16MB

I need to: build, load and do some optimization procedures in a TPC-H
benchmark database.

So far, i need to do it in three different scale factors (1, 2 and 5GB
databases).

My build process comprehends creating the tables without any foreign keys,
indexes, etc. - Running OK!
Then, i load the data from the flat files generated through DBGEN software
into these tables. - Running OK!

Finally, i run a "optimize" script that does the following:

- Alter the tables to add the mandatory foreign keys;
- Create all mandatory indexes;
- Cluster the orders table by the orders table index;
- Cluster the lineitem table by the lineitem table index;
- Vacuum the database;
- Analyze statistics.

This is the step which is causing me some headaches, mainly related to the
5GB database. I identified that the cluster command over the lineitem table
(cluster idx_lineitem on lineitem) is the responsible. I got to this
conclusion because when i run it in the 1GB and 2GB database i am able to
complete this script in 10 and 30 minutes each. But when i run this command
over the 5GB database, it simply seems as it won't end. I watched it running
over 12 hours and nothing happened.

To investigate a bit, i tried to tune these parameters and these parameters
only, and re-run the script (rebooted the machine and restarted it all
over):

In Linux: SHMMAX -> Tuned it to 2GB via echo "..." > /proc/sys/kernel/shmmax

In postgresql.conf:

shared_buffers: 512MB
maintenance_work_mem: 800MB

I thought that this might improve the performance, but as a matter of fact,
that's what happened:

1 GB database - cluster command time remains the same (more or less 10
minutes)
2 GB database - cluster command now takes 3 hours instead of 30 minutes! BAD
5 GB database - still can't complete the command in over 12 hours.

To add some info, i did a top command on the machine, i saw that the
postmaster consumes all the "shared_buffers" configured in the physical
memory (13,3% of the RAM --> 512MB of 4GB) but the total free mem is 0% (all
the 4GB is used, but not by the postmaster), and no swap is in use, CPU is
around 1% busy (this 1% is for the postmaster), and this machine is
dedicate, for my personal use, and there's nothing else running but
PostgreSQL.

Does anyone have any clues?

Thanks in advance,
Nelson P Kotowski Filho.

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Colin McGuigan 2007-04-21 15:33:38 Odd problem with planner choosing seq scan
Previous Message cluster 2007-04-21 10:43:27 Re: FK triggers misused?