From: | "Nelson Kotowski" <nkotowski(at)gmail(dot)com> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Subject: | TPC-H Scaling Factors X PostgreSQL Cluster Command |
Date: | 2007-04-21 14:54:42 |
Message-ID: | d34b24380704210754q53aa86c8l3a63211d81febbf7@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Hello everyone,
This is my first post in here, i am in need of some help...
Wel, i am running PostgreSQL 8.2.4, in a single node, in this machine:
Dell PowerEdge 1950
Intel Xeon 2.33 (2 CPUs)
4GB RAM
Sata HD 160GB
Debian Distribution
Some relevant parameters:
In Linux:
-> SHMMAX: 32MB
In postgresql.conf:
-> shared_buffers: 24MB
-> maintenance_work_mem: 16MB
I need to: build, load and do some optimization procedures in a TPC-H
benchmark database.
So far, i need to do it in three different scale factors (1, 2 and 5GB
databases).
My build process comprehends creating the tables without any foreign keys,
indexes, etc. - Running OK!
Then, i load the data from the flat files generated through DBGEN software
into these tables. - Running OK!
Finally, i run a "optimize" script that does the following:
- Alter the tables to add the mandatory foreign keys;
- Create all mandatory indexes;
- Cluster the orders table by the orders table index;
- Cluster the lineitem table by the lineitem table index;
- Vacuum the database;
- Analyze statistics.
This is the step which is causing me some headaches, mainly related to the
5GB database. I identified that the cluster command over the lineitem table
(cluster idx_lineitem on lineitem) is the responsible. I got to this
conclusion because when i run it in the 1GB and 2GB database i am able to
complete this script in 10 and 30 minutes each. But when i run this command
over the 5GB database, it simply seems as it won't end. I watched it running
over 12 hours and nothing happened.
To investigate a bit, i tried to tune these parameters and these parameters
only, and re-run the script (rebooted the machine and restarted it all
over):
In Linux: SHMMAX -> Tuned it to 2GB via echo "..." > /proc/sys/kernel/shmmax
In postgresql.conf:
shared_buffers: 512MB
maintenance_work_mem: 800MB
I thought that this might improve the performance, but as a matter of fact,
that's what happened:
1 GB database - cluster command time remains the same (more or less 10
minutes)
2 GB database - cluster command now takes 3 hours instead of 30 minutes! BAD
5 GB database - still can't complete the command in over 12 hours.
To add some info, i did a top command on the machine, i saw that the
postmaster consumes all the "shared_buffers" configured in the physical
memory (13,3% of the RAM --> 512MB of 4GB) but the total free mem is 0% (all
the 4GB is used, but not by the postmaster), and no swap is in use, CPU is
around 1% busy (this 1% is for the postmaster), and this machine is
dedicate, for my personal use, and there's nothing else running but
PostgreSQL.
Does anyone have any clues?
Thanks in advance,
Nelson P Kotowski Filho.
From | Date | Subject | |
---|---|---|---|
Next Message | Colin McGuigan | 2007-04-21 15:33:38 | Odd problem with planner choosing seq scan |
Previous Message | cluster | 2007-04-21 10:43:27 | Re: FK triggers misused? |