My client just informed me that new hardware is available for our DB server.
. Intel Core 2 Quads
. 48 GB RAM
. 4-disk RAID array (RAID level TBD)
I have put the ugly details of what we do with our DB below, as well as the
postgresql.conf settings. But to summarize: we have a PostgreSQL 8.3.6 DB
with very large tables and the server is always busy serving a constant
stream of single-row UPDATEs and INSERTs from parallel automated processes.
There are fewer than 10 users, as the server is devoted to KB production.
My questions:
1) Which RAID level would you recommend?
2) Which Windows OS would you recommend? (currently 2008 x64 Server)
3) If we were to port to a *NIX flavour, which would you recommend? (one that
supports trouble-free PG builds/makes, please!)
4) Is this the right PG version for our needs?
The details of our use:
. The DB hosts a data warehouse and a knowledgebase (KB) tracking the
professional information of 1.3M individuals.
. The KB tables related to these 1.3M individuals are naturally also large.
. The DB is in a perpetual state of serving TCL-scripted Extract, Transform
and Load (ETL) processes
. These ETL processes typically run 10 at-a-time (i.e. in parallel)
. We would like to run more, but the server appears to be the bottleneck
. The ETL write processes are 99% single row UPDATEs or INSERTs.
. There are few, if any, DELETEs.
. The ETL source data are "import tables"
. The import tables are permanently kept in the data warehouse so that we
can trace the original source of any information.
. There are 6,000+ import tables, and counting.
. The import tables range from dozens to hundreds of thousands of rows.
They rarely require more than a pkey index.
. Linking the KB to the source import data requires an "audit table" of 500M
rows, and counting.
. The size of the audit table makes it very difficult to manage, especially
if we need to modify the design.
. Because we query the audit table in different ways to audit the ETL
processes' decisions, almost every column in the audit table is indexed.
. The maximum number of physical users is 10, and these users RARELY perform
any kind of write.
. By contrast, the 10+ ETL processes are writing constantly
. We find that the internal stats drift, for whatever reason, causing the
planner to choose seq scans instead of index scans.
. So far, we have never seen a situation where a seq scan has improved
performance, which I would attribute to the size of the tables.
. We believe our requirements are exceptional, and we would benefit
immensely from setting up the PG planner to always favour index-oriented
decisions - which seems to contradict everything that standard PG advice
suggests as best practice. (A minimal sketch of what we have in mind follows
this list.)
Current non-default conf settings are (a cross-check query follows the list):
autovacuum = on
autovacuum_analyze_scale_factor = 0.1
autovacuum_analyze_threshold = 250
autovacuum_naptime = 1min
autovacuum_vacuum_scale_factor = 0.2
autovacuum_vacuum_threshold = 500
bgwriter_lru_maxpages = 100
checkpoint_segments = 64
checkpoint_warning = 290
datestyle = 'iso, mdy'
default_text_search_config = 'pg_catalog.english'
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
log_destination = 'stderr'
log_line_prefix = '%t '
logging_collector = on
maintenance_work_mem = 16MB
max_connections = 200
max_fsm_pages = 204800
max_locks_per_transaction = 128
port = 5432
shared_buffers = 500MB
vacuum_cost_delay = 100
work_mem = 512MB
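For completeness, a cross-check we can run on the live server to list
everything that is not at its default (pg_settings and its source column are
standard in 8.3):

SELECT name, setting, unit, source
FROM pg_settings
WHERE source <> 'default'
ORDER BY name;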