Skip site navigation (1) Skip section navigation (2)

Re: New server to improve performance on our large and busy DB - advice?

From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Carlo Stonebanks <stonec(dot)register(at)sympatico(dot)ca>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: New server to improve performance on our large and busy DB - advice?
Date: 2010-01-14 19:54:09
Message-ID: 924d96f74a9902a1c5f275730b8a1fc1@commandprompt.com (view raw or flat)
Thread:
Lists: pgsql-performance
On Thu, 14 Jan 2010 14:17:13 -0500, "Carlo Stonebanks"
<stonec(dot)register(at)sympatico(dot)ca> wrote:
> My client just informed me that new hardware is available for our DB
> server.
> 
> . Intel Core 2 Quads Quad
> . 48 GB RAM
> . 4 Disk RAID drive (RAID level TBD)
> 
> I have put the ugly details of what we do with our DB below, as well as
> the 
> postgres.conf settings. But, to summarize: we have a PostgreSQL 8.3.6 DB

> with very large tables and the server is always busy serving a constant 
> stream of single-row UPDATEs and INSERTs from parallel automated
processes.
> 
> There are less than 10 users, as the server is devoted to the KB
> production 
> system.
> 
> My questions:
> 
> 1) Which RAID level would you recommend

10

> 2) Which Windows OS would you recommend? (currently 2008 x64 Server)

If you have to run Windows... that works.

> 3) If we were to port to a *NIX flavour, which would you recommend?
(which 
> support trouble-free PG builds/makes please!)

Community driven:
Debian Stable
CentOS 5

Commercial:
Ubuntu LTS
RHEL 5

> 4) Is this the right PG version for our needs?

You want to run at least the latest stable 8.3 series which I believe is
8.3.9.
With the imminent release of 8.5 (6 months), it may be time to move to
8.4.2 instead.


Joshua D. Drake


> 
> Thanks,
> 
> Carlo
> 
> The details of our use:
> 
> . The DB hosts is a data warehouse and a knowledgebase (KB) tracking the

> professional information of 1.3M individuals.
> . The KB tables related to these 130M individuals are naturally also
large
> . The DB is in a perpetual state of serving TCL-scripted Extract,
> Transform 
> and Load (ETL) processes
> . These ETL processes typically run 10 at-a-time (i.e. in parallel)
> . We would like to run more, but the server appears to be the bottleneck
> . The ETL write processes are 99% single row UPDATEs or INSERTs.
> . There are few, if any DELETEs
> . The ETL source data are "import tables"
> . The import tables are permanently kept in the data warehouse so that
we 
> can trace the original source of any information.
> . There are 6000+ and counting
> . The import tables number from dozens to hundreds of thousands of rows.

> They rarely require more than a pkey index.
> . Linking the KB to the source import date requires an "audit table" of
> 500M 
> rows, and counting.
> . The size of the audit table makes it very difficult to manage,
> especially 
> if we need to modify the design.
> . Because we query the audit table different ways to audit the ETL
> processes 
> decisions, almost every column in the audit table is indexed.
> . The maximum number of physical users is 10 and these users RARELY
> perform 
> any kind of write
> . By contrast, the 10+ ETL processes are writing constantly
> . We find that internal stats drift, for whatever reason, causing row
seq 
> scans instead of index scans.
> . So far, we have never seen a situation where a seq scan has improved 
> performance, which I would attribute to the size of the tables
> . We believe our requirements are exceptional, and we would benefit 
> immensely from setting up the PG planner to always favour index-oriented

> decisions - which seems to contradict everything that PG advice suggests
> as 
> best practice.
> 
> Current non-default conf settings are:
> 
> autovacuum = on
> autovacuum_analyze_scale_factor = 0.1
> autovacuum_analyze_threshold = 250
> autovacuum_naptime = 1min
> autovacuum_vacuum_scale_factor = 0.2
> autovacuum_vacuum_threshold = 500
> bgwriter_lru_maxpages = 100
> checkpoint_segments = 64
> checkpoint_warning = 290
> datestyle = 'iso, mdy'
> default_text_search_config = 'pg_catalog.english'
> lc_messages = 'C'
> lc_monetary = 'C'
> lc_numeric = 'C'
> lc_time = 'C'
> log_destination = 'stderr'
> log_line_prefix = '%t '
> logging_collector = on
> maintenance_work_mem = 16MB
> max_connections = 200
> max_fsm_pages = 204800
> max_locks_per_transaction = 128
> port = 5432
> shared_buffers = 500MB
> vacuum_cost_delay = 100
> work_mem = 512MB

-- 
PostgreSQL - XMPP: jdrake(at)jabber(dot)postgresql(dot)org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997

In response to

pgsql-performance by date

Next:From: Fernando HeviaDate: 2010-01-14 20:03:52
Subject: new server I/O setup
Previous:From: Carlo StonebanksDate: 2010-01-14 19:20:30
Subject: Re: Massive table (500M rows) update nightmare

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group