Quick Links

Re: New server to improve performance on our large and busy DB - advice? (v2)

From:	Dave Crooke <dcrooke(at)gmail(dot)com>
To:	Carlo Stonebanks <stonec(dot)register(at)sympatico(dot)ca>
Cc:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: New server to improve performance on our large and busy DB - advice? (v2)
Date:	2010-01-14 22:35:53
Message-ID:	ca24673e1001141435w19fd4930o2caea62087a4af50@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

I'll bite ....

1. In general, RAID-10 is the only suitable RAID configuration for a
database. The decision making comes in how many drives, and splitting stuff
up into LUNs (like putting pg_xlog on its own LUN).

2. None of the above - you're asking the wrong question really. PostgreSQL
is open source, and is developed on Unix. The Windows version is a pretty
good port, as Windows posrt of OSS stuff go, but it's just that, a port.
Your server is being dedicated to running Postgres, so the right question to
ask is "What is the best OS for running Postgres?".

For any given database engine, regardless of the marketing and support
stance, there is only one true "primary" enterprise OS platform that most
big mission critical sites use, and is the best supported and most stable
platform for that RDBMS. For Oracle, that's HP-UX (but 10 years ago, it was
Solaris). For PostgreSQL, it's Linux.

The biggest problem with Postgres on Windows is that it only comes in
32-bit. RAM is the ultimate performance tweak for an RDBMS, and to make
proper use of modern amounts of RAM, you need a 64-bit executable.

3. The two choices I'd consider are both Linux:

- for the conservative / supported approach, get Red Hat and buy support
from them and (e.g.) Enterprise DB
- if you plan to keep pretty current and are happy actively managing
versions and running locally compiled builds, go with Ubuntu

4. The general wisdom is that there are a lot of improvements from 8.3 to
8.4, but how much benefit you'll see in your environment is another
question. If you're building a new system and have to migrate anyway, it
seems like a good opportunity to upgrade.

Cheers
Dave

On Thu, Jan 14, 2010 at 3:25 PM, Carlo Stonebanks <
stonec(dot)register(at)sympatico(dot)ca> wrote:

> My client just informed me that new hardware is available for our DB
> server.
>
> . Intel Core 2 Quads Quad
> . 48 GB RAM
> . 4 Disk RAID drive (RAID level TBD)
>
> I have put the ugly details of what we do with our DB below, as well as the
> postgres.conf settings. But, to summarize: we have a PostgreSQL 8.3.6 DB
> with very large tables and the server is always busy serving a constant
> stream of single-row UPDATEs and INSERTs from parallel automated processes.
>
> There are less than 10 users, as the server is devoted to the KB production
> system.
>
> My questions:
>
> 1) Which RAID level would you recommend
> 2) Which Windows OS would you recommend? (currently 2008 x64 Server)
> 3) If we were to port to a *NIX flavour, which would you recommend? (which
> support trouble-free PG builds/makes please!)
> 4) Is this the right PG version for our needs?
>
> Thanks,
>
> Carlo
>
> The details of our use:
>
> . The DB hosts is a data warehouse and a knowledgebase (KB) tracking the
> professional information of 1.3M individuals.
> . The KB tables related to these 130M individuals are naturally also large
> . The DB is in a perpetual state of serving TCL-scripted Extract, Transform
> and Load (ETL) processes
> . These ETL processes typically run 10 at-a-time (i.e. in parallel)
> . We would like to run more, but the server appears to be the bottleneck
> . The ETL write processes are 99% single row UPDATEs or INSERTs.
> . There are few, if any DELETEs
> . The ETL source data are "import tables"
> . The import tables are permanently kept in the data warehouse so that we
> can trace the original source of any information.
> . There are 6000+ and counting
> . The import tables number from dozens to hundreds of thousands of rows.
> They rarely require more than a pkey index.
> . Linking the KB to the source import date requires an "audit table" of
> 500M
> rows, and counting.
> . The size of the audit table makes it very difficult to manage, especially
> if we need to modify the design.
> . Because we query the audit table different ways to audit the ETL
> processes
> decisions, almost every column in the audit table is indexed.
> . The maximum number of physical users is 10 and these users RARELY perform
> any kind of write
> . By contrast, the 10+ ETL processes are writing constantly
> . We find that internal stats drift, for whatever reason, causing row seq
> scans instead of index scans.
> . So far, we have never seen a situation where a seq scan has improved
> performance, which I would attribute to the size of the tables
> . We believe our requirements are exceptional, and we would benefit
> immensely from setting up the PG planner to always favour index-oriented
> decisions - which seems to contradict everything that PG advice suggests as
> best practice.
>
> Current non-default conf settings are:
>
> autovacuum = on
> autovacuum_analyze_scale_factor = 0.1
> autovacuum_analyze_threshold = 250
> autovacuum_naptime = 1min
> autovacuum_vacuum_scale_factor = 0.2
> autovacuum_vacuum_threshold = 500
> bgwriter_lru_maxpages = 100
> checkpoint_segments = 64
> checkpoint_warning = 290
> datestyle = 'iso, mdy'
> default_text_search_config = 'pg_catalog.english'
> lc_messages = 'C'
> lc_monetary = 'C'
> lc_numeric = 'C'
> lc_time = 'C'
> log_destination = 'stderr'
> log_line_prefix = '%t '
> logging_collector = on
> maintenance_work_mem = 16MB
> max_connections = 200
> max_fsm_pages = 204800
> max_locks_per_transaction = 128
> port = 5432
> shared_buffers = 500MB
> vacuum_cost_delay = 100
> work_mem = 512MB
>
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

In response to

New server to improve performance on our large and busy DB - advice? (v2) at 2010-01-14 21:25:00 from Carlo Stonebanks

Responses

Re: New server to improve performance on our large and busy DB - advice? (v2) at 2010-01-15 00:46:09 from Craig Ringer
Re: New server to improve performance on our large and busy DB - advice? (v2) at 2010-01-15 16:10:40 from Tony McC

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Pierre Frédéric Caillaud	2010-01-14 23:19:56	Re: Inserting 8MB bytea: just 25% of disk perf used?
Previous Message	Carlo Stonebanks	2010-01-14 22:05:48	Re: Massive table (500M rows) update nightmare