Re: Hard Drive Usage for Speeding up Big Queries

From: "Merlin Moncure" <mmoncure(at)gmail(dot)com>
To: "Alex Hochberger" <alex(at)dsgi(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Hard Drive Usage for Speeding up Big Queries
Date: 2008-01-29 02:56:40
Message-ID: b42b73150801281856q2970bbf1mc0065dddfa625933@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Jan 28, 2008 8:54 AM, Alex Hochberger <alex(at)dsgi(dot)us> wrote:
> We are trying to optimize our Database server without spending a
> fortune on hardware. Here is our current setup.
>
> Main Drive array: 8x 750 GB SATA 2 Drives in a RAID 10 Configuration,
> this stores the OS, Applications, and PostgreSQL Data drives. 3 TB
> Array, 2 TB Parition for PostgreSQL.
> Secondary drive array: 2x 36 GB SAS 15000 RPM Drives in a RAID 1
> Configuration: the pg_xlog directory, checkpoints set to use about 18
> GB max, this way when massive numbers of small writes occur, they
> don't slow the system down. Drive failure loses no data. Checkpoints
> will be another matter, hope to keep under control with bgwriter
> tweaking.
>
> Now our "normal" activities are really fast. Scanning data, etc., all
> runs pretty quickly. What is NOT fast is some of the massive
> queries. We have some Updates with joins that match a 100m line table
> with a 200m line table. Outside of custom coding pl/pgsql code that
> creates the subfunction on the fly (which is up for consideration) to
> try to keep the Matching joins to an O(n) problem from the current
> O(n^2) one, we are looking at hardware as an option to help speed up
> these big batch queries that sometimes run for 5-6 days.
>

Well, you have already put some thought into your hardware...but the
awful truth is that the sata drives are just terrible at seeking once
you start seeing significant numbers of page faults to disk, and
getting killed on sorting on top of it. Maybe the best plan of attack
here is to post some explain times, and the relevant query. Perhaps
there are some optimizations in indexing strategies or other query
tactics (the pl/pgsql function smells suspicious as you have already
noted), and hopefully the giant sort can be optimized out. You have a
nasty problem that may require some out of the box thinking, so the
more information you can provide the better.

merlin

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Claus Guttesen 2008-01-29 08:06:49 Re: 8x2.5" or 6x3.5" disks
Previous Message Tom Lane 2008-01-28 20:51:53 Re: JDBC/Stored procedure performance issue