Re: Large tables (was: RAID 0 not as fast as

From: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
To: "Bucky Jordan" <bjordan(at)lumeta(dot)com>, "PostgreSQL Performance List" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Large tables (was: RAID 0 not as fast as
Date: 2006-09-21 23:31:49
Message-ID: C1386EF5.24A0%llonergan@greenplum.com
Lists: pgsql-performance

Bucky,

On 9/21/06 2:16 PM, "Bucky Jordan" <bjordan(at)lumeta(dot)com> wrote:

> Does this have anything to do with postgres indexes not storing data, as
> some previous posts to this list have mentioned? (In other words, having
> the index in memory doesn't help? Or are we talking about indexes that
> are too large to fit in RAM?)

Yes, if the index could be scanned without needing to scan the heap to
satisfy a query, that query would benefit from sequential access. This is
true whether the index fits in RAM or not.

> So this issue would be only on a per query basis? Could it be alleviated
> somewhat if I ran multiple smaller queries? For example, I want to
> calculate a summary table on 500m records- fire off 5 queries that count
> 100m records each and update the summary table, leaving MVCC to handle
> update contention?

That is a clever, functional, and very painful way to do it, but yes, you
would get 5 disks' worth of seeking.
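
The split-query approach discussed above can be sketched roughly as follows. This is a minimal sketch, assuming the table has a dense integer key; the table name `records`, the column `id`, and both helper functions are hypothetical and do not come from the thread. It only builds the five range-restricted count statements; each one would then be run on its own connection so the scans can seek concurrently.

```python
def split_ranges(total_rows, parts):
    """Split the key range [0, total_rows) into `parts` contiguous half-open ranges."""
    base, extra = divmod(total_rows, parts)
    ranges = []
    lo = 0
    for i in range(parts):
        # The first `extra` ranges take one additional row so nothing is dropped.
        hi = lo + base + (1 if i < extra else 0)
        ranges.append((lo, hi))
        lo = hi
    return ranges


def count_queries(total_rows, parts, table="records", key="id"):
    """Build one count(*) statement per range (table and key names are hypothetical)."""
    return [
        f"SELECT count(*) FROM {table} WHERE {key} >= {lo} AND {key} < {hi}"
        for lo, hi in split_ranges(total_rows, parts)
    ]


# 500m records split into 5 queries of 100m records each, as in the question.
queries = count_queries(500_000_000, 5)
for q in queries:
    print(q)
```

The half-open ranges guarantee that every key is counted exactly once, which matters if the five partial counts are later summed into the summary table.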

My goal is to provide for as many disks seeking at the same time as are
available to the RAID. Note that the Sun Appliance (X4500 based) has 11
disk drives available per CPU core. Later it will drop to 5-6 disks per
core with the introduction of quad core CPUs, which is more the norm for
now. Bizgres MPP will achieve one or two concurrent heap scanners per CPU
for a given query in the default configurations, so we're missing out on
lots of potential speedup for index scans in many cases.

With both MPP and stock Postgres you get more seek rate as you add users,
but it would take 44 users to use all of the drives in random seeking for
Postgres, where for MPP it would take more like 5.
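
The arithmetic behind these user counts can be reconstructed as a back-of-envelope sketch. The 11 disks per core and the 44 drives come from the message; the core count of 4 is inferred from those two figures, and the model of one outstanding seek per stock Postgres backend and one seek per MPP heap scanner is an assumption made here for illustration, as is the helper name `users_to_saturate`.

```python
import math


def users_to_saturate(disks, seeks_per_query):
    """Concurrent queries needed to keep every disk seeking (hypothetical helper)."""
    return math.ceil(disks / seeks_per_query)


DISKS_PER_CORE = 11               # from the message
DISKS = 44                        # from the message (44 users to use all drives)
CORES = DISKS // DISKS_PER_CORE   # 4, inferred from the two figures above

# Assumption: a stock Postgres backend has one seek outstanding at a time.
stock_users = users_to_saturate(DISKS, 1)

# Assumption: MPP issues one seek per heap scanner, with 1 or 2 scanners per core.
mpp_users_one_scanner = users_to_saturate(DISKS, CORES * 1)
mpp_users_two_scanners = users_to_saturate(DISKS, CORES * 2)

print(stock_users, mpp_users_one_scanner, mpp_users_two_scanners)
```

Under these assumptions the two-scanners-per-core case works out to 44 / 8 = 5.5 concurrent queries, which is roughly in line with the "more like 5" estimate.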

> Actually, now that I think about it- that would only work if the
> sections I mentioned above were on different disks right? So I would
> actually have to do table partitioning with tablespaces on different
> spindles to get that to be beneficial? (which is basically not feasible
> with RAID, since I don't get to pick what disks the data goes on...)

On average, for random seeking we can assume that RAID will distribute the
data evenly. The I/Os should balance out.
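
A small simulation illustrates why random seeks tend to balance across a striped array. This is a sketch under stated assumptions: a plain RAID 0 round-robin stripe layout, uniformly random block reads, and illustrative values for the disk count, stripe size, and volume size. None of these parameters or function names come from the thread.

```python
import random
from collections import Counter


def disk_for_block(block, n_disks, stripe_blocks):
    """Map a logical block to a disk under simple round-robin striping."""
    return (block // stripe_blocks) % n_disks


def simulate(n_disks=10, stripe_blocks=8, n_blocks=10_000_000,
             n_reads=100_000, seed=0):
    """Count how many uniformly random block reads land on each disk."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = Counter(
        disk_for_block(rng.randrange(n_blocks), n_disks, stripe_blocks)
        for _ in range(n_reads)
    )
    return [hits[d] for d in range(n_disks)]


loads = simulate()
for disk, count in enumerate(loads):
    print(f"disk {disk}: {count} reads")
```

With uniformly random reads, each disk receives close to the same share of the I/O. A workload concentrated on a narrow block range would not balance this way, which is why the claim is stated as an average.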

- Luke
