Re: Large tables (was: RAID 0 not as fast as expected)

From: "Bucky Jordan" <bjordan(at)lumeta(dot)com>
To: "Merlin Moncure" <mmoncure(at)gmail(dot)com>, "Luke Lonergan" <llonergan(at)greenplum(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Large tables (was: RAID 0 not as fast as expected)
Date: 2006-09-18 22:40:51
Message-ID: 78ED28FACE63744386D68D8A9D1CF5D42099E1@MAIL.corp.lumeta.com
Lists: pgsql-performance

> good normalization skills are really important for large databases,
> along with materialization strategies for 'denormalized sets'.

Good points - thanks. I'm especially curious what others have done for
the materialization. The matview project on gborg appears dead, and I've
only found a smattering of references on Google. My guess is you roll
your own for optimal performance...
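
For reference, the kind of roll-your-own approach I'm picturing is just a
summary table rebuilt on a schedule - minimal sketch below, where
scan_results/scan_summary and the columns are made-up names:

  -- Hand-rolled "materialized view": a plain summary table.
  CREATE TABLE scan_summary (
      scan_month  date    NOT NULL,
      device_type text    NOT NULL,
      record_cnt  bigint  NOT NULL,
      PRIMARY KEY (scan_month, device_type)
  );

  -- Rebuild inside one transaction; MVCC means readers keep seeing the
  -- old contents until COMMIT.  The deleted rows become dead tuples, so
  -- this table needs its own frequent vacuum.
  BEGIN;
  DELETE FROM scan_summary;
  INSERT INTO scan_summary (scan_month, device_type, record_cnt)
  SELECT date_trunc('month', scan_time)::date, device_type, count(*)
  FROM scan_results
  GROUP BY 1, 2;
  COMMIT;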

> regarding the number of rows, there is no limit to how much pg can
> handle per se, just some practical limitations, especially vacuum and
> reindex times. these are important because they are required to keep
> a handle on mvcc bloat and it's very nice to be able to vacuum bits of
> your database at a time.

I was hoping for some actual numbers on "practical". Hardware isn't too
much of an issue (within reason - we're not talking an Amazon or Google
here... the SunFire X4500 looks interesting...) - if a customer wants to
store that much data, and pay for it, we'll figure out how to do it. I'd
just rather not have to re-design the database. Say the requirement is
to keep 12 months of data accessible, each "scan" produces 100M records,
and I run one per month. What happens if the customer wants to run it
once a week? I was more trying to figure out at what point (ballpark)
I'm going to have to look into archive tables and things of that nature
(or at Bizgres/MPP). It's easier for us to add more/bigger hardware, but
not so easy to redesign/add history tables...
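
The redesign I'd rather not retrofit later is presumably the usual
inheritance-based partitioning, something like this sketch (table and
column names made up; assumes 8.1's constraint_exclusion):

  CREATE TABLE scans (
      scan_time  timestamp  NOT NULL,
      node_id    integer    NOT NULL,
      payload    text
  );

  CREATE TABLE scans_2006_09 (
      CHECK (scan_time >= '2006-09-01' AND scan_time < '2006-10-01')
  ) INHERITS (scans);

  CREATE TABLE scans_2006_10 (
      CHECK (scan_time >= '2006-10-01' AND scan_time < '2006-11-01')
  ) INHERITS (scans);

  -- With constraint_exclusion = on, a query with a scan_time filter only
  -- touches the matching children, each child can be vacuumed/reindexed
  -- separately, and aging out old months is just DROP TABLE on one child.
  SET constraint_exclusion = on;
  SELECT count(*) FROM scans WHERE scan_time >= '2006-09-01';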

>
> just another fyi, if you have a really big database, you can forget
> about doing pg_dump for backups (unless you really don't care about
> being x day or days behind)...you simply have to do some type of
> replication/failover strategy. i would start with pitr.
>
> merlin
I was originally thinking replication, but I did notice some nice PITR
features in 8.x - I'll have to look into that some more.
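
From a quick read of the docs, the basic setup looks like continuous WAL
archiving plus a periodic base backup - roughly like this (the archive
path is made up):

  # postgresql.conf: ship each completed WAL segment to the archive
  # (destination path is hypothetical)
  archive_command = 'cp %p /mnt/backup/wal_archive/%f'

  -- Pair the archived WAL with periodic base backups:
  SELECT pg_start_backup('weekly_base');
  -- ... copy $PGDATA with tar/rsync while the cluster stays up ...
  SELECT pg_stop_backup();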

Thanks for the pointers though...

- Bucky
