Re: 4G row table?

From: "Josh Berkus" <josh(at)agliodbs(dot)com>
To: "Charles H(dot) Woloszynski" <chw(at)clearmetrix(dot)com>, josh(at)agliodbs(dot)com
Cc: gry(at)ll(dot)mit(dot)edu, pgsql-performance(at)postgresql(dot)org
Subject: Re: 4G row table?
Date: 2002-12-20 17:01:28
Message-ID: web-2293056@davinci.ethosmedia.com
Lists: pgsql-performance

Charlie,

> Why do you say to expect slow performance on this hardware? Is
> there something specific about the configuration that worries you?
> Or, just lots of data in the database, so the data will be on disk
> and not in the cache (system or postgresql)?
> What do you classify as *slow*? Obviously, he is dependent on the
> I/O channel given the size of the tables. So, good indexing will be
> required to help on the queries. No comments on the commit rate for
> this data (I am guessing that it is slow, given the description of
> the database), so that may or may not be an issue.
> Depending on the type of queries, perhaps clustering will help, along
> with some good partitioning indexes.
> I just don't see the slow in the hardware. Of course, if he is
> targeting lots of concurrent queries, better add some additional
> processors, or better yet, use ERSERVER and replicate the data to a
> farm of machines. [To avoid the I/O bottleneck of lots of concurrent
> queries against these large tables].
>
> I guess there are a lot of assumptions on the data's use to decide if
> the hardware is adequate or not :-)

Well, slow is relative. It may be fast enough for him. Me, I'd be
screaming in frustration.

Take, for example, an index scan on the primary key. Assuming that he
can get the primary key down to 12 bytes per node using custom data
types, that's still:

12 bytes * 4,000,000,000 rows = 48 GB for the index
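
Here is that arithmetic as a quick Python sketch (a floor, not a
ceiling: the 12-byte key is the assumption above, and btree page
headers and per-tuple overhead are ignored):

    # Raw key bytes only; the real index would be larger still once
    # btree page headers and per-tuple overhead are counted.
    key_bytes = 12        # assumed 12-byte custom-type key
    rows = 4 * 10**9      # 4G rows

    index_gb = key_bytes * rows / 10**9
    print(index_gb)       # 48.0 GB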

As you can see, it's utterly impossible for him to load even the
primary key index into his 512 MB of RAM (of which no more than 200 MB
can go to Postgres anyway without risking contention for memory). A
sort-and-limit on the primary key, for example, could require swapping
index pages between RAM and disk as many as 480 times! (though probably
more like 100 times on average)
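
To put a number on that, a sketch under one assumption of mine: say
roughly 100 MB of cache is effectively available for the index once
the OS and everything else take their share of the 512 MB:

    # Worst case: the 48 GB index is faulted through ~100 MB of usable
    # cache (my assumed figure, not anything measured on his box).
    index_mb = 48 * 1000
    usable_cache_mb = 100
    print(index_mb / usable_cache_mb)   # 480.0 passes in the worst case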

With a slow RAID array and the hardware he described to us, this most
likely means that a simple sort-and-limit query on the primary key
could take hours to execute. Even with really fast disk access, we're
talking tens of minutes at least.
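
Where "tens of minutes at least" comes from, again as a sketch: assume
a generous 40 MB/s of sustained sequential read (my number, not
anything he reported), and count just one clean pass over the index:

    # Best case: one sequential pass over the 48 GB index at an assumed
    # 40 MB/s read rate; repeated random I/O would be far slower.
    index_mb = 48 * 1000
    mb_per_sec = 40
    print(index_mb / mb_per_sec / 60)   # 20.0 minutes per pass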

-Josh Berkus
