Re: Realistic upper bounds on DB size

From: Guy Fraser <guy(at)incentre(dot)net>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Realistic upper bounds on DB size
Date: 2003-03-28 21:40:21
Message-ID: 3E84C145.30801@incentre.net
Lists: pgsql-general

I have just built an Athlon 1800 machine with a 60 GB UDMA100 drive and
512 MB of RAM.

I imported 15 months' worth of dialup accounting information {7.5
million rows, ~1 GB}. Using the "copy" command I was able to import the
data into a predefined table in 21 minutes.
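
For the record, the load was a single statement along these lines (the
table and file names here are placeholders, not the real ones):

  -- bulk load one month of comma delimited accounting detail in one pass
  COPY radius_detail FROM '/var/data/radius_2003_01.csv' WITH DELIMITER ',';

COPY avoids the per-statement overhead of individual inserts, which is
the main reason 7.5 million rows went in so quickly.
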
I did this as a test to see what the performance would be like. I will
now split the data into separate tables based on record types, and
possibly further into tables per year-month, and create monthly and
daily summary tables to improve performance for common queries. Of
course, data from multiple months can still be accessed using a
"select ... from (select ... union select ...) as full ..." type
selection for larger data sets when required.

Having a little, but not extensive, experience with astronomy, I expect
the data could be broken down into tables covering smaller areas of sky
{quadrants, IIRC}, with a table mapping the quadrants to the tables.
Either a function or an external application could then use that map to
"union" the tables into a larger data set as required. This would
probably be the most efficient way of dealing with HUGE data sets. If
you have exceptionally fast hardware this may not be necessary for what
you need.
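
A rough sketch of what I mean, with invented table and column names:

  -- one physical table per quadrant (obs_q1, obs_q2, ...), plus a map
  -- the application or a function consults to pick the tables to "union"
  CREATE TABLE quadrant_map (
      quadrant   integer PRIMARY KEY,
      table_name text NOT NULL,       -- e.g. 'obs_q3'
      ra_min     real, ra_max  real,  -- right ascension range covered
      dec_min    real, dec_max real   -- declination range covered
  );

  -- a request that falls in quadrants 3 and 4 would then be turned into:
  --   SELECT * FROM obs_q3 UNION ALL SELECT * FROM obs_q4;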

This is how I am evolving the system I am working on, so that future
data requirements can be handled without drastically more expensive
hardware. The tables to query can be "calculated" from the time span
required for the data set. I have a PHP front end used to access the
data, so I will implement a function that generates the query for the
data set required by the time span. This will allow common daily and
monthly summaries to be generated quickly, and allow queries for time
spans within a single month to be executed more efficiently.
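
On the summary side, a monthly rollup might be built along these lines
(again the table and column names are made up for illustration):

  -- pre-computed monthly rollup so common reports never touch detail rows
  CREATE TABLE acct_summary_monthly (
      period     date,     -- first day of the month summarised
      username   text,
      sessions   integer,
      octets_in  bigint,
      octets_out bigint
  );

  INSERT INTO acct_summary_monthly
  SELECT date_trunc('month', stop_time)::date, username,
         count(*), sum(octets_in), sum(octets_out)
  FROM acct_2003_01
  GROUP BY 1, 2;

The PHP front end then only has to decide whether a requested span can
be answered from the summary table or has to fall back to the per-month
detail tables.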

You may have a lot more experience working with huge data sets than I
do, so if you can give me some suggestions I would appreciate it.

Guy Fraser

A.M. wrote:

> I'm trying to figure out what the upper bounds on a postgresql table
> are based on required use of indices and integer columns in a single
> table.
> An astronomy institution I'm considering working for receives a
> monster amount of telescope data from a government observatory. Each
> day, they download millions of rows of data (including position in the
> sky, infrared reading, etc.) in CSV format. Most of the rows are
> floats and integers. I would like to offer them an improvement over
> their old system.
> I would like to know how postgresql does under such extreme
> circumstances- for example, I may load the entire millions of rows CSV
> file into a table and then eliminate some odd million rows they are
> not interested in. Would a vacuum at this time be prohibitively
> expensive? If I add some odd millions of rows to a table every day,
> can I expect the necessary indices to keep up? In other words, will
> postgresql be able to keep up with their simple and infrequent selects
> on monster amounts of data (potentially 15 GB/day moving in and out
> daily with db growth at ~+5 GB/day [millions of rows] in big blocks
> all at once) assuming that they have top-of-the-line equipment for
> this sort of thing (storage, memory, processors, etc.)? Anyone else
> using postgresql on heavy-duty astronomy data? Thanks for any info.
> ><><><><><><><><><
> AgentM
> agentm(at)cmu(dot)edu
