What I do with PostgreSQL

From: alex avriette <a_avriette(at)acs(dot)org>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: What I do with PostgreSQL
Date: 2001-07-16 18:48:56
Message-ID: B778AF56.75B%a_avriette@acs.org
Lists: pgsql-hackers

This might not be the correct list to send this to, but none of the other
lists seemed appropriate. A friend of mine who uses postgres extensively at
his job suggested I might send y'all a note outlining what we do with it
here.

In general, I am discouraged from providing specific data to non-employees
about what we do. But Dan (the aforementioned friend) said that you guys
would be interested in knowing what I am currently doing with postgres, so
that you know it's up to the kind of challenges we don't often get to put
hardware and software to.

I am working in the publications division of the American Chemical Society.
We are in the process of digitizing all of our 30+ journals from the last
150 or so years. This entails scanning over 2.5 million pages (and that is
only a rough estimate; it could be much higher). Our output is in several
formats: the input TIFFs (from the scans), PDFs, which we render using Adobe
Capture, XML (which we pay a vendor for), and a proprietary format called
DjVu, which is kind of... well, it's like metadata. Initially, we were using
Perl scripts and shell scripts to traverse the entire filesystem looking for
files.

This got rather difficult and time-consuming. My suggestion was to just use
a database to keep track of everything. We have something like 27 different
instances of Oracle running here on 4 or 5 different machines, but I don't
know much about our Oracle setup. My solution was to just go download and
install postgres.

Our hardware is a cluster of three Ultra 10s, a pair of 700-DVD jukeboxes
(with burners), a 2.5 TB SAN, 10 DAT tape readers, a pair of DVD-ROM drives,
and two 200 GB disk packs (one for each of our tape-reading Suns; the other
Sun manages the DVD jukeboxes). We also run Capture on four Dell PowerEdge
servers running NT, and the DjVu software on an additional three PowerEdge
servers, also NT. The SAN is run on a cluster of four Sun E3500s.

I am pumping about 200 GB a week through the pg database, and our estimated
database size is something like 4 TB by the end of the year.

We populate the database with Perl scripts. The Sun that runs the DVD
jukeboxes is also our database server. We have shell scripts that look over
our data on the disk, and we use Sun's NFS to share disks between the Suns
and some funky Sun SMB-esque software to keep disks mounted on the NT boxes.
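For readers wondering what replacing the filesystem traversal with a database looks like, here is a minimal sketch of a file-catalog loader. It is an illustration only, not the Perl scripts described here: it is written in Python, uses the standard-library `sqlite3` module as a stand-in for a PostgreSQL connection so it runs without a server, and the `files` table, its columns, and the extension-to-format mapping are all hypothetical.

```python
import os
import sqlite3

# Hypothetical mapping from file extension to the output formats named in the post.
FORMATS = {".tif": "TIFF", ".tiff": "TIFF", ".pdf": "PDF", ".xml": "XML", ".djvu": "DjVu"}


def catalog(root, conn):
    """Walk `root` once and record each recognized file in a hypothetical `files` table.

    sqlite3 stands in for PostgreSQL here; returns the number of files recorded.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " path TEXT PRIMARY KEY, format TEXT NOT NULL, size_bytes INTEGER NOT NULL)"
    )
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            fmt = FORMATS.get(os.path.splitext(name)[1].lower())
            if fmt is None:
                continue  # skip anything that is not one of the tracked formats
            path = os.path.join(dirpath, name)
            # Re-running the scan replaces existing rows instead of duplicating them.
            conn.execute(
                "INSERT OR REPLACE INTO files (path, format, size_bytes) VALUES (?, ?, ?)",
                (path, fmt, os.path.getsize(path)),
            )
            count += 1
    conn.commit()
    return count


def count_by_format(conn):
    """Answer 'how many of each format do we have?' with one query, no disk walk."""
    return dict(conn.execute("SELECT format, COUNT(*) FROM files GROUP BY format"))
```

Once the files are cataloged, a question like "how many DjVu files do we have?" becomes a single query rather than another pass over the disks. Against PostgreSQL itself, a driver would typically use `%s` placeholders, and the upsert syntax differs from SQLite's `INSERT OR REPLACE`.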

And that's just the "large" database. I have an additional database that I
am using to store the textual data we receive in the form of
"crystallography information files" (http://www.iucr.org/), which are
roughly 6,000 lines long. I have 10,000 of them stored in the database at
the moment, going back to about 1996. As you can tell, this database is
going to get much bigger. At the moment it's living on an Ultra 2 in a 2 GB
partition.

In some ways, I am amazed that postgres has stood up to the challenge. In
others, however, I am not in the least surprised. It's a fantastic piece of
software that requires almost no intervention on my part. I talked to one of
our Oracle DBAs about it. He actually (I'm not kidding here) did not believe
it could be a database if it did not require maintenance.

I am very happy with postgres and I am glad to provide information about our
setup if you'd like to know anything else.

If you're interested in putting something in a FAQ (e.g., "can postgres
scale up to > TB scale?"), feel free to quote me on the environment, but I
would like to make sure that it doesn't point to ACS and is not too
specific.

Anyhow, thanks for your hard work guys/gals.

alex
