Re: how good is PostgreSQL

From: "Steve Wolfe" <steve(at)iboats(dot)com>
To: <pgsql-general(at)postgresql(dot)org>
Subject: Re: how good is PostgreSQL
Date: 2000-10-31 20:30:54
Message-ID: 003501c04379$78859ca0$50824e40@iboats.com
Lists: pgsql-general pgsql-hackers


> Even after that, you have a long way to go before you will hit 1000
> transactions per second from any SQL database.

    Since my last post probably wasn't too useful, here's some information
that might be a little more helpful.  It's a little long, I know, but
hopefully it will be of use to someone.

As programmers, we naturally want to throw things into databases for
three reasons. First, it's easy to get data in. Second, it's easy to get
relevant data out. And third, it's "cool". We don't want to work with flat
files, now do we? ; )

However, in some cases, using the database to get data out ends up
costing us a lot of time and money. Sometimes we do the same nasty query so
often, that we end up purchasing bigger hardware to make the system work
reasonably. Why? Because it was easier for us to write a program that did:

GetDataFromDatabase();
PrepareData();
PrintData();

Each time, the database server does the work. But it doesn't
necessarily have to be that way. In our company, we've found two trends
that have enabled us to save a LOT of processing power on our machines.
(read: Increase the capacity of our servers by 30% or more, with fairly
minor changes)

    The first case is that of rarely-changing data.  Some of our datasets
probably see around 50,000 to 1,000,000 views (selects) for each update
(insert/delete).  Having the database repeat the query every time is a
waste.  So, we began writing our programs such that they grab the data
from the database once and generate the HTML for every page, along with the
indexes.  Then, when an update is made to the database (via the
administrative tools), the program simply rewrites *the relevant HTML
files* and updates the indices pointing to them.  (There are also some
other very large advantages to this sort of thing, but I'm not allowed to
say them. ; ) )
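The regenerate-on-update idea above can be sketched roughly like this (a minimal illustration, not our actual code -- `fetch_rows`, `render_page`, and the docroot path are all hypothetical stand-ins):

```python
# Sketch: regenerate static HTML when the underlying data changes,
# instead of querying the database on every page view.
import os
import tempfile

PAGES_DIR = tempfile.mkdtemp()  # stand-in for the web server's docroot

def fetch_rows():
    # Placeholder for the real SELECT against the database.
    return [("widgets", 4), ("gadgets", 7)]

def render_page(rows):
    items = "".join(f"<li>{name}: {count}</li>" for name, count in rows)
    return f"<html><body><ul>{items}</ul></body></html>"

def rebuild_static_pages():
    # Called from the administrative tools after each insert/delete;
    # every ordinary page view then just serves the flat file.
    html = render_page(fetch_rows())
    path = os.path.join(PAGES_DIR, "index.html")
    with open(path, "w") as f:
        f.write(html)
    return path

path = rebuild_static_pages()
```

The web server never touches the database for reads; only the (rare) administrative update pays the query cost.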

The second case is that of often-repeated queries. One of the
offerings on our site is an online directory, which gets a pretty fair
amount of traffic. Unfortunately, it uses a proprietary program that was
purchased by management before they spoke with us. Grr.... It was the
most utterly inefficient program I've ever seen.  It would *not* allow the
database to do joins; it would grab entire tables, then try to do the
joins itself, in Perl.

We rewrote the program to let PostgreSQL do the joins, and that sped
it up. Then we realized that a very small number of queries (those for the
first one or two levels of pages) accounted for a huge portion of the
usage.  So, we replaced the front page with a static HTML page (the front
page doesn't change...), and saw another terrific drop in our system loads.
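When a static file won't do, the same observation -- a handful of queries account for most of the traffic -- suggests caching those few results in the application.  A minimal sketch (the query string and `query_db` are hypothetical):

```python
# Sketch: memoize the small number of hot queries in the application,
# since the first page or two of the directory dominate the traffic.
_cache = {}

def query_db(sql):
    # Placeholder for the real database call; counts invocations
    # so the savings are visible.
    query_db.calls += 1
    return f"results for: {sql}"
query_db.calls = 0

def cached_query(sql):
    if sql not in _cache:
        _cache[sql] = query_db(sql)
    return _cache[sql]

# A thousand page views of the front level hit the database once.
for _ in range(1000):
    cached_query("SELECT * FROM categories WHERE parent IS NULL")
```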

    Overall, by only modifying a couple of our more heavily-used programs,
our server loads dropped by about 30%-40%. If we went to the trouble to
modify some others, it would drop even more. But we're going to rewrite
them completely for other reasons. : )

In any event, there are ways like this to save a LOT of CPU and disk I/O.
Most web servers can serve out several hundred static pages with the
resources that would otherwise deliver one dynamically-created,
database-driven page. It also allows you to cluster the web servers with
cheap commodity hardware, instead of using big-iron on the database. And if
you have a big-iron machine running the back-end, this can severely lighten
the load on it, keeping you from dropping a few hundred grand on the next
step up. ; )

(Incidentally, we've toyed around with developing a query-caching system
that would sit between PostgreSQL and our DB libraries.  However, it seems
like it could be done *much* more efficiently in PostgreSQL itself, as it
would be much easier to keep track of which tables have changed, etc..
Anybody know if this sort of functionality is planned? It would be terrific
to simply give the machine another 256 megs of RAM, and tell it to use it as
a DB cache...)
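For what it's worth, the application-side version we toyed with looks roughly like this -- a sketch only, with every name hypothetical, and with the big caveat from above: the caller has to declare which tables a query touches, which is exactly the bookkeeping PostgreSQL could do for itself.

```python
# Sketch of a query cache with table-based invalidation: results are
# cached by query text and thrown away whenever a table they read is
# written.  Purely illustrative; PostgreSQL has no such built-in cache.
class QueryCache:
    def __init__(self, run_query):
        self.run_query = run_query   # the real database call
        self.results = {}            # sql text -> cached result
        self.by_table = {}           # table name -> set of cached sql

    def select(self, sql, tables):
        # 'tables' must list every table the query reads -- the weak
        # point of doing this outside the database.
        if sql not in self.results:
            self.results[sql] = self.run_query(sql)
            for t in tables:
                self.by_table.setdefault(t, set()).add(sql)
        return self.results[sql]

    def invalidate(self, table):
        # Called after any insert/update/delete on 'table'.
        for sql in self.by_table.pop(table, set()):
            self.results.pop(sql, None)

calls = []
def fake_db(sql):
    calls.append(sql)
    return len(calls)

cache = QueryCache(fake_db)
cache.select("SELECT * FROM t", ["t"])
cache.select("SELECT * FROM t", ["t"])   # served from cache
cache.invalidate("t")                    # a write to t happened
cache.select("SELECT * FROM t", ["t"])   # re-runs the query
```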

steve
