Re: What popular, large commercial websites run

From: Shaun Thomas <sthomas(at)townnews(dot)com>
To: postgres(at)vrane(dot)com
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: What popular, large commercial websites run
Date: 2002-05-02 14:54:12
Message-ID: Pine.LNX.4.44.0205020920250.16874-100000@hamster.lee.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Wed, 1 May 2002 postgres(at)vrane(dot)com wrote:

> I'm very curious to know why you have problem with growing
> database. Does the performance suffer significantly
> if you don't do the FULL vacuum? Surely if you can
> afford the oracle you can afford relatively much
> cheaper storage. You must have other reasons
> than just not liking large database

Well, it's not the fact that it's growing that's the problem. It's the
fact that 100 actual MB of frequently changed data becomes 2gigs if not
frequently vacuumed. Even with hourly full vacuums, it still slowly
bloats to 200mb in two weeks, with the same amount of data. The worst
part about this is that the more it bloats, the longer vacuum takes,
and the speed of the bloating increases almost exponentially. Given
two months, it's back up to 2 gigs.

Full dump and restore? Back down to 100MB. I'm sorry, but no amount of
disk storage should have to compensate for a database growing to 60x
larger than the actual data stored (the data dump is 30 megs, but you
can expect a certain amount of bloat due to column sizes). When the
data files start hitting 2GB each, the Linux file-size limit comes into
play, and you have no choice but to dump and restore.

Even worse? Since there is only about 100 MB of real data in that 2GB
morass, there's 1.9GB of old or invalid rows that Vacuum didn't clean
up. That makes all subsequent vacuums slower, which makes their locks
last longer, which means all selects on the tables being vacuumed are
stalled until the vacuum is done. What happens when the vacuum takes
half an hour, and it's a web application? Oh darn, you stop serving
pages that use the database.

Only full vacuum on non-peak times, you say? We tried that. The
datafiles exploded to 2GB within days. *DAYS* The hourly vacuum
brought it under control, but our insert script which runs every hour
and replaces about 10% of the data per run, really cries bloody murder
while the vacuum is running. As a result, this machine commonly has a
load approaching 20 almost all the time. Turn off postgres? Less than
1, even with a concurrent Mysql DB that has a *valid* 2GB database that
contains 2GB of actual data.

I'm not passing blame. I'm not even angry. I'm just frustrated and
tired of babying Postgres so it doesn't cause the server to burst into
flames, crash, or otherwise fail. I actually had less admin overhead
with an Oracle database. Anyone who has used Oracle knows just how hard
it is to administer, but I sure as hell never had to completely dump and
restore the data every month to keep it from eating my drives. I mean,
what good is MVCC to avoid locking, when frequent full vacuums are even
worse? I'd rather have a little locking contention, than have a
completely useless database for ten to twenty minutes, every hour.

Heck, maybe it's just our special case, that such a high rate of data
turnover just bewilders postgres. But as of now, it's just plain
unusable. Until vacuum goes away completely, which it never will as
long as MVCC is in place, postgres is not an option for us. I just
can't see any way around it. I hate mysql when I want things like
foreign keys or subselects, so we still need a full DBMS. Hell, we're
even considering giving a look to freaking Interbase, for the love of
God. We're desperate, here. ^_^

As a side note, Sybase used to be our DBMS of choice, but it didn't play
nice with PHP (segfaults on connects, sometimes), and only old versions
fall under the free license, so we ditched it too. It also had weird
syntax (go, for crying out loud!?), so our developers hated it.

If I could only take all the good things from the databases I liked, and
make them one database... Ease of administration of Mysql + Features of
Oracle, for instance. But that database is only available in the
wonderful, magical world of 'you must be kidding' land. Oh well.

--
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+
| Shaun M. Thomas INN Database Administrator |
| Phone: (309) 743-0812 Fax : (309) 743-0830 |
| Email: sthomas(at)townnews(dot)com AIM : trifthen |
| Web : www.townnews.com |
| |
| "Most of our lives are about proving something, either to |
| ourselves or to someone else." |
| -- Anonymous |
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Shaun Thomas 2002-05-02 14:57:21 Re: What popular, large commercial websites run
Previous Message Joel Burton 2002-05-02 14:43:25 Re: is there a way