Re: What popular, large commercial websites run

From: Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com>
To: <pgsql-general(at)postgresql(dot)org>
Subject: Re: What popular, large commercial websites run
Date: 2002-05-02 20:59:54
Message-ID: Pine.LNX.4.33.0205021452250.5658-100000@css120.ihs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I just tested this, and it's true. On my 7.2.1 machine, with booles,
updating 50% of the 100,000 test rows I made, the data usage climbs about
1 Meg, and most of it does not get recovered by vacuum. I'll compile
7.2.1 (I'm running 7.2) and test it for other datatypes as well.

On Thu, 2 May 2002, Shaun Thomas wrote:

> On Wed, 1 May 2002 postgres(at)vrane(dot)com wrote:
>
> > I'm very curious to know why you have problem with growing
> > database. Does the performance suffer significantly
> > if you don't do the FULL vacuum? Surely if you can
> > afford the oracle you can afford relatively much
> > cheaper storage. You must have other reasons
> > than just not liking large database
>
> Well, it's not the fact that it's growing that's the problem. It's the
> fact that 100 actual MB of frequently changed data becomes 2gigs if not
> frequently vacuumed. Even with hourly full vacuums, it still slowly
> bloats to 200mb in two weeks, with the same amount of data. The worst
> part about this is that the more it bloats, the longer vacuum takes,
> and the speed of the bloating increases almost exponentially. Given
> two months, it's back up to 2 gigs.
>
> Full dump and restore? Back down to 100MB. I'm sorry, but no amount of
> disk storage should have to compensate for a database growing to 60x
> larger than the actual data stored (the data dump is 30 megs, but you
> can expect a certain amount of bloat due to column sizes). When the
> data files start hitting 2GB each, the Linux file-size limit comes into
> play, and you have no choice but to dump and restore.
>
> Even worse? Since there is only about 100 MB of real data in that 2GB
> morass, there's 1.9GB of old or invalid rows that Vacuum didn't clean
> up. That makes all subsequent vacuums slower, which makes their locks
> last longer, which means all selects on the tables being vacuumed are
> stalled until the vacuum is done. What happens when the vacuum takes
> half an hour, and it's a web application? Oh darn, you stop serving
> pages that use the database.
>
> Only full vacuum on non-peak times, you say? We tried that. The
> datafiles exploded to 2GB within days. *DAYS* The hourly vacuum
> brought it under control, but our insert script which runs every hour
> and replaces about 10% of the data per run, really cries bloody murder
> while the vacuum is running. As a result, this machine commonly has a
> load approaching 20 almost all the time. Turn off postgres? Less than
> 1, even with a concurrent Mysql DB that has a *valid* 2GB database that
> contains 2GB of actual data.
>
> I'm not passing blame. I'm not even angry. I'm just frustrated and
> tired of babying Postgres so it doesn't cause the server to burst into
> flames, crash, or otherwise fail. I actually had less admin overhead
> with an Oracle database. Anyone who has used Oracle knows just how hard
> it is to administer, but I sure as hell never had to completely dump and
> restore the data every month to keep it from eating my drives. I mean,
> what good is MVCC to avoid locking, when frequent full vacuums are even
> worse? I'd rather have a little locking contention, than have a
> completely useless database for ten to twenty minutes, every hour.
>
> Heck, maybe it's just our special case, that such a high rate of data
> turnover just bewilders postgres. But as of now, it's just plain
> unusable. Until vacuum goes away completely, which it never will as
> long as MVCC is in place, postgres is not an option for us. I just
> can't see any way around it. I hate mysql when I want things like
> foreign keys or subselects, so we still need a full DBMS. Hell, we're
> even considering giving a look to freaking Interbase, for the love of
> God. We're desperate, here. ^_^
>
> As a side note, Sybase used to be our DBMS of choice, but it didn't play
> nice with PHP (segfaults on connects, sometimes), and only old versions
> fall under the free license, so we ditched it too. It also had weird
> syntax (go, for crying out loud!?), so our developers hated it.
>
> If I could only take all the good things from the databases I liked, and
> make them one database... Ease of administration of Mysql + Features of
> Oracle, for instance. But that database is only available in the
> wonderful, magical world of 'you must be kidding' land. Oh well.
>
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Jean-Michel POURE 2002-05-02 21:10:39 Re: PL/j - java stored procedures
Previous Message Scott Marlowe 2002-05-02 20:47:42 Re: FATAL: stuck spinlock