Long-running performance (MVCC, Vacuum, etc.) - Any fix?

From: Eric Brown <eric(dot)brown(at)propel(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Long-running performance (MVCC, Vacuum, etc.) - Any fix?
Date: 2004-12-28 10:40:52
Message-ID: F26F359C-58BC-11D9-9956-000A95C7176C@propel.com

I'm building an appliance where I don't want my customers having to
tune postgresql in any way from the configuration I install it with. I
don't even want them to know it is there. Yet in my study of postgresql,
it seems that even conscientious use of a vacuum daemon or cron job, in
vacuum's various forms, still leaves a database considerably larger and
slower after a while than the same database right after a dump/restore.
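
For reference, here's roughly the sort of nightly maintenance I had in
mind, run from cron through psql (the table names are just placeholders,
not my real schema):

    -- routine vacuum: marks dead row versions reusable and refreshes
    -- planner statistics, but doesn't shrink the underlying files
    VACUUM ANALYZE sessions;
    VACUUM ANALYZE events;

    -- occasional full vacuum: compacts the table and can give space
    -- back to the OS, but takes an exclusive lock while it runs
    VACUUM FULL sessions;

Even with something like that in place, my understanding is the tables
still settle at a size noticeably above their dump/restore footprint.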

It seems to me that with MVCC, an UPDATE (for example) is really a
DELETE+INSERT, and it takes VACUUM to mark the old, deleted row as free
space again. So changing one boolean or incrementing one counter in a
100-byte row will relocate that row to a different location. That in
itself isn't terrible, since some DB page gets dirtied no matter what,
but unless I'm lucky enough for the new version to land in the same
page as the old one, the index(es) on this table will have to be
updated as well, dirtying yet another page that wouldn't be dirty
without MVCC. And that is not to mention the resources required by
VACUUM itself. The disk bloat over time also suggests that postgresql
doesn't always reuse free space efficiently even when rows are of
similar size, so performance degrades as there are fewer live rows per
page over time.

My application is multi-threaded, but I couldn't care less about the
concurrency afforded by MVCC. I'd rather just update things in place
and get less concurrency from postgresql but more consistent
long-running performance and disk space utilization.

Is my interpretation correct?
Is there a way to turn off MVCC?
Do fixed sized rows help any?
Is there anybody using this thing in an appliance type application?
I'm quite far along with stored procedures and whatnot, but if
postgresql really isn't the right solution for these reasons, I'm
curious whether anybody has alternative open-source suggestions. (I'm
actually migrating from Sleepycat's bsddb at the moment because I
didn't realize the licensing costs involved there. My needs really
aren't that extensive - a few associative indexes, cascading deletes,
etc.)

My large data sets will have the following characteristics:
- 5,000,000 rows x 50 bytes/row (could be fixed-size), with 1
  multi-column index and 1 single-column timestamp index, at 1,000,000
  index searches, 300,000 reads, and 150,000 updates per day (some
  rough math on this one below)
- 720,000 rows x 32 bytes/row fixed, with 1 multi-column index and 1
  single-column timestamp index, at 150,000 index searches, 150,000
  updates, and 100s of reads per day
- 2,000,000 rows x 4000 bytes avg/row, at 100,000 inserts and 150,000
  reads per day
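
Back-of-envelope for the first table, assuming something like 30 bytes
of per-row overhead on top of my 50 bytes of data (which may well be
off):

    5,000,000 rows x ~80 bytes       ~= 400 MB of heap before any bloat
    150,000 updates/day / 5,000,000  ~= 3% of rows turned over per day

So a daily vacuum would have roughly 3% dead rows to reclaim per pass;
my worry is whether that reclaimed space actually gets reused
efficiently or whether the table just keeps creeping upward.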

Thanks,
Eric

Eric Brown
408-571-6341
www.propel.com
