| From: | Eric Brown <eric(dot)brown(at)propel(dot)com> | 
|---|---|
| To: | pgsql-general(at)postgresql(dot)org | 
| Subject: | Long-running performance (MVCC, Vacuum, etc.) - Any fix? | 
| Date: | 2004-12-28 10:40:52 | 
| Message-ID: | F26F359C-58BC-11D9-9956-000A95C7176C@propel.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-general | 
I'm building an appliance where I don't want my customers having to 
tune postgresql in any way from the configuration when I install it. I 
don't even want them to know it is there. Yet in my study of posgresql, 
it seems that even conscientious use of a vacuum daemon or cron job in 
vacuum's various forms leaves databases after a while much larger and 
lower performing than the actual space and performance after a 
dump/restore.
It seems to me that with MVCC, an UPDATE (for example) is really a 
DELETE+INSERT and it takes VACUUM to mark the old DELETED row as free 
space again. So changing one boolean or increasing one counter in a row 
of 100 bytes will relocate this row in a different location. While this 
in itself isn't terrible as some DB page is dirty no matter what, 
unless I'm lucky enough to insert into the same page that changed, my 
index(es) for this table will have to be changed and this will dirty 
yet another page that wouldn't be dirty without MVCC. And this is not 
to mention the resources required by VACUUM. And the disk bloat over 
time would imply that postgresql doesn't always efficiently use 
free-space with records/rows of similar size and so performance 
degrades as there are fewer rows per page over time.
My application is multi-threaded and I could care less about the 
concurrency afforded by MVCC. I'd rather just update the things in 
place and get less postgresql concurrency but more consistent 
long-running performance and disk space utilization.
Is my interpretation correct?
Is there a way to turn off MVCC?
Do fixed sized rows help any?
Is there anybody using this thing in an appliance type application?
I'm quite far along with stored-procedures and whatnot, but if 
postgresql really isn't the right solution due to these reasons, I'm 
curious if anybody has alternate OpenSource suggestions? (I'm actually 
migrating from sleepycat bsddb at the moment because I didn't realize 
the licensing costs involved there. My needs really aren't that 
extensive really - a few associative indexes, cascading delete, etc.)
My large data sets will have the following characteristics:
5,000,000 Rows x 50 bytes/row (could be fixed) w/1 multi-column index, 
1 single-column timestamp index AT 1,000,000 index searches, 300,000 
reads, 150,000 updates per day
720,000 Rows x 32 bytes/row fixed w/ 1 multi-column index, 1 
single-column timestamp index AT 150,000 index search, 150,000 updates, 
100s of reads per day
2,000,000 Rows x 4000 bytes avg/row AT 100,000 inserts per day, 150,000 
reads per day
Thanks,
Eric
Eric Brown
408-571-6341
www.propel.com
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Martijn van Oosterhout | 2004-12-28 11:59:51 | Re: Long-running performance (MVCC, Vacuum, etc.) - Any fix? | 
| Previous Message | Marek Lewczuk | 2004-12-27 19:54:49 | Re: Get current trasanction id |