Re: PostgreSQL clustering VS MySQL clustering

From: Marty Scholes <marty(at)outputservices(dot)com>
To: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
Cc: herve(at)elma(dot)fr, pgsql-performance(at)postgresql(dot)org
Subject: Re: PostgreSQL clustering VS MySQL clustering
Date: 2005-01-24 15:45:57
Message-ID: 41F51835.3040202@outputservices.com
Lists: pgsql-performance

Tatsuo,

I agree completely that vacuum falls apart on huge tables. We could
probably do the math and figure out what the ratio of updated rows to
total rows is each day, but on a constantly growing table, that ratio
gets smaller and smaller, making the impact of dead tuples in the table
proportionately less and less.
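
As a back-of-the-envelope illustration (the update volume is a made-up
figure; the table sizes roughly follow the numbers in the original mail
quoted below, 150 million rows plus 4-8 million new rows a month):

    -- Purely illustrative: ~5M updated (dead) tuples a month against a
    -- heap that starts at 150M rows and grows by ~6M rows a month.
    -- The dead-tuple fraction shrinks as the table grows:
    SELECT 5e6 / 150e6 AS dead_fraction_month_1,    -- ~3.3%
           5e6 / 300e6 AS dead_fraction_month_25;   -- ~1.7% two years later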

If multi-version indexes are handled the same way as table rows, then
the indexes will also suffer the same fate, if not worse. For huge
tables, the b-tree depth can get fairly large. When a b-tree is of
depth X and the machine holds the first Y levels of the b-tree in
memory, then each table row selected requires a MINIMUM of (X-Y) disk
accesses *before* the table row is accessed. Substitute any numbers you
want for X and Y, but you will find that huge tables require many index
reads.
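
To put rough numbers on that (assuming a fanout of around 200 keys per
index page, which is purely illustrative):

    -- Estimated b-tree depth for a 150M-row index with fanout ~200:
    --   depth X ~= ceil(log_200(150M)) = 4
    -- With the top Y = 2 levels cached, each lookup still needs at
    -- least X - Y = 2 random index reads before touching the heap.
    SELECT ceil(ln(150000000.0) / ln(200.0))     AS est_btree_depth,
           ceil(ln(150000000.0) / ln(200.0)) - 2 AS min_index_reads_per_lookup;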

Index updates are even worse. A table row update requires only a copy
of the row. An index update requires at least a copy of the leaf node,
and possibly more nodes if nodes must be split or collapsed. These
splits and collapses can cascade, causing many nodes to be affected.

This whole process takes place for each and every index affected by the
change, which is every index on the table when a row is added or
deleted. All of this monkeying around takes place above and beyond the
simple change of the row data. Further, each and every affected index
page is dumped to WAL.
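
A contrived example of what that means in practice (the table and index
names here are hypothetical):

    -- Every INSERT or DELETE on this table must also modify a leaf page
    -- in each of the three indexes, and each dirtied index page is
    -- written to WAL in addition to the heap page itself.
    CREATE TABLE docs (id bigint, url text, fetched timestamp, body text);
    CREATE INDEX docs_id_idx      ON docs (id);
    CREATE INDEX docs_url_idx     ON docs (url);
    CREATE INDEX docs_fetched_idx ON docs (fetched);

    -- One row in: 1 heap page plus at least 3 index leaf pages touched.
    INSERT INTO docs VALUES (1, 'http://example.com/', now(), '...');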

Assuming the indexes have the same MVCC properties as the row data, the
indexes will accumulate dead tuples at a rate far higher than that of
the table data.

So yes, vacuuming is a problem on large tables. It is a bigger problem
for indexes. On large tables, index I/O comprises most of the I/O mix.

Don't take my word for it. Run a benchmark on Pg. Then, soft-link the
index files and the WAL directories to a RAM disk. Rerun the benchmark
and you will find that Pg is far faster, much faster than if only the data
were on the RAM disk.
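
The index half of that experiment can also be done with 8.0 tablespaces
rather than soft links; something like the sketch below, assuming a RAM
disk is already mounted at /mnt/ramdisk (a hypothetical path) and reusing
the hypothetical docs table from above. The WAL directory still gets
soft-linked at the filesystem level as described.

    -- Assumes /mnt/ramdisk/pg_ix exists, is empty, and is owned by the
    -- postgres user. Everything in it is lost on reboot, so the indexes
    -- must be rebuilt after a crash or restart.
    CREATE TABLESPACE ram_ix LOCATION '/mnt/ramdisk/pg_ix';

    -- Rebuild the hot index in the RAM-disk tablespace for the benchmark.
    CREATE INDEX docs_id_idx_ram ON docs (id) TABLESPACE ram_ix;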

Marty

Tatsuo Ishii wrote:
> IMO the bottleneck is not WAL but table/index bloat. Lots of updates
> on large tables will produce lots of dead tuples. The problem is, there
> is no effective way to reuse these dead tuples, since VACUUM on huge
> tables takes a long time. 8.0 adds new vacuum delay
> parameters. Unfortunately this does not help. It just makes the
> execution time of VACUUM longer, which means more and more dead tuples
> are being made while updating.
>
> Probably VACUUM works well for small to medium size tables, but not
> for huge ones. I'm considering implementing "on-the-spot salvaging of
> dead tuples".
> --
> Tatsuo Ishii
>
>
>>This is probably a lot easier than you would think. You say that your
>>DB will have lots of data, lots of updates and lots of reads.
>>
>>Very likely the disk bottleneck is mostly index reads and writes, with
>>some critical WAL fsync() calls. In the grand scheme of things, the
>>actual data is likely not accessed very often.
>>
>>The indexes can be put on a RAM disk tablespace and that's the end of
>>index problems -- just make sure you have enough memory available. Also
>>make sure that the machine can restart correctly after a crash: the
>>tablespace is dropped and recreated, along with the indexes. This will
>>cause a machine restart to take some time.
>>
>>After that, if the WAL fsync() calls are becoming a problem, put the WAL
>>files on a fast RAID array, either a card or an external enclosure, that
>>has a good amount of battery-backed write cache. This way, the WAL
>>fsync() calls will flush quickly to the RAM and Pg can move on while the
>>RAID controller worries about putting the data to disk. With WAL, low
>>access time is usually more important than total throughput.
>>
>>The truth is that you could have this running for not much money.
>>
>>Good Luck,
>>Marty
>>
>>
>>>On Thursday 20 January 2005 at 19:09, Bruno Almeida do Lago wrote:
>>> > Could you explain to us what you have in mind for that solution? I mean,
>>> > forget the PostgreSQL (or any other database) restrictions and explain to
>>> > us what this hardware would look like. Where would the data be stored?
>>> >
>>> > I have something in mind for you, but first I need to understand your
>>> > needs!
>>>
>>>I just want to make a big database as explained in my first mail ... At the
>>>beginning we will have approx. 150 000 000 records ... each month we will add
>>>about 4-8 million new rows in a constant flow during the day ... and at the
>>>same time web users will access the database in order to read that data.
>>>The stored data are quite close to what Google stores ... (we are not making
>>>a Google clone ... just a lot of data, many small values and some big ones
>>>... that's why I'm comparing with Google for data storage).
>>>Then we will have a search engine searching through that data ...
>>>
>>>As for the hardware, for the moment we have only a dual Pentium Xeon 2.8GHz
>>>with 4 GB of RAM ... and we saw that we had bad performance results ... so
>>>we are thinking about a new solution, maybe with several servers (server
>>>design may vary from one to another), to get a kind of cluster and get
>>>better performance ...
>>>
>>>Am I clear?
>>>
>>>Regards,
>>
>
