Re: Performance query about large tables, lots of concurrent access

From: Karl Wright <kwright(at)metacarta(dot)com>
To: Scott Marlowe <smarlowe(at)g2switchworks(dot)com>
Cc: Shaun Thomas <sthomas(at)leapfrogonline(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance query about large tables, lots of concurrent access
Date: 2007-06-21 16:29:49
Message-ID: 467AA77D.3030204@metacarta.com
Lists: pgsql-performance

Scott Marlowe wrote:
> Karl Wright wrote:
>> Scott Marlowe wrote:
>>> Karl Wright wrote:
>>>
>>>> Shaun Thomas wrote:
>>>>
>>>>> On Wednesday 20 June 2007 12:55:20 pm Karl Wright wrote:
>>>>>
>>>>>
>>>>>> I am afraid that I did answer this. My largest tables
>>>>>> are the ones continually being updated. The smaller
>>>>>> ones are updated only infrequently.
>>>>>
>>>>>
>>>>>
>>>>> You know, it actually sounds like you're getting whacked by the
>>>>> same problem that got us a while back. It sounds like you weren't
>>>>> vacuuming frequently enough initially, and then tried vacuuming
>>>>> later, only after you noticed performance degrading.
>>>>>
>>>>> Unfortunately, what that means is that for several weeks or months,
>>>>> Postgres has not been reusing rows on your (admittedly) active and
>>>>> large tables; it just appends at the end and lets old rows slowly
>>>>> bloat the table larger and larger. Indexes, too, will suffer from
>>>>> dead pages. As frightening/sickening as this sounds, you may need
>>>>> to dump/restore the really huge table, or VACUUM FULL it to put it
>>>>> on a crash diet, and then maintain a strict daily or twice-daily
>>>>> vacuum schedule to keep it under control.
>>>>>
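
(For reference, the recovery Shaun describes boils down to something like
the following -- a minimal sketch, with mybigtable standing in for the
bloated table, since the real table names haven't been posted:

-- one-time recovery: compacts the table under an exclusive lock
vacuum full verbose mybigtable;
-- vacuum full can leave the indexes bloated, so rebuild them too
reindex table mybigtable;
-- routine maintenance afterwards, run daily or twice daily:
vacuum analyze mybigtable;
)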
>>>>
>>>> A nice try, but I had just completed a VACUUM on this database three
>>>> hours prior to starting the VACUUM that I gave up on after 27
>>>> hours. So I don't see how much more frequently I could do it. (The
>>>> one I did earlier finished in six hours - but to accomplish that I
>>>> had to shut down EVERYTHING else that machine was doing.)
>>>
>>>
>>> So, have you ever run vacuum full or reindex on this database?
>>>
>>
>> No. However, this database has only existed since last Thursday
>> afternoon.
> Well, a couple of dozen update statements with no where clause on large
> tables could bloat it right up.
>
> It's not about age so much as update / delete patterns.
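
(To make that concrete: under MVCC, an update writes a new copy of every
affected row and leaves the old copy behind as a dead tuple, so a single
update with no where clause can roughly double a table's on-disk size.
A hypothetical illustration -- the table and column names are made up:

-- rewrites every row; the old versions stay on disk as dead tuples
update mybigtable set status = 'pending';
-- marks that dead space reusable for future writes, but does not shrink the file
vacuum mybigtable;
)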
>>
>>> You are aware of the difference between how vacuum and vacuum full
>>> work, right?
>>>
>>> vacuum := mark deleted tuples as available, leave in table
>>> vacuum full := compact tables to remove deleted tuples.
>>>
>>> While you should generally avoid vacuum full, if you've let your
>>> database get so bloated that the majority of space in your tables is
>>> now empty / deleted tuples, you likely need to vacuum full / reindex
>>> it.
>>>
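
(One way to watch the difference, assuming a release new enough to have
pg_relation_size(), and with an illustrative table name:

select pg_relation_size('mybigtable'); -- bytes on disk before
vacuum mybigtable;                     -- dead tuples marked reusable
select pg_relation_size('mybigtable'); -- same size: the file is not shrunk
vacuum full mybigtable;                -- compacts the table, exclusive lock held
select pg_relation_size('mybigtable'); -- smaller now, if there was bloat
)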
>> If the database is continually growing, should VACUUM FULL ever be
>> necessary?
> If it's only growing, with no deletes or updates, then no. Generally,
> on a properly vacuumed database, vacuum full should never be needed.
>>> For instance, on my tiny little 31 Gigabyte reporting database, the
>>> main table takes up about 17 Gigs. This query gives you some idea
>>> how many bytes each row is taking on average:
>>>
>>> select relname, relpages::float*8192 as size, reltuples,
>>> (relpages::double precision*8192)/reltuples::double precision as
>>> bytes_per_row from pg_class where relname = 'businessrequestsummary';
>>> relname | size | reltuples | bytes_per_row
>>> ------------------------+-------------+-------------+-----------------
>>> businessrequestsummary | 17560944640 | 5.49438e+07 | 319.61656229454
>>>
>>> Note that these numbers are updated by running analyze...
>>>
>>> What does it say about your DB?
>>>
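
(A companion sketch for finding where the space is going -- it uses only
standard pg_class columns, so it should work anywhere; the bigint cast
just avoids integer overflow on very large tables:

select relname, relkind, relpages::bigint * 8192 as size_bytes, reltuples
from pg_class
where relkind in ('r', 'i')
order by relpages desc
limit 10;

As with the query above, relpages and reltuples are estimates that are
refreshed by analyze.)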
>>
>> I wish I could tell you. Like I said, I had to abandon this project
>> to test out an upgrade procedure involving pg_dump and pg_restore.
>> (The upgrade also seems to take a very long time - over 6 hours so
>> far.) When it is back online I can provide further information.
>
> Well, let us know. I would definitely recommend getting more / faster
> disks. Right now I've got a simple 4 disk RAID10 on the way to replace
> the single SATA drive I'm running on right now. I can't wait.
>

I checked the disk configuration - this is a RAID array with six
15,000 RPM drives and a bit more than 1 TB of total storage. It would
be hard to get more or faster disk than that.

Karl
