Re: Performance query about large tables, lots of concurrent access

From: Scott Marlowe <smarlowe(at)g2switchworks(dot)com>
To: Karl Wright <kwright(at)metacarta(dot)com>
Cc: Shaun Thomas <sthomas(at)leapfrogonline(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance query about large tables, lots of concurrent access
Date: 2007-06-21 15:44:37
Message-ID: 467A9CE5.508@g2switchworks.com
Lists: pgsql-performance

Karl Wright wrote:
> Scott Marlowe wrote:
>> Karl Wright wrote:
>>
>>> Shaun Thomas wrote:
>>>
>>>> On Wednesday 20 June 2007 12:55:20 pm Karl Wright wrote:
>>>>
>>>>
>>>>> I am afraid that I did answer this. My largest tables
>>>>> are the ones continually being updated. The smaller
>>>>> ones are updated only infrequently.
>>>>
>>>>
>>>>
>>>> You know, it actually sounds like you're getting whacked by the
>>>> same problem that got us a while back. It sounds like you weren't
>>>> vacuuming frequently enough initially, and only started vacuuming
>>>> after you noticed performance degrading.
>>>>
>>>> Unfortunately, what that means is that for several weeks or
>>>> months, Postgres has not been reusing rows in your (admittedly)
>>>> active and large tables; it just appends at the end and lets old
>>>> rows slowly bloat the table larger and larger. Indexes, too, will
>>>> suffer from dead pages. As frightening/sickening as this sounds,
>>>> you may need to dump/restore the really huge table, or VACUUM FULL
>>>> to put it on a crash diet, and then maintain a strict daily or
>>>> bi-daily vacuum schedule to keep it under control.
>>>>
>>>
>>> A nice try, but I had just completed a VACUUM on this database three
>>> hours prior to starting the VACUUM that I gave up on after 27
>>> hours. So I don't see how much more frequently I could do it. (The
>>> one I did earlier finished in six hours - but to accomplish that I
>>> had to shut down EVERYTHING else that machine was doing.)
>>
>>
>> So, have you ever run vacuum full or reindex on this database?
>>
>
> No. However, this database has only existed since last Thursday
> afternoon.
Well, a couple dozen UPDATE statements with no WHERE clause on large
tables could bloat it right up.

It's not about age so much as update / delete patterns.
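The mechanics are easy to sketch. A hypothetical illustration (the
table "t" and the row count are made up): under MVCC, an UPDATE with no
WHERE clause writes a new version of every row, leaving the old
versions behind as dead tuples until a vacuum marks them reusable:

  CREATE TABLE t (id int, val int);
  INSERT INTO t SELECT g, 0 FROM generate_series(1, 1000000) g;
  UPDATE t SET val = 1;  -- new version of every row; table roughly doubles on disk
  VACUUM t;              -- dead tuples become reusable, but the file stays large
  VACUUM FULL t;         -- compacts the table and gives the space back

Do a few updates like that without vacuuming in between and the table
just keeps growing.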
>
>> You are aware of the difference between how vacuum and vacuum full
>> work, right?
>>
>> vacuum := mark deleted tuples as available, leave in table
>> vacuum full := compact tables to remove deleted tuples.
>>
>> While you should generally avoid vacuum full, if you've let your
>> database get so bloated that the majority of space in your tables is
>> now empty / deleted tuples, you likely need to vacuum full / reindex
>> it.
>>
> If the database is continually growing, should VACUUM FULL ever be necessary?
If it's only growing, with no deletes or updates, then no. Generally,
on a properly vacuumed database, vacuum full should never be needed.
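For what it's worth, one way to keep a database "properly vacuumed"
without hand-scheduling is autovacuum. A sketch of the relevant
postgresql.conf settings follows; the GUC names are the standard ones,
but the values are illustrative only, not tuned recommendations:

  stats_row_level = on                    # required for autovacuum on 8.1/8.2
  autovacuum = on
  autovacuum_naptime = 60                 # seconds between checks
  autovacuum_vacuum_scale_factor = 0.2    # vacuum after ~20% of a table changes
  autovacuum_analyze_scale_factor = 0.1   # analyze after ~10% changes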
>> For instance, on my tiny little 31 Gigabyte reporting database, the
>> main table takes up about 17 Gigs. This query gives you some idea
>> how many bytes each row is taking on average:
>>
>> select relname, relpages::float*8192 as size, reltuples,
>> (relpages::double precision*8192)/reltuples::double precision as
>> bytes_per_row from pg_class where relname = 'businessrequestsummary';
>> relname | size | reltuples | bytes_per_row
>> ------------------------+-------------+-------------+-----------------
>> businessrequestsummary | 17560944640 | 5.49438e+07 | 319.61656229454
>>
>> Note that these numbers are updated by running analyze...
>>
>> What does it say about your DB?
>>
>
> I wish I could tell you. Like I said, I had to abandon this project
> to test out an upgrade procedure involving pg_dump and pg_restore.
> (The upgrade also seems to take a very long time - over 6 hours so
> far.) When it is back online I can provide further information.

Well, let us know. I would definitely recommend getting more / faster
disks. Right now I've got a simple 4 disk RAID10 on the way to replace
the single SATA drive I'm running on right now. I can't wait.
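If it helps once the database is back online: the pg_class query above
generalizes to all tables. This variant (an untested sketch, same
arithmetic as the query earlier in the thread) lists the ten largest
tables so an outsized bytes_per_row stands out. Run ANALYZE first so
relpages / reltuples are current:

  SELECT relname,
         relpages::float * 8192 AS size_bytes,
         reltuples,
         (relpages::float * 8192) / NULLIF(reltuples, 0) AS bytes_per_row
    FROM pg_class
   WHERE relkind = 'r'
   ORDER BY relpages DESC
   LIMIT 10;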
