Re: REINDEX takes half a day (and still not complete!)

From: Phoenix Kiula <phoenix(dot)kiula(at)gmail(dot)com>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: REINDEX takes half a day (and still not complete!)
Date: 2011-03-20 08:04:06
Message-ID: AANLkTinO2zfueMQfL4x4ue8VS6YD=dU3T=qWnp9xGzLf@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Thanks Scott.

> What is the output of running vacuum verbose as a superuser (you can
> run it on the postgres database so it returns fast.)

Here's the output for postgres DB:

INFO: free space map contains 110614 pages in 33 relations
DETAIL: A total of 110464 page slots are in use (including overhead).
110464 page slots are required to track all free space.
Current limits are: 950000 page slots, 1500 relations, using 5665 kB.
VACUUM

Does running it on a postgres database also show the relevant info for
other databases?

From above it seems fine, right?

> also, if vacuum can't keep up you can increase the vacuum cost limit,
> and lower the cost delay.  Anything above 1ms is still quite a wait
> compared to 0.  And most systems don't have the real granularity to go
> that low anyway, so 5ms is about as low as you can go and get a change
> before 0.  Also, if you've got a lot of large relations you might need
> to increase the max workers as well.

I'm not sure I understand this.

(1) I should increase "max workers". But I am on version 8.2.9 -- did
this version have "autovacuum_max_workers"? It seems to be a more
recent thing: http://sn.im/27nxe1

(2) The big table in my database (with 125 million rows) has about
5,000 rows that get DELETEd every day, about 100,000 new INSERTs, and
about 12,000 UPDATEs.

(3) What's that thing about cost delay. Which values from vacuum
should I check to determine the cost delay -- what's the specific
formula?

Thanks!

On Sat, Mar 19, 2011 at 12:58 PM, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com> wrote:
> On Fri, Mar 18, 2011 at 9:07 PM, Phoenix Kiula <phoenix(dot)kiula(at)gmail(dot)com> wrote:
>> I have a large table but not as large as the kind of numbers that get
>> discussed on this list. It has 125 million rows.
>>
>> REINDEXing the table takes half a day, and it's still not finished.
>>
>> To write this post I did "SELECT COUNT(*)", and here's the output -- so long!
>>
>>    select count(*) from links;
>>       count
>>    -----------
>>     125418191
>>    (1 row)
>>
>>    Time: 1270405.373 ms
>>
>> That's 1270 seconds!
>>
>> I suppose the vaccuum analyze is not doing its job? As you can see
>> from settings below, I have autovacuum set to ON, and there's also a
>> cronjob every 10 hours to do a manual vacuum analyze on this table,
>> which is largest.
>>
>> PG is version 8.2.9.
>>
>> Any thoughts on what I can do to improve performance!?
>>
>> Below are my settings.
>>
>>
>>
>> max_connections              = 300
>> shared_buffers               = 500MB
>> effective_cache_size         = 1GB
>> max_fsm_relations            = 1500
>> max_fsm_pages                = 950000
>>
>> work_mem                     = 100MB
>
> What is the output of running vacuum verbose as a superuser (you can
> run it on the postgres database so it returns fast.)  We're looking
> for the output that looks like this:
>
> INFO:  free space map contains 1930193 pages in 749 relations
> DETAIL:  A total of 1787744 page slots are in use (including overhead).
> 1787744 page slots are required to track all free space.
> Current limits are:  10000000 page slots, 3000 relations, using 58911 kB.
>
> If the space needed exceeds page slots then you need to crank up your
> free space map.  If the relations exceeds the available then you'll
> need to crank up max relations.
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Adam Tistler 2011-03-21 03:20:56 Re: Select in subselect vs select = any array
Previous Message Pavel Stehule 2011-03-20 06:51:20 Re: Select in subselect vs select = any array