Skip site navigation (1) Skip section navigation (2)

Re: [HACKERS] More detail on settings for pgavd?

From: "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Shridhar Daithankar <shridhar_daithankar(at)myrealbox(dot)com>,pgsql-performance(at)postgresql(dot)org,PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] More detail on settings for pgavd?
Date: 2003-11-21 14:56:17
Message-ID: 3FBE2791.8040404@zeut.net (view raw or flat)
Thread:
Lists: pgsql-hackerspgsql-performance
Josh Berkus wrote:

>Matthew,
>  
>
>>True, but I think it would be one hour once, rather than 30 minutes 4
>>times.
>>    
>>
>Well, generally it would be about 6-8 times at 2-4 minutes each.
>  
>
Are you saying that you can vacuum a 1 million row table in 2-4 
minutes?  While a vacuum of the same table with an additional 1 million 
dead tuples would take an hour?

>>This is one of the things I had hoped to add to pg_autovacuum, but never
>>got to.  In addition to just the information from the stats collector on
>>inserts updates and deletes, pg_autovacuum should also look at the FSM,
>>and make decisions based on it.  Anyone looking for a project?
>>    
>>
>Hmmm ... I think that's the wrong approach.  Once your database is populated, 
>it's very easy to determine how to set the FSM for a given pg_avd level.   If 
>you're vacuuming after 20% updates, for example, just set fsm_pages to 20% of 
>the total database pages plus growth & safety margins.
>  
>
Ok.

>I'd be really reluctant to base pv-avd frequency on the fsm settings instead.  
>What if the user loads 8GB of data but leaves fsm_pages at the default of 
>10,000?  You can't do much with that; you'd have to vacuum if even 1% of the 
>data changed.
>
Ok, but as you said above it's very easy to set the FSM once you know 
your db size.

>The other problem is that calculating data pages from a count of 
>updates+deletes would require pg_avd to keep more statistics and do more math 
>for every table.  Do we want to do this?
>  
>
I would think the math is simple enough to not be a big problem.  Also, 
I did not recommend looking blindly at the FSM as our guide, rather 
consulting it as another source of information as to when it would be 
useful to vacuum.  I don't have a good plan as to how to incorporate 
this data, but to a large extent the FSM already tracks table activity 
and gives us the most accurate answer about storage growth (short of  
using something like contrib/pgstattuple which takes nearly the same 
amount of time as an actual vacuum)

>>But I can't imagine that 2% makes any difference on a large table.  In
>>fact I would think that 10-15% would hardly be noticable, beyond that
>>I'm not sure.
>>    
>>
>I've seen performance lag at 10% of records, especially in tables where both 
>update and select activity focus on one subset of the table (calendar tables, 
>for example).
>  
>
Ok.

>>Valid points, and again I think this points to the fact that
>>pg_autovacuum needs to be more configurable.  Being able to set
>>different thresholds for different tables will help considerably.  In
>>fact, you may find that some tables should have a vac threshold much
>>larger than the analyze thresold, while other tables might want the
>>opposite.
>>    
>>
>Sure.  Though I think we can make the present configuration work with a little 
>adjustment of the numbers.   I'll have a chance to test on production 
>databases soon.
>  
>
I look forward to hearing results from your testing.

>>I would be surprized if you can notice the difference between a vacuum
>>analyze and a vacuum, especially on large tables.
>>    
>>
>It's substantial for tables with high statistics settings.   A 1,000,000 row 
>table with 5 columns set to statistics=250 can take 3 minutes to analyze on a 
>medium-grade server.
>  
>
In my testing, I never changed the default statistics settings.

>>I think you need two separate schedules.  There are lots of times where
>>a vacuum doesn't help, and an analyze is all that is needed
>>    
>>
>Agreed.  And I've just talked to a client who may want to use pg_avd's ANALYZE 
>scheduling but not use vacuum at all.   BTW, I think we should have a setting 
>for this; for example, if -V is -1, don't vacuum.
>  
>
That would be nice.  Easy to add, and something I never thought of....

>>I'm open to discussion on changing the defaults.  Perhaps what it would
>>be better to use some non-linear (perhaps logorithmic) scaling factor.
>>    
>>
>That would be cool, too.    Though a count of data pages would be a better 
>scale than a count of rows, and equally obtainable from pg_class.
>  
>
But we track tuples because we can compare against the count given by 
the stats system.  I don't know of a way (other than looking at the FSM, 
or contrib/pgstattuple ) to see how many dead pages exist.



In response to

Responses

pgsql-performance by date

Next:From: Shridhar DaithankarDate: 2003-11-21 15:06:54
Subject: Re: [HACKERS] More detail on settings for pgavd?
Previous:From: Tom LaneDate: 2003-11-21 14:54:32
Subject: Re: duration logging setting in 7.4

pgsql-hackers by date

Next:From: Shridhar DaithankarDate: 2003-11-21 15:06:54
Subject: Re: [HACKERS] More detail on settings for pgavd?
Previous:From: Matthew T. O'ConnorDate: 2003-11-21 14:31:49
Subject: Re: [HACKERS] More detail on settings for pgavd?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group