| From: | Steve Atkins <steve(at)blighty(dot)com> | 
|---|---|
| To: | pgsql-hackers(at)postgresql(dot)org, pgsql general <pgsql-general(at)postgresql(dot)org> | 
| Subject: | Re: [HACKERS] Autovacuum Improvements | 
| Date: | 2007-01-22 23:39:10 | 
| Message-ID: | 46EC7F08-54DE-49AD-A546-CB2824E8092D@blighty.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-general pgsql-hackers | 
On Jan 22, 2007, at 11:16 AM, Richard Huxton wrote:
> Bruce Momjian wrote:
>> Yep, agreed on the random I/O issue.  The larger question is if  
>> you have
>> a huge table, do you care to reclaim 3% of the table size, rather  
>> than
>> just vacuum it when it gets to 10% dirty?  I realize the vacuum is  
>> going
>> to take a lot of time, but vacuuming to relaim 3% three times  
>> seems like
>> it is going to be more expensive than just vacuuming the 10%  
>> once.  And
>> vacuuming to reclaim 1% ten times seems even more expensive.  The
>> partial vacuum idea is starting to look like a loser to me again.
>
> Buying a house with a 25-year mortgage is much more expensive than  
> just paying cash too, but you don't always have a choice.
>
> Surely the key benefit of the partial vacuuming thing is that you  
> can at least do something useful with a large table if a full  
> vacuum takes 24 hours and you only have 4 hours of idle I/O.
>
> It's also occurred to me that all the discussion of scheduling way  
> back when isn't directly addressing the issue. What most people  
> want (I'm guessing) is to vacuum *when the user-workload allows*  
> and the time-tabling is just a sysadmin first-approximation at that.
Yup. I'd really like for my app to be able to say "Hmm. No  
interactive users at the moment, no critical background tasks. Now  
would be a really good time for the DB to do some maintenance." but  
also to be able to interrupt the maintenance process if some new  
users or other system load show up.
> With partial vacuuming possible, we can arrange things with just  
> three thresholds and two measurements:
>   Measurement 1 = system workload
>   Measurement 2 = a per-table "requires vacuuming" value
>   Threshold 1 = workload at which we do more vacuuming
>   Threshold 2 = workload at which we do less vacuuming
>   Threshold 3 = point at which a table is considered worth vacuuming.
> Once every 10 seconds, the manager compares the current workload to  
> the thresholds and starts a new vacuum, kills one or does nothing.  
> New vacuum processes keep getting started as long as there is  
> workload spare and tables that need vacuuming.
>
> Now the trick of course is how you measure system workload in a  
> meaningful manner.
I'd settle for a "start maintenance", "stop maintenance" API.  
Anything else (for instance the heuristics you suggest above) would  
definitely be gravy.
It's not going to be simple to do, though, I don't think.
Cheers,
   Steve
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Paul Lambert | 2007-01-22 23:51:44 | Re: Installing Postegres side-by-side with M$ SQL server]] | 
| Previous Message | George Weaver | 2007-01-22 23:32:13 | Problems compiling from source | 
| From | Date | Subject | |
|---|---|---|---|
| Next Message | elein | 2007-01-23 00:05:49 | Re: -f <output file> option for pg_dumpall | 
| Previous Message | Jim C. Nasby | 2007-01-22 23:30:21 | Re: autovacuum process handling |