Re: [HACKERS] Autovacuum Improvements

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>, Darcy Buskermolen <darcyb(at)commandprompt(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>, Pavan Deolasee <pavan(at)enterprisedb(dot)com>, Christopher Browne <cbbrowne(at)acm(dot)org>, pgsql-general(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Autovacuum Improvements
Date: 2007-01-22 18:27:19
Message-ID: 200701221827.l0MIRJp00294@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers


Yep, agreed on the random I/O issue. The larger question is if you have
a huge table, do you care to reclaim 3% of the table size, rather than
just vacuum it when it gets to 10% dirty? I realize the vacuum is going
to take a lot of time, but vacuuming to relaim 3% three times seems like
it is going to be more expensive than just vacuuming the 10% once. And
vacuuming to reclaim 1% ten times seems even more expensive. The
partial vacuum idea is starting to look like a loser to me again.

---------------------------------------------------------------------------

Gregory Stark wrote:
> "Bruce Momjian" <bruce(at)momjian(dot)us> writes:
>
> > I agree it index cleanup isn't > 50% of vacuum. I was trying to figure
> > out how small, and it seems about 15% of the total table, which means if
> > we have bitmap vacuum, we can conceivably reduce vacuum load by perhaps
> > 80%, assuming 5% of the table is scanned.
>
> Actually no. A while back I did experiments to see how fast reading a file
> sequentially was compared to reading the same file sequentially but skipping
> x% of the blocks randomly. The results were surprising (to me) and depressing.
> The breakeven point was about 7%.
>
> That is, if you assum that only 5% of the table will be scanned and you
> arrange to do it sequentially then you should expect the i/o to be marginally
> faster than just reading the entire table. Vacuum does do some cpu work and
> wouldn't have to consult the clog as often, so it would still be somewhat
> faster.
>
> The theory online was that as long as you're reading one page from each disk
> track you're going to pay the same seek overhead as reading the entire track.
> I also had some theories involving linux being confused by the seeks and
> turning off read-ahead but I could never prove them.
>
> In short, to see big benefits you would have to have a much smaller percentage
> of the table being read. That shouldn't be taken to mean that the DSM is a
> loser. There are plenty of use cases where tables can be extremely large and
> have only very small percentages that are busy. The big advantage of the DSM
> is that it takes the size of the table out of the equation and replaces it
> with the size of the busy portion of the table. So updating a single record in
> a terabyte table has the same costs as updating a single record in a kilobyte
> table.
>
> Sadly that's not quite true due to indexes, and due to the size of the bitmap
> itself. But going back to your numbers it does mean that if you update a
> single row out of a terabyte table then we'll be removing about 85% of the i/o
> (minus the i/o needed to read the DSM, about .025%). If you update about 1%
> then you would be removing substantially less, and once you get to about 10%
> then you're back where you started.
>
>
> --
> Gregory Stark
> EnterpriseDB http://www.enterprisedb.com
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faq

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Gregory Stark 2007-01-22 18:41:22 Re: [HACKERS] Autovacuum Improvements
Previous Message Alexander Presber 2007-01-22 17:59:36 printf-like format strings

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2007-01-22 18:33:14 Re: [HACKERS] Win32 WEXITSTATUS too
Previous Message Gregory Stark 2007-01-22 17:51:53 Re: [HACKERS] Autovacuum Improvements