Re: VACUUM FREEZE output more than double input

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: VACUUM FREEZE output more than double input
Date: 2007-12-17 15:45:43
Message-ID: 47664546.EE98.0025.0@wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

>>> On Fri, Dec 14, 2007 at 5:59 PM, in message <15401(dot)1197676791(at)sss(dot)pgh(dot)pa(dot)us>,
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
>> Why double writes per read, plus massive writes at checkpoint?
>
> The double writes aren't surprising: freezing has to be WAL-logged,
> and the odds are that each page hasn't been touched since the last
> checkpoint, so the WAL log will include a complete page image.
> So in the steady state where all shared buffers are dirty, the
> per-page cycle is:
> * write out a dirty buffer so it can be reclaimed
> * read in a page
> * modify it to mark tuples frozen
> * write an image of the page to WAL
> * leave the now-dirty page in shared buffers for later writing
>
> The checkpoint spikes would come from trying to flush out all the
> dirty buffers at once.

Got it. Thanks.

> You'd expect a bit of a valley after each peak, since the vacuum
> could presumably recycle some buffers without having to flush 'em
> first; but I don't see one in your data. That may just be because
> the numbers are too noisy, but I kinda suspect that the vacuum is
> dirtying buffers nearly as fast as the bgwriter can clean them,
> leaving not a lot of daylight for a valley.

Yeah, the pattern was pretty consistent and without valleys.

> 8.3 should pretty well eliminate the checkpoint spike in this scenario,
> because vacuum will work in a limited number of shared buffers instead
> of dirtying the whole cache. But you'll still see 2X writes over reads.

Testing 8.3beta4 so far has shown both smoother I/O and better
performance in all respects. The preliminary post I did where I
thought I saw some regression on loading a pg_dump turned out to be
was an "apples to oranges" comparison; comparing the same load on
the same hardware and OS, 8.3 wins. (Kudos to all who worked on
these improvements!)

> If this is data that you could re-generate at need, it might make sense
> to turn off full_page_writes during the initial data load and vacuum.

Thanks for the suggestions; I'll try that.

> I concur with trying to FREEZE all the data while you do this, else
> you'll see the same work done whenever the data happens to slip past
> the auto freeze threshold.

Thanks. I thought that made sense, but I'm still trying to get my
head around some of the dynamics of PostgreSQL and MVCC. I'll
suggest that as policy here.

-Kevin

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Joshua D. Drake 2007-12-17 16:13:43 Re: viewing source code
Previous Message Merlin Moncure 2007-12-17 14:13:52 Re: viewing source code