Skip site navigation (1) Skip section navigation (2)

Re: VACUUM FREEZE output more than double input

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: VACUUM FREEZE output more than double input
Date: 2007-12-14 23:59:51
Message-ID: 15401.1197676791@sss.pgh.pa.us (view raw or flat)
Thread:
Lists: pgsql-performance
"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> Why double writes per read, plus massive writes at checkpoint?

The double writes aren't surprising: freezing has to be WAL-logged,
and the odds are that each page hasn't been touched since the last
checkpoint, so the WAL log will include a complete page image.
So in the steady state where all shared buffers are dirty, the
per-page cycle is:
	* write out a dirty buffer so it can be reclaimed
	* read in a page
	* modify it to mark tuples frozen
	* write an image of the page to WAL
	* leave the now-dirty page in shared buffers for later writing

The checkpoint spikes would come from trying to flush out all the
dirty buffers at once.

You'd expect a bit of a valley after each peak, since the vacuum
could presumably recycle some buffers without having to flush 'em
first; but I don't see one in your data.  That may just be because
the numbers are too noisy, but I kinda suspect that the vacuum is
dirtying buffers nearly as fast as the bgwriter can clean them,
leaving not a lot of daylight for a valley.

8.3 should pretty well eliminate the checkpoint spike in this scenario,
because vacuum will work in a limited number of shared buffers instead
of dirtying the whole cache.  But you'll still see 2X writes over reads.

If this is data that you could re-generate at need, it might make sense
to turn off full_page_writes during the initial data load and vacuum.

I concur with trying to FREEZE all the data while you do this, else
you'll see the same work done whenever the data happens to slip past
the auto freeze threshold.

			regards, tom lane

In response to

Responses

pgsql-performance by date

Next:From: Tom LaneDate: 2007-12-15 00:04:50
Subject: Re: explanation for seeks in VACUUM
Previous:From: Alvaro HerreraDate: 2007-12-14 22:03:15
Subject: Re: viewing source code

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group